
Kubeflow:Install

Notes on how to install Kubeflow.

Prerequisites

v1.10.2

  • This branch is the master branch targeting Kubernetes 1.32. (Check with the kubectl version command.)
  • For the specific Kubernetes version of each release, see the release notes.
  • Use a local Kind cluster (installed below) or your own Kubernetes cluster with a default StorageClass.
  • Kustomize version 5.4.3 or later. (Also visible in the kubectl version output.)
  • A kubectl version compatible with your Kubernetes cluster (see the version skew policy).
  • 16 GB of RAM is recommended.
  • 8 CPU cores are "recommended". <- Recommended, my foot... if an #Insufficient cpu error shows up halfway through, you are in trouble. Just use 8 cores or more, no exceptions.
  • kind version 0.27 or later.
  • You can also use docker, podman, or a more recent tool that runs OCI images for the Kind cluster.
  • You can exclude components in example/kustomization.yaml to fit Kubeflow into 4-8 GB of memory and 2-4 CPU cores.
  • Linux kernel subsystem changes to support many pods (a sketch for persisting these follows the list):
    • sudo sysctl fs.inotify.max_user_instances=2280
    • sudo sysctl fs.inotify.max_user_watches=1255360
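
These sysctl changes do not survive a reboot. A minimal sketch for persisting them, assuming a distro that reads drop-in files from /etc/sysctl.d/ (the file name is arbitrary):

# Persist the inotify limits across reboots
sudo tee /etc/sysctl.d/99-kubeflow-inotify.conf <<'EOF'
fs.inotify.max_user_instances = 2280
fs.inotify.max_user_watches = 1255360
EOF
sudo sysctl --system   # reload all sysctl drop-in files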

Kubeflow Components Versions

v1.11.0

Component | Local Manifests Path | Upstream Revision | CPU (millicores) | Memory (Mi) | PVC Storage (GB) | Added to my k3s cluster (O/X)
--------- | -------------------- | ----------------- | ---------------- | ----------- | ---------------- | -----------------------------
Training Operator | applications/training-operator/upstream | v1.9.2 | 3m | 25Mi | 0GB | X
Trainer | applications/trainer/upstream | v2.1.0 | 8m | 143Mi | 0GB | X
Notebook Controller | applications/jupyter/notebook-controller/upstream | v1.10.0 | 5m | 93Mi | 0GB | O
PVC Viewer Controller | applications/pvcviewer-controller/upstream | v1.10.0 | 15m | 128Mi | 0GB | O
Tensorboard Controller | applications/tensorboard/tensorboard-controller/upstream | v1.10.0 | 15m | 128Mi | 0GB | O
Central Dashboard | applications/centraldashboard/upstream | v1.10.0 | 2m | 159Mi | 0GB | O
Profiles + KFAM | applications/profiles/upstream | v1.10.0 | 7m | 129Mi | 0GB | O
PodDefaults Webhook | applications/admission-webhook/upstream | v1.10.0 | 1m | 14Mi | 0GB | O
Jupyter Web Application | applications/jupyter/jupyter-web-app/upstream | v1.10.0 | 4m | 231Mi | 0GB | O
Tensorboards Web Application | applications/tensorboard/tensorboards-web-app/upstream | v1.10.0 | | | | O
Volumes Web Application | applications/volumes-web-app/upstream | v1.10.0 | 4m | 226Mi | 0GB | O
Katib | applications/katib/upstream | v0.19.0 | 13m | 476Mi | 10GB | O
KServe | applications/kserve/kserve | v0.15.2 | 600m | 1200Mi | 0GB | X
KServe Models Web Application | applications/kserve/models-web-app | v0.15.0 | 6m | 259Mi | 0GB | X
Kubeflow Pipelines | applications/pipeline/upstream | 2.15.0 | 970m | 3552Mi | 35GB | X
Kubeflow Model Registry | applications/model-registry/upstream | v0.3.4 | 510m | 2112Mi | 20GB | X
Spark Operator | applications/spark/spark-operator | 2.4.0 | 9m | 41Mi | 0GB | X
Istio | common/istio | 1.28.0 | 750m | 2364Mi | 0GB | O
Knative | common/knative/knative-serving, common/knative/knative-eventing | v1.16.2 / v1.16.4 | 1450m | 1038Mi | 0GB | X
Cert Manager | common/cert-manager | 1.16.1 | 3m | 128Mi | 0GB | O
Dex | common/dex | 2.43.1 | 3m | 27Mi | 0GB | O
OAuth2-Proxy | common/oauth2-proxy | 7.10.0 | 3m | 27Mi | 0GB | O
Total | | | 4380m | 12341Mi | 65GB |

Install

Clone the manifests repository:

git clone https://github.com/kubeflow/manifests.git kubeflow-manifests
cd kubeflow-manifests

## git checkout v1.10.2  ## There is an issue where the MinIO image fails to pull (ImagePullBackOff).
git checkout v1.11.0

There are two ways to do this:

  1. Go into the example directory and install everything at once (#Example install).
  2. Work through the items one at a time (#Step-by-Step install).

Uninstall

Remove things by namespace:

# Check Kubeflow-related namespaces
kubectl get namespaces | grep -E 'kubeflow|istio|knative|cert-manager'

# Delete the namespaces (adjust to your installation)
kubectl delete namespace kubeflow
kubectl delete namespace kubeflow-user-example-com
kubectl delete namespace istio-system
kubectl delete namespace knative-serving
kubectl delete namespace knative-eventing
kubectl delete namespace cert-manager
kubectl delete namespace auth
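
Namespace deletion can hang in Terminating if finalizers are left behind. A quick sanity check before moving on (kubeflow below is just an example namespace):

# List namespaces stuck in Terminating
kubectl get namespaces | grep Terminating

# See what is still blocking a stuck namespace
kubectl get namespace kubeflow -o jsonpath='{.status.conditions}'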

Check for leftover CRDs and remove them:

# Check Kubeflow-related CRDs
kubectl get crd | grep -E 'kubeflow|istio|knative|cert-manager|seldon'

# After reviewing the list, delete the CRDs carefully if everything looks fine:
kubectl get crd | grep -E 'kubeflow|istio|knative|cert-manager|seldon' | awk '{print $1}' | xargs kubectl delete crd

Check for remaining resources:

kubectl api-resources --verbs=list --namespaced -o name  | xargs -n 1 kubectl get --show-kind --ignore-not-found -A  | grep kubeflow

If the output looks like this:

default       11m         Normal    Stopping    decoratorcontroller/kubeflow-pipelines-profile-controller        Stopping controller: kubeflow-pipelines-profile-controller
default       11m         Normal    Stopped     decoratorcontroller/kubeflow-pipelines-profile-controller        Stopped controller: kubeflow-pipelines-profile-controller
default       11m         Normal    Stopping    decoratorcontroller/kubeflow-pipelines-profile-controller        Stopping controller: kubeflow-pipelines-profile-controller
default       11m         Normal    Stopped     decoratorcontroller/kubeflow-pipelines-profile-controller        Stopped controller: kubeflow-pipelines-profile-controller

These are Kubernetes Event resources. They are transient, log-like resources and get cleaned up automatically over time.

If you want to clean them up right away, you can delete them like this:

sudo kubectl get events --all-namespaces --field-selector involvedObject.name=kubeflow-pipelines-profile-controller -o name | xargs sudo kubectl delete

Checking resources that are actually running

To check more precisely whether actual resources (not Events) are left behind:

# Check actual Kubeflow pods
kubectl get pods --all-namespaces | grep kubeflow

# Check Kubeflow services
kubectl get svc --all-namespaces | grep kubeflow

# Check Kubeflow deployments
kubectl get deployments --all-namespaces | grep kubeflow

# Check Kubeflow statefulsets
kubectl get statefulsets --all-namespaces | grep kubeflow

# Check Kubeflow CRDs
kubectl get crd | grep kubeflow

Example install

INFORMATION

The kubectl apply command may fail on the first attempt. This is inherent to how kubectl works in Kubernetes (for example, a CR can only be created after its CRD is ready). The workaround is simply to re-run the command until it succeeds. For one-liners, a bash one-liner that retries the command is provided.

You can install all official Kubeflow components (under the applications subdirectory) and all common services (under the common subdirectory) with the following command.

Review the contents of kustomization.yaml, adjust the commented sections as needed, and then install:

while ! kustomize build example | kubectl apply --server-side --force-conflicts -f -; do echo "Retrying to apply resources"; sleep 20; done
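
Once the loop finally succeeds, a rough way to check that nothing is still pending or crash-looping (pods that are still starting will also show up here, so give it some time):

kubectl get pods --all-namespaces | grep -Ev 'Running|Completed'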

For reference, the kustomization.yaml in v1.10.2 looks like this:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

sortOptions:
  order: legacy
  legacySortOptions:
    orderFirst:
    - Namespace
    - ResourceQuota
    - StorageClass
    - CustomResourceDefinition
    - MutatingWebhookConfiguration
    - ServiceAccount
    - PodSecurityPolicy
    - NetworkPolicy
    - Role
    - ClusterRole
    - RoleBinding
    - ClusterRoleBinding
    - ConfigMap
    - Secret
    - Endpoints
    - Service
    - LimitRange
    - PriorityClass
    - PersistentVolume
    - PersistentVolumeClaim
    - Deployment
    - StatefulSet
    - CronJob
    - PodDisruptionBudget
    orderLast:
    - ValidatingWebhookConfiguration

resources:
# Cert-Manager
- ../common/cert-manager/base
- ../common/cert-manager/kubeflow-issuer/base
# Istio
- ../common/istio/istio-crds/base
- ../common/istio/istio-namespace/base
- ../common/istio/istio-install/overlays/oauth2-proxy
# NOTE: For Google Kubernetes Engine (GKE), use:
# - ../common/istio/istio-install/overlays/gke
#GKE mounts `/opt/cni/bin` as read-only for security reasons, preventing the Istio CNI installer from writing the CNI binary.
#Use the GKE-specific overlay: `kubectl apply -k common/istio/istio-install/overlays/gke`.
#This overlay uses GKE's writable CNI directory at `/home/kubernetes/bin`.
#For more details, see [Istio CNI Prerequisites](https://istio.io/latest/docs/setup/additional-setup/cni/#prerequisites) and [Platform Prerequisites](https://istio.io/latest/docs/ambient/install/platform-prerequisites/)
# oauth2-proxy
# NOTE: only uncomment ONE of the following overlays, depending on your cluster type
- ../common/oauth2-proxy/overlays/m2m-dex-only     # for all clusters
#- ../common/oauth2-proxy/overlays/m2m-dex-and-kind # for KIND clusters (allows K8S JWTs for gateway auth)
#- ../common/oauth2-proxy/overlays/m2m-dex-and-eks  # for EKS clusters (NOTE: requires you to configure issuer, see overlay)
# Dex
- ../common/dex/overlays/oauth2-proxy
# KNative
- ../common/knative/knative-serving/overlays/gateways
# Uncomment the following line if `knative-eventing` is required
# - ../common/knative/knative-eventing/base
- ../common/istio/cluster-local-gateway/base
# Kubeflow namespace
- ../common/kubeflow-namespace/base
# NetworkPolicies
- ../common/networkpolicies/base
# Kubeflow Roles
- ../common/kubeflow-roles/base
# Kubeflow Istio Resources
- ../common/istio/kubeflow-istio-resources/base
# Kubeflow Pipelines
- ../applications/pipeline/upstream/env/cert-manager/platform-agnostic-multi-user
# Katib
- ../applications/katib/upstream/installs/katib-with-kubeflow
# Central Dashboard
- ../applications/centraldashboard/overlays/oauth2-proxy
# Admission Webhook
- ../applications/admission-webhook/upstream/overlays/cert-manager
# Jupyter Web App
- ../applications/jupyter/jupyter-web-app/upstream/overlays/istio
# Notebook Controller
- ../applications/jupyter/notebook-controller/upstream/overlays/kubeflow
# Profiles + KFAM with PSS (Pod Security Standards)
- ../applications/profiles/pss
# PVC Viewer
- ../applications/pvcviewer-controller/upstream/base
# Volumes Web App
- ../applications/volumes-web-app/upstream/overlays/istio
# Tensorboards Controller
- ../applications/tensorboard/tensorboard-controller/upstream/overlays/kubeflow
# Tensorboard Web App
- ../applications/tensorboard/tensorboards-web-app/upstream/overlays/istio
# Training Operator
- ../applications/training-operator/upstream/overlays/kubeflow
# User namespace
- ../common/user-namespace/base
# KServe
- ../applications/kserve/kserve
- ../applications/kserve/models-web-app/overlays/kubeflow
# Spark Operator
- ../applications/spark/spark-operator/overlays/kubeflow

# Ray is an experimental integration
# Here is the documentation for Ray: https://docs.ray.io/en/latest/
# Here is the internal documentation for Ray: - ../experimental/ray/README.md
# - ../experimental/ray/kuberay-operator/overlays/kubeflow

components:
# Pod Security Standards
# https://kubernetes.io/docs/concepts/security/pod-security-standards/
# Uncomment to enable baseline level standards
# - ../experimental/security/PSS/static/baseline
# Uncomment to enable restricted level standards
# - ../experimental/security/PSS/static/restricted
# Uncomment to enable baseline level standards for dynamic namespaces
# - ../experimental/security/PSS/dynamic/baseline
# Uncomment to enable restricted level standards for dynamic namespaces
# - ../experimental/security/PSS/dynamic/restricted

Referring to the Example above and installing only what you need step by step is also a good approach.

Removal

kubectl delete -k example
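
After the delete, namespaces and CRDs can take a while to disappear; a quick check, reusing the filters from the #Uninstall section above:

kubectl get namespaces | grep -E 'kubeflow|istio|knative|cert-manager|auth|oauth2-proxy'
kubectl get crd | grep -E 'kubeflow|istio|knative|cert-manager'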

Step-by-Step install

Clone the manifests repository and run the installation from the project root directory.

git clone https://github.com/kubeflow/manifests.git kubeflow-manifests
cd kubeflow-manifests
git checkout v1.11.0

INFORMATION

The installation order changed in the transition from v1.10.2 to v1.11.0. The steps below reflect the updated order.

Kubeflow namespace

Create the namespace where the Kubeflow components will live. This namespace is named kubeflow.

kustomize build common/kubeflow-namespace/base | kubectl apply -f -

Output:

namespace/kubeflow created
namespace/kubeflow-system created  ## added in v1.11.0

Cert-Manager

cert-manager is used to provide the certificates required by the admission webhooks of several Kubeflow components.

INFORMATION

If cert-manager is already installed, you can skip the base install. In that case, however, you must manually patch the cert-manager namespace with the #pod-security.kubernetes.io/enforce=restricted label.

kustomize build common/cert-manager/base | kubectl apply -f -

## This must complete first for the kubeflow-issuer below to work properly:
kubectl wait --for=condition=Ready pod -l 'app in (cert-manager,webhook)' --timeout=180s -n cert-manager

kustomize build common/cert-manager/kubeflow-issuer/base | kubectl apply -f -
kubectl wait --for=jsonpath='{.subsets[0].addresses[0].targetRef.kind}'=Pod endpoints -l 'app in (cert-manager,webhook)' --timeout=180s -n cert-manager

Then check that the pods are Running:

kubectl get -n cert-manager pods

Output:

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-5f864bbfd-4cdwm               1/1     Running   0          5m43s
cert-manager-cainjector-589dc747b5-nxlds   1/1     Running   0          5m43s
cert-manager-webhook-5987c7ff58-7qmsd      1/1     Running   0          5m43s

If you get an error like this:

Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.202.64:443: connect: connection refused

This is because the webhook is not yet ready to receive requests. Wait a few seconds and try applying the manifests again.

Output of each command:

## kustomize build common/cert-manager/base | kubectl apply -f -
namespace/cert-manager created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
serviceaccount/cert-manager created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager-webhook created
role.rbac.authorization.k8s.io/cert-manager-tokenrequest created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-cluster-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
rolebinding.rbac.authorization.k8s.io/cert-manager-cert-manager-tokenrequest created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
service/cert-manager created
service/cert-manager-cainjector created
service/cert-manager-webhook created
deployment.apps/cert-manager created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created

## kubectl wait --for=condition=Ready pod -l 'app in (cert-manager,webhook)' --timeout=180s -n cert-manager
pod/cert-manager-5f864bbfd-bh855 condition met
pod/cert-manager-webhook-5987c7ff58-ddhkw condition met

## kustomize build common/cert-manager/kubeflow-issuer/base | kubectl apply -f -
clusterissuer.cert-manager.io/kubeflow-self-signing-issuer created

## kubectl wait --for=jsonpath='{.subsets[0].addresses[0].targetRef.kind}'=Pod endpoints -l 'app in (cert-manager,webhook)' --timeout=180s -n cert-manager
Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
endpoints/cert-manager condition met
endpoints/cert-manager-webhook condition met

pod-security.kubernetes.io/enforce=restricted
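
If you skipped the base install above because cert-manager was already present, the namespace label has to be patched by hand as the note says. A minimal sketch, assuming cert-manager lives in the cert-manager namespace:

# Enforce the restricted Pod Security Standard on the existing cert-manager namespace
kubectl label namespace cert-manager pod-security.kubernetes.io/enforce=restricted --overwrite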

kubeflow-issuer

This is the cert-manager Issuer resource that automatically issues and manages TLS/SSL certificates in the Kubeflow environment.

  • Defines the certificate issuer
    • Specifies the certificate authority (CA), e.g. Let's Encrypt, Vault, or a self-signed issuer
    • Kubeflow components automatically request and renew the certificates they need for HTTPS
  • Secures Kubeflow components
    • Applies TLS to services such as the Kubeflow Dashboard, Jupyter Notebooks, and Katib
    • Works with the Istio Gateway to provide encrypted communication for external access

kubectl apply -k common/cert-manager/kubeflow-issuer/base

Output:

clusterissuer.cert-manager.io/kubeflow-self-signing-issuer created

Istio

Most Kubeflow components use Istio to secure traffic, enforce network authorization, and implement routing policies. This installation uses Istio CNI, which removes the need for privileged init containers and improves compatibility with Pod Security Standards. If your cluster uses the Cilium CNI, it must be configured correctly for Istio as described here; otherwise the Central Dashboard will throw RBAC access-denied errors.

First, load the required kernel modules as described in Istio#Platform Requirements.
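
A sketch of what that typically boils down to; the module names below are taken from the Istio platform prerequisites and should be double-checked against the page for your Istio version:

# Load the netfilter/iptables modules Istio usually needs (verify the list upstream)
for mod in br_netfilter nf_nat iptable_nat iptable_mangle iptable_filter xt_REDIRECT xt_owner; do
  sudo modprobe "$mod"
done
lsmod | grep -E 'br_netfilter|nf_nat|iptable'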

echo "Installing Istio CNI configured with external authorization..."
kustomize build common/istio/istio-crds/base | kubectl apply -f -
kustomize build common/istio/istio-namespace/base | kubectl apply -f -

# For most platforms (Kind, Minikube, AKS, EKS, etc.)
kustomize build common/istio/istio-install/overlays/oauth2-proxy | kubectl apply -f -

# For Google Kubernetes Engine (GKE), use:
# kustomize build common/istio/istio-install/overlays/gke | kubectl apply -f -

echo "Waiting for all Istio Pods to become ready..."
kubectl wait --for=condition=Ready pods --all -n istio-system --timeout 300s

For reference, the istio-ingressgateway pod spent quite a while in ContainerCreating... checking the events, pulling the gcr.io/istio-release/proxyv2:1.28.0 image seems to take some time. Anyway, it took about 6 minutes.
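
While waiting, you can keep an eye on progress instead of guessing (the app=istio-ingressgateway label is the standard one used by these manifests):

# Watch the istio-system pods come up
kubectl get pods -n istio-system -w

# If a pod sits in ContainerCreating, the events at the end of describe usually say why (e.g. image pulling)
kubectl describe pod -n istio-system -l app=istio-ingressgateway | tail -n 20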

Output of each command:

## kustomize build common/istio/istio-crds/base | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/authorizationpolicies.security.istio.io created
customresourcedefinition.apiextensions.k8s.io/destinationrules.networking.istio.io created
customresourcedefinition.apiextensions.k8s.io/envoyfilters.networking.istio.io created
customresourcedefinition.apiextensions.k8s.io/gateways.networking.istio.io created
customresourcedefinition.apiextensions.k8s.io/peerauthentications.security.istio.io created
customresourcedefinition.apiextensions.k8s.io/proxyconfigs.networking.istio.io created
customresourcedefinition.apiextensions.k8s.io/requestauthentications.security.istio.io created
customresourcedefinition.apiextensions.k8s.io/serviceentries.networking.istio.io created
customresourcedefinition.apiextensions.k8s.io/sidecars.networking.istio.io created
customresourcedefinition.apiextensions.k8s.io/telemetries.telemetry.istio.io created
customresourcedefinition.apiextensions.k8s.io/virtualservices.networking.istio.io created
customresourcedefinition.apiextensions.k8s.io/wasmplugins.extensions.istio.io created
customresourcedefinition.apiextensions.k8s.io/workloadentries.networking.istio.io created
customresourcedefinition.apiextensions.k8s.io/workloadgroups.networking.istio.io created

## kustomize build common/istio/istio-namespace/base | kubectl apply -f -
namespace/istio-system created

## kustomize build common/istio/istio-install/overlays/oauth2-proxy | kubectl apply -f -
serviceaccount/istio-ingressgateway-service-account created
serviceaccount/istio-reader-service-account created
serviceaccount/istiod created
serviceaccount/istio-cni created
role.rbac.authorization.k8s.io/istio-ingressgateway-sds created
role.rbac.authorization.k8s.io/istiod created
clusterrole.rbac.authorization.k8s.io/istio-cni created
clusterrole.rbac.authorization.k8s.io/istio-cni-repair-role created
clusterrole.rbac.authorization.k8s.io/istio-reader-clusterrole-istio-system created
clusterrole.rbac.authorization.k8s.io/istiod-clusterrole-istio-system created
clusterrole.rbac.authorization.k8s.io/istiod-gateway-controller-istio-system created
rolebinding.rbac.authorization.k8s.io/istio-ingressgateway-sds created
rolebinding.rbac.authorization.k8s.io/istiod created
clusterrolebinding.rbac.authorization.k8s.io/istio-cni created
clusterrolebinding.rbac.authorization.k8s.io/istio-cni-repair-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/istio-reader-clusterrole-istio-system created
clusterrolebinding.rbac.authorization.k8s.io/istiod-clusterrole-istio-system created
clusterrolebinding.rbac.authorization.k8s.io/istiod-gateway-controller-istio-system created
configmap/istio created
configmap/istio-sidecar-injector created
configmap/values created
configmap/istio-cni-config created
service/istio-ingressgateway created
service/istiod created
deployment.apps/istio-ingressgateway created
deployment.apps/istiod created
Warning: spec.template.metadata.annotations[container.apparmor.security.beta.kubernetes.io/install-cni]: deprecated since v1.30; use the "appArmorProfile" field instead
daemonset.apps/istio-cni-node created
horizontalpodautoscaler.autoscaling/istio-ingressgateway created
horizontalpodautoscaler.autoscaling/istiod created
gateway.networking.istio.io/istio-ingressgateway created
authorizationpolicy.security.istio.io/global-deny-all created
authorizationpolicy.security.istio.io/istio-ingressgateway created
mutatingwebhookconfiguration.admissionregistration.k8s.io/istio-sidecar-injector created
validatingwebhookconfiguration.admissionregistration.k8s.io/istio-validator-istio-system created

## kubectl wait --for=condition=Ready pods --all -n istio-system --timeout 300s
pod/istio-ingressgateway-547db5c8dd-kmftd condition met
pod/istiod-5fbdbdb45-nsw68 condition met

Oauth2-proxy

oauth2-proxy extends the Istio Ingress-Gateway to act as an OIDC client. It supports user sessions as well as token-based machine-to-machine authentication.

  • The following overlays are mutually exclusive, so pick only one of the options and install it.
  • See common/oauth2-proxy/overlays/ for more options.

Option 1

Works on most clusters, but does not allow Kubernetes service account tokens to be used from outside the cluster through the Istio Ingress-Gateway.

kustomize build common/oauth2-proxy/overlays/m2m-dex-only/ | kubectl apply -f -
kubectl wait --for=condition=Ready pod -l 'app.kubernetes.io/name=oauth2-proxy' --timeout=180s -n oauth2-proxy

Output of each command:

## kustomize build common/oauth2-proxy/overlays/m2m-dex-only/ | kubectl apply -f -
namespace/oauth2-proxy created
serviceaccount/oauth2-proxy created
configmap/oauth2-proxy-hk55gm96k4 created
configmap/oauth2-proxy-parameters-74659b6648 created
configmap/oauth2-proxy-theme-5t624ft8b8 created
secret/oauth2-proxy-h675gf55ht created
service/oauth2-proxy created
deployment.apps/oauth2-proxy created
virtualservice.networking.istio.io/oauth2-proxy created
authorizationpolicy.security.istio.io/istio-ingressgateway-oauth2-proxy created
authorizationpolicy.security.istio.io/istio-ingressgateway-require-jwt created
requestauthentication.security.istio.io/dex-jwt created

## kubectl wait --for=condition=Ready pod -l 'app.kubernetes.io/name=oauth2-proxy' --timeout=180s -n oauth2-proxy
pod/oauth2-proxy-649b9846d8-ddkh6 condition met
pod/oauth2-proxy-649b9846d8-vgpp7 condition met

Option 2

Works with Kind, K3D, Rancher, GKE, and many other clusters with the appropriate configuration, and allows Kubernetes service account tokens to be used from outside the cluster through the Istio ingress gateway.

This can be used, for example, for automation with GitHub Actions. In the end you have to patch the issuer and jwksUri fields of the RequestAuthentication resource in the istio-system namespace, as done in /common/oauth2-proxy/overlays/m2m-dex-and-kind/kustomization.yaml.

For how to apply the patch, see the #Upgrading and Extending section below.

You can find your cluster's issuer by running the following from a pod inside the cluster:

curl --insecure -H "Authorization: Bearer `cat /var/run/secrets/kubernetes.io/serviceaccount/token`" https://kubernetes.default/.well-known/openid-configuration

kustomize build common/oauth2-proxy/overlays/m2m-dex-and-kind/ | kubectl apply -f -
kubectl wait --for=condition=Ready pod -l 'app.kubernetes.io/name=oauth2-proxy' --timeout=180s -n oauth2-proxy
kubectl wait --for=condition=Ready pod -l 'app.kubernetes.io/name=cluster-jwks-proxy' --timeout=180s -n istio-system

Option 3

This works on most EKS clusters and allows Kubernetes service account tokens to be used from outside the cluster through the Istio Ingress-Gateway.

First, adjust AWS_REGION and CLUSTER_ID in the common/oauth2-proxy/overlays/m2m-dex-and-eks/ overlay.

kustomize build common/oauth2-proxy/overlays/m2m-dex-and-eks/ | kubectl apply -f -
kubectl wait --for=condition=Ready pod -l 'app.kubernetes.io/name=oauth2-proxy' --timeout=180s -n oauth2-proxy

How to create and use tokens after installation

Once an installation with Kubernetes service account token support is complete, you can create and use tokens:

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
TOKEN="$(kubectl -n $KF_PROFILE_NAMESPACE create token default-editor)"
# In Python: client = kfp.Client(host="http://localhost:8080/pipeline", existing_token=TOKEN), where TOKEN is the token created above
curl -v "localhost:8080/jupyter/api/namespaces/${KF_PROFILE_NAMESPACE}/notebooks" -H "Authorization: Bearer ${TOKEN}"

If you want to use OAuth2 Proxy without Dex and connect directly to your own IdP, see this document. However, as described in the #Dex section below, you can also keep Dex and extend it by adding a connector to your own IdP.

Dex

Dex is an OpenID Connect (OIDC) identity provider that supports multiple authentication backends. The default installation includes a static user with the email address user@example.com. The default password is 12341234. For production Kubeflow deployments, you must change the default password; see the relevant section.

kustomize build common/dex/overlays/oauth2-proxy | kubectl apply -f -
kubectl wait --for=condition=Ready pods --all --timeout=180s -n auth

Output of each command:

## kustomize build common/dex/overlays/oauth2-proxy | kubectl apply -f -
namespace/auth created
customresourcedefinition.apiextensions.k8s.io/authcodes.dex.coreos.com created
serviceaccount/dex created
clusterrole.rbac.authorization.k8s.io/dex created
clusterrolebinding.rbac.authorization.k8s.io/dex created
configmap/dex created
secret/dex-oidc-client created
secret/dex-passwords created
service/dex created
deployment.apps/dex created
virtualservice.networking.istio.io/dex created

## kubectl wait --for=condition=Ready pods --all --timeout=180s -n auth
pod/dex-84ccb4b7c8-f22kr condition met
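
A minimal sketch of changing the default password in place, assuming the dex-passwords Secret and its DEX_USER_PASSWORD key created above (patching the overlay as the upstream docs describe is the cleaner approach):

# Generate a bcrypt hash for the new password (prompts for it; needs python3 + passlib)
HASH=$(python3 -c 'from passlib.hash import bcrypt; import getpass; print(bcrypt.using(rounds=12, ident="2y").hash(getpass.getpass()))')

# Patch the secret Dex reads the static password from, then restart Dex
kubectl -n auth patch secret dex-passwords --type merge -p "{\"stringData\":{\"DEX_USER_PASSWORD\":\"${HASH}\"}}"
kubectl -n auth rollout restart deployment dex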

Connecting a different identity provider

To connect the identity provider of your choice (LDAP, GitHub, Google, Microsoft, OIDC, SAML, GitLab), see https://dexidp.io/docs/connectors/oidc/. OIDC is generally recommended because it is compatible with most providers.

For example, the snippet below uses Azure. You need to modify https://github.com/kubeflow/manifests/blob/master/common/dex/overlays/oauth2-proxy/config-map.yaml and add a patch section to the main kustomization so that environment variables are added to https://github.com/kubeflow/manifests/blob/master/common/dex/base/deployment.yaml. For details, see #Upgrading and Extending.

apiVersion: v1
kind: ConfigMap
metadata:
  name: dex
data:
  config.yaml: |
    issuer: https://$KUBEFLOW_INGRESS_URL/dex
    storage:
      type: kubernetes
      config:
        inCluster: true
    web:
      http: 0.0.0.0:5556
    logger:
      level: "debug"
      format: text
    oauth2:
      skipApprovalScreen: true
    enablePasswordDB: true
    #### WARNING: YOU SHOULD NOT USE THE DEFAULT STATIC PASSWORDS
    #### and patch /common/dex/base/dex-passwords.yaml in a Kustomize overlay or remove it
    staticPasswords:
    - email: user@example.com
      hashFromEnv: DEX_USER_PASSWORD
      username: user
      userID: "15841185641784"
    staticClients:
    # https://github.com/dexidp/dex/pull/1664
    - idEnv: OIDC_CLIENT_ID
      redirectURIs: ["/oauth2/callback"]
      name: 'Dex Login Application'
      secretEnv: OIDC_CLIENT_SECRET
    #### Here come the connectors to OIDC providers such as Azure, GCP, GitHub, GitLab, etc.
    #### Connector config values starting with a "$" will read from the environment.
    connectors:
    - type: oidc
      id: azure
      name: azure
      config:
        issuer: https://login.microsoftonline.com/$TENANT_ID/v2.0
        redirectURI: https://$KUBEFLOW_INGRESS_URL/dex/callback
        clientID: $AZURE_CLIENT_ID
        clientSecret: $AZURE_CLIENT_SECRET
        insecureSkipEmailVerified: true
        scopes:
        - openid
        - profile
        - email
        #- groups # groups might be used in the future

For Keycloak, rough guidelines are available at https://github.com/kubeflow/manifests/blob/master/common/dex/README.md.

KNative

Knative is an open-source tool for building, deploying, and managing serverless applications on top of Kubernetes.

Knative is used by KServe, an official Kubeflow component.

INFORMATION

If you are not going to use KServe, you do not need to install this.

kustomize build common/knative/knative-serving/overlays/gateways | kubectl apply -f -
kustomize build common/istio/cluster-local-gateway/base | kubectl apply -f -

Optionally, you can install Knative Eventing, which can be used for inference request logging.

kustomize build common/knative/knative-eventing/base | kubectl apply -f -

Running this produced the following error:

Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.serving.knative.dev": failed to call webhook: Post "https://webhook.knative-serving.svc:443/defaulting?timeout=10s": no endpoints available for service "webhook"

That is because the webhook is not ready yet. Wait a bit and run the same install command again.
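
Instead of blind retries, you can also wait for the Knative Serving control plane the same way kubectl wait is used elsewhere in this guide, and then re-apply:

kubectl wait --for=condition=Ready pods --all -n knative-serving --timeout=300s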

Output of each command:

## kustomize build common/knative/knative-serving/overlays/gateways | kubectl apply -f -
## This one hit the `ensure CRDs are installed first` error, so I ran it a few more times.
namespace/knative-serving unchanged
customresourcedefinition.apiextensions.k8s.io/certificates.networking.internal.knative.dev unchanged
customresourcedefinition.apiextensions.k8s.io/clusterdomainclaims.networking.internal.knative.dev unchanged
customresourcedefinition.apiextensions.k8s.io/configurations.serving.knative.dev unchanged
customresourcedefinition.apiextensions.k8s.io/domainmappings.serving.knative.dev unchanged
customresourcedefinition.apiextensions.k8s.io/images.caching.internal.knative.dev unchanged
customresourcedefinition.apiextensions.k8s.io/ingresses.networking.internal.knative.dev unchanged
customresourcedefinition.apiextensions.k8s.io/metrics.autoscaling.internal.knative.dev unchanged
customresourcedefinition.apiextensions.k8s.io/podautoscalers.autoscaling.internal.knative.dev unchanged
customresourcedefinition.apiextensions.k8s.io/revisions.serving.knative.dev unchanged
customresourcedefinition.apiextensions.k8s.io/routes.serving.knative.dev unchanged
customresourcedefinition.apiextensions.k8s.io/serverlessservices.networking.internal.knative.dev unchanged
customresourcedefinition.apiextensions.k8s.io/services.serving.knative.dev unchanged
serviceaccount/activator unchanged
serviceaccount/controller unchanged
role.rbac.authorization.k8s.io/knative-serving-activator unchanged
clusterrole.rbac.authorization.k8s.io/knative-serving-activator-cluster unchanged
clusterrole.rbac.authorization.k8s.io/knative-serving-addressable-resolver unchanged
clusterrole.rbac.authorization.k8s.io/knative-serving-admin unchanged
clusterrole.rbac.authorization.k8s.io/knative-serving-aggregated-addressable-resolver unchanged
clusterrole.rbac.authorization.k8s.io/knative-serving-core unchanged
clusterrole.rbac.authorization.k8s.io/knative-serving-istio unchanged
clusterrole.rbac.authorization.k8s.io/knative-serving-namespaced-admin unchanged
clusterrole.rbac.authorization.k8s.io/knative-serving-namespaced-edit unchanged
clusterrole.rbac.authorization.k8s.io/knative-serving-namespaced-view unchanged
clusterrole.rbac.authorization.k8s.io/knative-serving-podspecable-binding unchanged
rolebinding.rbac.authorization.k8s.io/knative-serving-activator unchanged
clusterrolebinding.rbac.authorization.k8s.io/knative-serving-activator-cluster unchanged
clusterrolebinding.rbac.authorization.k8s.io/knative-serving-controller-addressable-resolver unchanged
clusterrolebinding.rbac.authorization.k8s.io/knative-serving-controller-admin unchanged
configmap/config-autoscaler unchanged
configmap/config-certmanager unchanged
configmap/config-defaults unchanged
configmap/config-deployment unchanged
configmap/config-domain unchanged
configmap/config-features unchanged
configmap/config-gc unchanged
configmap/config-istio unchanged
configmap/config-leader-election unchanged
configmap/config-logging unchanged
configmap/config-network unchanged
configmap/config-observability unchanged
configmap/config-tracing unchanged
secret/net-istio-webhook-certs unchanged
secret/webhook-certs unchanged
service/knative-local-gateway unchanged
service/activator-service unchanged
service/autoscaler unchanged
service/controller unchanged
service/net-istio-webhook unchanged
service/webhook unchanged
deployment.apps/activator configured
deployment.apps/autoscaler configured
deployment.apps/controller configured
deployment.apps/net-istio-controller unchanged
deployment.apps/net-istio-webhook unchanged
deployment.apps/webhook unchanged
poddisruptionbudget.policy/activator-pdb configured
poddisruptionbudget.policy/webhook-pdb configured
horizontalpodautoscaler.autoscaling/activator unchanged
horizontalpodautoscaler.autoscaling/webhook unchanged
image.caching.internal.knative.dev/queue-proxy unchanged
certificate.networking.internal.knative.dev/routing-serving-certs unchanged
destinationrule.networking.istio.io/knative unchanged
gateway.networking.istio.io/knative-local-gateway unchanged
authorizationpolicy.security.istio.io/activator-service unchanged
authorizationpolicy.security.istio.io/autoscaler unchanged
authorizationpolicy.security.istio.io/controller unchanged
authorizationpolicy.security.istio.io/istio-webhook unchanged
authorizationpolicy.security.istio.io/webhook unchanged
peerauthentication.security.istio.io/net-istio-webhook unchanged
peerauthentication.security.istio.io/webhook unchanged
mutatingwebhookconfiguration.admissionregistration.k8s.io/webhook.istio.networking.internal.knative.dev unchanged
mutatingwebhookconfiguration.admissionregistration.k8s.io/webhook.serving.knative.dev configured
validatingwebhookconfiguration.admissionregistration.k8s.io/config.webhook.istio.networking.internal.knative.dev unchanged
validatingwebhookconfiguration.admissionregistration.k8s.io/config.webhook.serving.knative.dev unchanged
validatingwebhookconfiguration.admissionregistration.k8s.io/validation.webhook.serving.knative.dev configured

## kustomize build common/istio/cluster-local-gateway/base | kubectl apply -f -
serviceaccount/cluster-local-gateway-service-account created
role.rbac.authorization.k8s.io/cluster-local-gateway-sds created
rolebinding.rbac.authorization.k8s.io/cluster-local-gateway-sds created
service/cluster-local-gateway created
deployment.apps/cluster-local-gateway created
horizontalpodautoscaler.autoscaling/cluster-local-gateway created
gateway.networking.istio.io/cluster-local-gateway created
authorizationpolicy.security.istio.io/cluster-local-gateway created

## kustomize build common/knative/knative-eventing/base | kubectl apply -f -
namespace/knative-eventing created
customresourcedefinition.apiextensions.k8s.io/apiserversources.sources.knative.dev created
customresourcedefinition.apiextensions.k8s.io/brokers.eventing.knative.dev created
customresourcedefinition.apiextensions.k8s.io/channels.messaging.knative.dev created
customresourcedefinition.apiextensions.k8s.io/containersources.sources.knative.dev created
customresourcedefinition.apiextensions.k8s.io/eventpolicies.eventing.knative.dev created
customresourcedefinition.apiextensions.k8s.io/eventtypes.eventing.knative.dev created
customresourcedefinition.apiextensions.k8s.io/jobsinks.sinks.knative.dev created
customresourcedefinition.apiextensions.k8s.io/parallels.flows.knative.dev created
customresourcedefinition.apiextensions.k8s.io/pingsources.sources.knative.dev created
customresourcedefinition.apiextensions.k8s.io/sequences.flows.knative.dev created
customresourcedefinition.apiextensions.k8s.io/sinkbindings.sources.knative.dev created
customresourcedefinition.apiextensions.k8s.io/subscriptions.messaging.knative.dev created
customresourcedefinition.apiextensions.k8s.io/triggers.eventing.knative.dev created
serviceaccount/eventing-controller created
serviceaccount/eventing-webhook created
serviceaccount/job-sink created
serviceaccount/pingsource-mt-adapter created
role.rbac.authorization.k8s.io/knative-eventing-webhook created
clusterrole.rbac.authorization.k8s.io/addressable-resolver configured
clusterrole.rbac.authorization.k8s.io/broker-addressable-resolver unchanged
clusterrole.rbac.authorization.k8s.io/broker-subscriber unchanged
clusterrole.rbac.authorization.k8s.io/builtin-podspecable-binding unchanged
clusterrole.rbac.authorization.k8s.io/channel-addressable-resolver unchanged
clusterrole.rbac.authorization.k8s.io/channel-subscriber unchanged
clusterrole.rbac.authorization.k8s.io/channelable-manipulator configured
clusterrole.rbac.authorization.k8s.io/crossnamespace-subscriber configured
clusterrole.rbac.authorization.k8s.io/eventing-broker-filter unchanged
clusterrole.rbac.authorization.k8s.io/eventing-broker-ingress unchanged
clusterrole.rbac.authorization.k8s.io/eventing-config-reader unchanged
clusterrole.rbac.authorization.k8s.io/eventing-sources-source-observer unchanged
clusterrole.rbac.authorization.k8s.io/flows-addressable-resolver unchanged
clusterrole.rbac.authorization.k8s.io/jobsinks-addressable-resolver unchanged
clusterrole.rbac.authorization.k8s.io/knative-bindings-namespaced-admin unchanged
clusterrole.rbac.authorization.k8s.io/knative-eventing-controller unchanged
clusterrole.rbac.authorization.k8s.io/knative-eventing-job-sink unchanged
clusterrole.rbac.authorization.k8s.io/knative-eventing-namespaced-admin unchanged
clusterrole.rbac.authorization.k8s.io/knative-eventing-namespaced-edit unchanged
clusterrole.rbac.authorization.k8s.io/knative-eventing-namespaced-view unchanged
clusterrole.rbac.authorization.k8s.io/knative-eventing-pingsource-mt-adapter unchanged
clusterrole.rbac.authorization.k8s.io/knative-eventing-sources-controller unchanged
clusterrole.rbac.authorization.k8s.io/knative-eventing-webhook unchanged
clusterrole.rbac.authorization.k8s.io/knative-flows-namespaced-admin unchanged
clusterrole.rbac.authorization.k8s.io/knative-messaging-namespaced-admin unchanged
clusterrole.rbac.authorization.k8s.io/knative-sinks-namespaced-admin unchanged
clusterrole.rbac.authorization.k8s.io/knative-sources-namespaced-admin unchanged
clusterrole.rbac.authorization.k8s.io/meta-channelable-manipulator unchanged
clusterrole.rbac.authorization.k8s.io/podspecable-binding configured
clusterrole.rbac.authorization.k8s.io/service-addressable-resolver unchanged
clusterrole.rbac.authorization.k8s.io/serving-addressable-resolver unchanged
clusterrole.rbac.authorization.k8s.io/source-observer configured
rolebinding.rbac.authorization.k8s.io/eventing-webhook created
clusterrolebinding.rbac.authorization.k8s.io/eventing-controller unchanged
clusterrolebinding.rbac.authorization.k8s.io/eventing-controller-crossnamespace-subscriber unchanged
clusterrolebinding.rbac.authorization.k8s.io/eventing-controller-manipulator unchanged
clusterrolebinding.rbac.authorization.k8s.io/eventing-controller-resolver unchanged
clusterrolebinding.rbac.authorization.k8s.io/eventing-controller-source-observer unchanged
clusterrolebinding.rbac.authorization.k8s.io/eventing-controller-sources-controller unchanged
clusterrolebinding.rbac.authorization.k8s.io/eventing-webhook unchanged
clusterrolebinding.rbac.authorization.k8s.io/eventing-webhook-podspecable-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/eventing-webhook-resolver unchanged
clusterrolebinding.rbac.authorization.k8s.io/knative-eventing-job-sink unchanged
clusterrolebinding.rbac.authorization.k8s.io/knative-eventing-pingsource-mt-adapter unchanged
configmap/config-br-default-channel created
configmap/config-br-defaults created
configmap/config-features created
configmap/config-kreference-mapping created
configmap/config-leader-election created
configmap/config-logging created
configmap/config-observability created
configmap/config-ping-defaults created
configmap/config-sugar created
configmap/config-tracing created
configmap/default-ch-webhook created
secret/eventing-webhook-certs created
service/eventing-webhook created
service/job-sink created
deployment.apps/eventing-controller created
deployment.apps/eventing-webhook created
deployment.apps/job-sink created
deployment.apps/pingsource-mt-adapter created
poddisruptionbudget.policy/eventing-webhook created
horizontalpodautoscaler.autoscaling/eventing-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/sinkbindings.webhook.sources.knative.dev unchanged
mutatingwebhookconfiguration.admissionregistration.k8s.io/webhook.eventing.knative.dev created
validatingwebhookconfiguration.admissionregistration.k8s.io/config.webhook.eventing.knative.dev created
validatingwebhookconfiguration.admissionregistration.k8s.io/validation.webhook.eventing.knative.dev created

NetworkPolicies

Install the network policies.

kustomize build common/networkpolicies/base | kubectl apply -f -

Output:

networkpolicy.networking.k8s.io/cache-server created
networkpolicy.networking.k8s.io/centraldashboard created
networkpolicy.networking.k8s.io/default-allow-same-namespace created
networkpolicy.networking.k8s.io/jupyter-web-app created
networkpolicy.networking.k8s.io/katib-controller created
networkpolicy.networking.k8s.io/katib-db-manager created
networkpolicy.networking.k8s.io/katib-ui created
networkpolicy.networking.k8s.io/kserve created
networkpolicy.networking.k8s.io/kserve-models-web-app created
networkpolicy.networking.k8s.io/metadata-grpc-server created
networkpolicy.networking.k8s.io/metatada-envoy created
networkpolicy.networking.k8s.io/minio created
networkpolicy.networking.k8s.io/ml-pipeline created
networkpolicy.networking.k8s.io/ml-pipeline-ui created
networkpolicy.networking.k8s.io/model-registry created
networkpolicy.networking.k8s.io/model-registry-ui created
networkpolicy.networking.k8s.io/poddefaults created
networkpolicy.networking.k8s.io/pvcviewer-webhook created
networkpolicy.networking.k8s.io/spark-operator-webhook created
networkpolicy.networking.k8s.io/tensorboards-web-app created
networkpolicy.networking.k8s.io/training-operator-webhook created
networkpolicy.networking.k8s.io/volumes-web-app created

Kubeflow Roles

Creates the Kubeflow ClusterRoles kubeflow-view, kubeflow-edit, and kubeflow-admin. Kubeflow components aggregate their permissions into these ClusterRoles.

kustomize build common/kubeflow-roles/base | kubectl apply -f -

Output:

clusterrole.rbac.authorization.k8s.io/kubeflow-admin created
clusterrole.rbac.authorization.k8s.io/kubeflow-edit created
clusterrole.rbac.authorization.k8s.io/kubeflow-kubernetes-admin created
clusterrole.rbac.authorization.k8s.io/kubeflow-kubernetes-edit created
clusterrole.rbac.authorization.k8s.io/kubeflow-kubernetes-view created
clusterrole.rbac.authorization.k8s.io/kubeflow-view created

Kubeflow Istio Resources

Creates the Kubeflow gateway kubeflow-gateway and the ClusterRole kubeflow-istio-admin.

kustomize build common/istio/kubeflow-istio-resources/base | kubectl apply -f -

Output:

clusterrole.rbac.authorization.k8s.io/kubeflow-istio-admin created
clusterrole.rbac.authorization.k8s.io/kubeflow-istio-edit created
clusterrole.rbac.authorization.k8s.io/kubeflow-istio-view created
gateway.networking.istio.io/kubeflow-gateway created

Kubeflow Pipelines

INFORMATION

From this step on, it chews through machine resources like crazy, so brace yourself.
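
Before continuing, it can help to check how much headroom the nodes actually have (a rough check; kubectl top nodes also works if metrics-server is installed):

# Requests/limits already allocated per node
kubectl describe nodes | grep -A 8 'Allocated resources'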

ML workflow orchestration

  • When you need Pipelines:
    • You want to automate and reuse ML workflows
    • You need experiment tracking and versioning
    • You are building complex ML pipelines
  • When you do not need Pipelines:
    • You only use Jupyter Notebooks
    • You only need model serving
    • You only run simple experiments

v1.10.2

WARNING

Do not use v1.10.2 because of the MinIO issue.

Install Multi-User Kubeflow Pipelines, an official Kubeflow component.

This command installs Argo with the runasnonroot emissary executor.

It remains your responsibility to analyze the security implications of running containers as root and to decide whether to run the main Kubeflow Pipelines containers as runasnonroot.

In general, it is strongly recommended to run all user-accessible OCI containers under the restricted Pod Security Standards.

kustomize build applications/pipeline/upstream/env/cert-manager/platform-agnostic-multi-user | kubectl apply -f -

The "ensure CRDs are installed first" error can occur, so be prepared to run this several times...

v1.11.0

Kubeflow Pipelines offers two deployment options designed for different use cases and operational preferences.

  • The traditional database-backed mode stores pipeline definitions in an external database, while
  • the Kubernetes-native API mode uses Kubernetes custom resources to store and manage pipeline definitions.

As described here, the default artifact store is now SeaweedFS. The single-command install using the kustomization in example sets SeaweedFS as the default S3-compatible artifact store for pipelines.

How to patch v1.10.2 to use SeaweedFS

This install replaces the existing setup so that minio-service S3 traffic is routed to SeaweedFS, and patches the Argo Workflow controller to use SeaweedFS. If you are doing a step-by-step install and want SeaweedFS as the pipeline artifact store, apply the following overlay instead of the MinIO-based overlay.

INFORMATION

This patch is only for making v1.10.2 use SeaweedFS, which is why experimental/seaweedfs/istio exists in v1.10.2 but not in v1.11.0. In v1.11.0, SeaweedFS is already the default, so you can ignore this there.

kustomize build experimental/seaweedfs/istio | kubectl apply -f -

To switch back to MinIO, use the standard upstream pipeline overlay shown below.

INFORMATION

MinIO is scheduled to be removed in the next release.

Pipeline Definitions Stored in the Database

Install Multi-User Kubeflow Pipelines, an official Kubeflow component.

kustomize build applications/pipeline/upstream/env/cert-manager/platform-agnostic-multi-user | kubectl apply -f -

The error below will show up, so be ready to retry several times:

error: resource mapping not found for name: "kubeflow-pipelines-profile-controller" namespace: "kubeflow" from "STDIN": no matches for kind "DecoratorController" in version "metacontroller.k8s.io/v1alpha1"
ensure CRDs are installed first
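
Rather than re-running by hand, the same retry loop used in the Example install works for this step too:

while ! kustomize build applications/pipeline/upstream/env/cert-manager/platform-agnostic-multi-user | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 20; done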

First run output:

# Warning: 'vars' is deprecated. Please use 'replacements' instead. [EXPERIMENTAL] Run 'kustomize edit fix' to update your Kustomization automatically.
2025/12/24 00:30:52 well-defined vars that were never replaced: kfp-app-name,kfp-app-version
customresourcedefinition.apiextensions.k8s.io/clusterworkflowtemplates.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/compositecontrollers.metacontroller.k8s.io created
customresourcedefinition.apiextensions.k8s.io/controllerrevisions.metacontroller.k8s.io created
customresourcedefinition.apiextensions.k8s.io/cronworkflows.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/decoratorcontrollers.metacontroller.k8s.io created
customresourcedefinition.apiextensions.k8s.io/scheduledworkflows.kubeflow.org created
customresourcedefinition.apiextensions.k8s.io/viewers.kubeflow.org created
customresourcedefinition.apiextensions.k8s.io/workflowartifactgctasks.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workfloweventbindings.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflows.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflowtaskresults.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflowtasksets.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflowtemplates.argoproj.io created
serviceaccount/argo created
serviceaccount/kubeflow-pipelines-cache created
serviceaccount/kubeflow-pipelines-container-builder created
serviceaccount/kubeflow-pipelines-metadata-writer created
serviceaccount/kubeflow-pipelines-viewer created
serviceaccount/meta-controller-service created
serviceaccount/metadata-grpc-server created
serviceaccount/ml-pipeline created
serviceaccount/ml-pipeline-persistenceagent created
serviceaccount/ml-pipeline-scheduledworkflow created
serviceaccount/ml-pipeline-ui created
serviceaccount/ml-pipeline-viewer-crd-service-account created
serviceaccount/ml-pipeline-visualizationserver created
serviceaccount/mysql created
serviceaccount/pipeline-runner created
serviceaccount/seaweedfs created
role.rbac.authorization.k8s.io/argo-role created
role.rbac.authorization.k8s.io/kubeflow-pipelines-cache-role created
role.rbac.authorization.k8s.io/kubeflow-pipelines-metadata-writer-role created
role.rbac.authorization.k8s.io/ml-pipeline created
role.rbac.authorization.k8s.io/ml-pipeline-persistenceagent-role created
role.rbac.authorization.k8s.io/ml-pipeline-scheduledworkflow-role created
role.rbac.authorization.k8s.io/ml-pipeline-ui created
role.rbac.authorization.k8s.io/ml-pipeline-viewer-controller-role created
role.rbac.authorization.k8s.io/pipeline-runner created
clusterrole.rbac.authorization.k8s.io/aggregate-to-kubeflow-pipelines-edit created
clusterrole.rbac.authorization.k8s.io/aggregate-to-kubeflow-pipelines-view created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-admin created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-edit created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-view created
clusterrole.rbac.authorization.k8s.io/argo-cluster-role created
clusterrole.rbac.authorization.k8s.io/kubeflow-metacontroller created
clusterrole.rbac.authorization.k8s.io/kubeflow-pipelines-cache-role created
clusterrole.rbac.authorization.k8s.io/kubeflow-pipelines-edit created
clusterrole.rbac.authorization.k8s.io/kubeflow-pipelines-metadata-writer-role created
clusterrole.rbac.authorization.k8s.io/kubeflow-pipelines-view created
clusterrole.rbac.authorization.k8s.io/ml-pipeline created
clusterrole.rbac.authorization.k8s.io/ml-pipeline-persistenceagent-role created
clusterrole.rbac.authorization.k8s.io/ml-pipeline-scheduledworkflow-role created
clusterrole.rbac.authorization.k8s.io/ml-pipeline-ui created
clusterrole.rbac.authorization.k8s.io/ml-pipeline-viewer-controller-role created
rolebinding.rbac.authorization.k8s.io/argo-binding created
rolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-cache-binding created
rolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-metadata-writer-binding created
rolebinding.rbac.authorization.k8s.io/ml-pipeline created
rolebinding.rbac.authorization.k8s.io/ml-pipeline-persistenceagent-binding created
rolebinding.rbac.authorization.k8s.io/ml-pipeline-scheduledworkflow-binding created
rolebinding.rbac.authorization.k8s.io/ml-pipeline-ui created
rolebinding.rbac.authorization.k8s.io/ml-pipeline-viewer-crd-binding created
rolebinding.rbac.authorization.k8s.io/pipeline-runner-binding created
clusterrolebinding.rbac.authorization.k8s.io/argo-binding created
clusterrolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-cache-binding created
clusterrolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-metadata-writer-binding created
clusterrolebinding.rbac.authorization.k8s.io/meta-controller-cluster-role-binding created
clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline created
clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline-persistenceagent-binding created
clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline-scheduledworkflow-binding created
clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline-ui created
clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline-viewer-crd-binding created
configmap/kfp-launcher created
configmap/kubeflow-pipelines-profile-controller-code-42d46f9h7d created
configmap/kubeflow-pipelines-profile-controller-env-5252m69c4c created
configmap/metadata-envoy-configmap created
configmap/metadata-grpc-configmap created
configmap/ml-pipeline-ui-configmap created
configmap/pipeline-api-server-config-dc9hkg52h6 created
configmap/pipeline-install-config created
configmap/workflow-controller-configmap created
secret/mlpipeline-minio-artifact created
secret/mysql-secret created
service/cache-server created
service/kubeflow-pipelines-profile-controller created
service/metadata-envoy-service created
service/metadata-grpc-service created
service/minio-service created
service/ml-pipeline created
service/ml-pipeline-ui created
service/ml-pipeline-visualizationserver created
service/mysql created
service/seaweedfs created
priorityclass.scheduling.k8s.io/workflow-controller created
persistentvolumeclaim/mysql-pv-claim created
persistentvolumeclaim/seaweedfs-pvc created
deployment.apps/cache-server created
deployment.apps/kubeflow-pipelines-profile-controller created
deployment.apps/metadata-envoy-deployment created
deployment.apps/metadata-grpc-deployment created
deployment.apps/metadata-writer created
deployment.apps/ml-pipeline created
deployment.apps/ml-pipeline-persistenceagent created
deployment.apps/ml-pipeline-scheduledworkflow created
deployment.apps/ml-pipeline-ui created
deployment.apps/ml-pipeline-viewer-crd created
deployment.apps/ml-pipeline-visualizationserver created
deployment.apps/mysql created
deployment.apps/seaweedfs created
deployment.apps/workflow-controller created
statefulset.apps/metacontroller created
certificate.cert-manager.io/kfp-cache-cert created
issuer.cert-manager.io/kfp-cache-selfsigned-issuer created
destinationrule.networking.istio.io/metadata-grpc-service created
destinationrule.networking.istio.io/ml-pipeline created
destinationrule.networking.istio.io/ml-pipeline-mysql created
destinationrule.networking.istio.io/ml-pipeline-seaweedfs created
destinationrule.networking.istio.io/ml-pipeline-ui created
destinationrule.networking.istio.io/ml-pipeline-visualizationserver created
virtualservice.networking.istio.io/metadata-grpc created
virtualservice.networking.istio.io/ml-pipeline-ui created
networkpolicy.networking.k8s.io/seaweedfs created
authorizationpolicy.security.istio.io/metadata-grpc-service created
authorizationpolicy.security.istio.io/ml-pipeline created
authorizationpolicy.security.istio.io/ml-pipeline-ui created
authorizationpolicy.security.istio.io/ml-pipeline-visualizationserver created
authorizationpolicy.security.istio.io/mysql created
authorizationpolicy.security.istio.io/seaweedfs-service created
authorizationpolicy.security.istio.io/service-cache-server created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cache-webhook-kubeflow created
error: resource mapping not found for name: "kubeflow-pipelines-profile-controller" namespace: "kubeflow" from "STDIN": no matches for kind "DecoratorController" in version "metacontroller.k8s.io/v1alpha1"
ensure CRDs are installed first

Second run output:

# Warning: 'vars' is deprecated. Please use 'replacements' instead. [EXPERIMENTAL] Run 'kustomize edit fix' to update your Kustomization automatically.
2025/12/24 00:36:07 well-defined vars that were never replaced: kfp-app-name,kfp-app-version
customresourcedefinition.apiextensions.k8s.io/clusterworkflowtemplates.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/compositecontrollers.metacontroller.k8s.io unchanged
customresourcedefinition.apiextensions.k8s.io/controllerrevisions.metacontroller.k8s.io unchanged
customresourcedefinition.apiextensions.k8s.io/cronworkflows.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/decoratorcontrollers.metacontroller.k8s.io unchanged
customresourcedefinition.apiextensions.k8s.io/scheduledworkflows.kubeflow.org unchanged
customresourcedefinition.apiextensions.k8s.io/viewers.kubeflow.org unchanged
customresourcedefinition.apiextensions.k8s.io/workflowartifactgctasks.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/workfloweventbindings.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/workflows.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/workflowtaskresults.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/workflowtasksets.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/workflowtemplates.argoproj.io unchanged
serviceaccount/argo unchanged
serviceaccount/kubeflow-pipelines-cache unchanged
serviceaccount/kubeflow-pipelines-container-builder unchanged
serviceaccount/kubeflow-pipelines-metadata-writer unchanged
serviceaccount/kubeflow-pipelines-viewer unchanged
serviceaccount/meta-controller-service unchanged
serviceaccount/metadata-grpc-server unchanged
serviceaccount/ml-pipeline unchanged
serviceaccount/ml-pipeline-persistenceagent unchanged
serviceaccount/ml-pipeline-scheduledworkflow unchanged
serviceaccount/ml-pipeline-ui unchanged
serviceaccount/ml-pipeline-viewer-crd-service-account unchanged
serviceaccount/ml-pipeline-visualizationserver unchanged
serviceaccount/mysql unchanged
serviceaccount/pipeline-runner unchanged
serviceaccount/seaweedfs unchanged
role.rbac.authorization.k8s.io/argo-role unchanged
role.rbac.authorization.k8s.io/kubeflow-pipelines-cache-role unchanged
role.rbac.authorization.k8s.io/kubeflow-pipelines-metadata-writer-role unchanged
role.rbac.authorization.k8s.io/ml-pipeline unchanged
role.rbac.authorization.k8s.io/ml-pipeline-persistenceagent-role unchanged
role.rbac.authorization.k8s.io/ml-pipeline-scheduledworkflow-role unchanged
role.rbac.authorization.k8s.io/ml-pipeline-ui unchanged
role.rbac.authorization.k8s.io/ml-pipeline-viewer-controller-role unchanged
role.rbac.authorization.k8s.io/pipeline-runner unchanged
clusterrole.rbac.authorization.k8s.io/aggregate-to-kubeflow-pipelines-edit unchanged
clusterrole.rbac.authorization.k8s.io/aggregate-to-kubeflow-pipelines-view unchanged
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-admin unchanged
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-edit unchanged
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-view unchanged
clusterrole.rbac.authorization.k8s.io/argo-cluster-role unchanged
clusterrole.rbac.authorization.k8s.io/kubeflow-metacontroller unchanged
clusterrole.rbac.authorization.k8s.io/kubeflow-pipelines-cache-role unchanged
clusterrole.rbac.authorization.k8s.io/kubeflow-pipelines-edit configured
clusterrole.rbac.authorization.k8s.io/kubeflow-pipelines-metadata-writer-role unchanged
clusterrole.rbac.authorization.k8s.io/kubeflow-pipelines-view configured
clusterrole.rbac.authorization.k8s.io/ml-pipeline unchanged
clusterrole.rbac.authorization.k8s.io/ml-pipeline-persistenceagent-role unchanged
clusterrole.rbac.authorization.k8s.io/ml-pipeline-scheduledworkflow-role unchanged
clusterrole.rbac.authorization.k8s.io/ml-pipeline-ui unchanged
clusterrole.rbac.authorization.k8s.io/ml-pipeline-viewer-controller-role unchanged
rolebinding.rbac.authorization.k8s.io/argo-binding unchanged
rolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-cache-binding unchanged
rolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-metadata-writer-binding unchanged
rolebinding.rbac.authorization.k8s.io/ml-pipeline unchanged
rolebinding.rbac.authorization.k8s.io/ml-pipeline-persistenceagent-binding unchanged
rolebinding.rbac.authorization.k8s.io/ml-pipeline-scheduledworkflow-binding unchanged
rolebinding.rbac.authorization.k8s.io/ml-pipeline-ui unchanged
rolebinding.rbac.authorization.k8s.io/ml-pipeline-viewer-crd-binding unchanged
rolebinding.rbac.authorization.k8s.io/pipeline-runner-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/argo-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-cache-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-metadata-writer-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/meta-controller-cluster-role-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline unchanged
clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline-persistenceagent-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline-scheduledworkflow-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline-ui unchanged
clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline-viewer-crd-binding unchanged
configmap/kfp-launcher unchanged
configmap/kubeflow-pipelines-profile-controller-code-42d46f9h7d unchanged
configmap/kubeflow-pipelines-profile-controller-env-5252m69c4c unchanged
configmap/metadata-envoy-configmap unchanged
configmap/metadata-grpc-configmap unchanged
configmap/ml-pipeline-ui-configmap unchanged
configmap/pipeline-api-server-config-dc9hkg52h6 unchanged
configmap/pipeline-install-config unchanged
configmap/workflow-controller-configmap unchanged
secret/mlpipeline-minio-artifact configured
secret/mysql-secret configured
service/cache-server unchanged
service/kubeflow-pipelines-profile-controller unchanged
service/metadata-envoy-service unchanged
service/metadata-grpc-service unchanged
service/minio-service unchanged
service/ml-pipeline unchanged
service/ml-pipeline-ui unchanged
service/ml-pipeline-visualizationserver unchanged
service/mysql unchanged
service/seaweedfs unchanged
priorityclass.scheduling.k8s.io/workflow-controller unchanged
persistentvolumeclaim/mysql-pv-claim unchanged
persistentvolumeclaim/seaweedfs-pvc unchanged
deployment.apps/cache-server configured
deployment.apps/kubeflow-pipelines-profile-controller unchanged
deployment.apps/metadata-envoy-deployment unchanged
deployment.apps/metadata-grpc-deployment unchanged
deployment.apps/metadata-writer configured
deployment.apps/ml-pipeline configured
deployment.apps/ml-pipeline-persistenceagent configured
deployment.apps/ml-pipeline-scheduledworkflow configured
deployment.apps/ml-pipeline-ui configured
deployment.apps/ml-pipeline-viewer-crd configured
deployment.apps/ml-pipeline-visualizationserver unchanged
deployment.apps/mysql unchanged
deployment.apps/seaweedfs unchanged
deployment.apps/workflow-controller unchanged
statefulset.apps/metacontroller configured
certificate.cert-manager.io/kfp-cache-cert unchanged
issuer.cert-manager.io/kfp-cache-selfsigned-issuer unchanged
decoratorcontroller.metacontroller.k8s.io/kubeflow-pipelines-profile-controller created
destinationrule.networking.istio.io/metadata-grpc-service unchanged
destinationrule.networking.istio.io/ml-pipeline unchanged
destinationrule.networking.istio.io/ml-pipeline-mysql unchanged
destinationrule.networking.istio.io/ml-pipeline-seaweedfs unchanged
destinationrule.networking.istio.io/ml-pipeline-ui unchanged
destinationrule.networking.istio.io/ml-pipeline-visualizationserver unchanged
virtualservice.networking.istio.io/metadata-grpc unchanged
virtualservice.networking.istio.io/ml-pipeline-ui unchanged
networkpolicy.networking.k8s.io/seaweedfs configured
authorizationpolicy.security.istio.io/metadata-grpc-service unchanged
authorizationpolicy.security.istio.io/ml-pipeline unchanged
authorizationpolicy.security.istio.io/ml-pipeline-ui unchanged
authorizationpolicy.security.istio.io/ml-pipeline-visualizationserver unchanged
authorizationpolicy.security.istio.io/mysql unchanged
authorizationpolicy.security.istio.io/seaweedfs-service unchanged
authorizationpolicy.security.istio.io/service-cache-server unchanged
mutatingwebhookconfiguration.admissionregistration.k8s.io/cache-webhook-kubeflow configured

This step takes quite a while. Keep an eye on the pod status while you wait:

kubectl get pod -n kubeflow

... it took about 35 minutes for me.

Pipeline Definitions Stored as Kubernetes Resources

Kubeflow Pipelines can be deployed in a Kubernetes-native API mode, in which pipeline definitions are stored as Kubernetes custom resources (the Pipeline and PipelineVersion kinds) instead of in external storage. This mode improves integration with Kubernetes-native tooling and GitOps workflows.

Using the KFP SDK with Kubernetes Native API Mode

For detailed pipeline compilation instructions, see the Kubeflow Pipelines compilation guide.

What is different in Kubernetes-native API mode:

  • Pipeline definitions are stored in Kubernetes as Pipeline and PipelineVersion custom resources.
  • Pipeline validation is handled through Kubernetes admission webhooks.
  • The REST API transparently translates requests into Kubernetes API calls.

Advantages of Kubernetes-native mode:

  • This approach is ideal for organizations that prefer Kubernetes-native workflows and want to manage pipelines with standard Kubernetes tooling and practices.
  • Pipeline definitions can be managed through several interfaces: kubectl, the Kubeflow Pipelines REST API, and the KFP UI, a user-friendly pipeline management tool (see the sketch below).
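
As a quick sanity check once Pipelines is running in this mode, the stored pipeline objects should be visible with plain kubectl. A minimal sketch; the exact resource names and the namespace they live in depend on the KFP version, so discover them first:

## Discover the pipeline-related CRDs registered by KFP (names vary by version)
kubectl api-resources | grep -i pipeline

## List pipeline definitions stored as custom resources (namespace is an assumption)
kubectl get pipelines,pipelineversions -n kubeflow-user-example-com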

KServe

INFORMATION

KFServing was rebranded as KServe.

KServe is the dedicated model serving component.

  • Deploys trained ML models to production
  • Provides auto-scaling, canary deployments, and so on
  • Creates REST/gRPC API endpoints
  • When you need KServe:
    • When a trained model has to be served as an API
    • When you need production model-deployment infrastructure

Install the KServe component:

kustomize build applications/kserve/kserve | kubectl apply --server-side --force-conflicts -f -

Install the Models web application:

kustomize build applications/kserve/models-web-app/overlays/kubeflow | kubectl apply -f -
  • Because the ClusterServingRuntime CRD is not established yet, "ensure CRDs are installed first" shows up a lot.
  • Failures of the clusterservingruntime.kserve-webhook-server.validator webhook are printed as well.

After retrying a few times, the final successful output of each command:

## kustomize build applications/kserve/kserve | kubectl apply --server-side --force-conflicts -f -
customresourcedefinition.apiextensions.k8s.io/clusterservingruntimes.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/clusterstoragecontainers.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/inferencegraphs.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/inferenceservices.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/localmodelcaches.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/localmodelnodegroups.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/localmodelnodes.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/servingruntimes.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/trainedmodels.serving.kserve.io serverside-applied
serviceaccount/kserve-controller-manager serverside-applied
serviceaccount/kserve-localmodel-controller-manager serverside-applied
serviceaccount/kserve-localmodelnode-agent serverside-applied
role.rbac.authorization.k8s.io/kserve-leader-election-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kserve-localmodel-manager-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kserve-localmodelnode-agent-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kserve-manager-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kserve-proxy-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-kserve-admin serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-kserve-edit serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-kserve-view serverside-applied
rolebinding.rbac.authorization.k8s.io/kserve-leader-election-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kserve-localmodel-manager-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kserve-localmodelnode-agent-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kserve-manager-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kserve-proxy-rolebinding serverside-applied
configmap/inferenceservice-config serverside-applied
secret/kserve-webhook-server-secret serverside-applied
service/kserve-controller-manager-metrics-service serverside-applied
service/kserve-controller-manager-service serverside-applied
service/kserve-webhook-server-service serverside-applied
deployment.apps/kserve-controller-manager serverside-applied
deployment.apps/kserve-localmodel-controller-manager serverside-applied
certificate.cert-manager.io/serving-cert serverside-applied
issuer.cert-manager.io/selfsigned-issuer serverside-applied
clusterservingruntime.serving.kserve.io/kserve-huggingfaceserver serverside-applied
clusterservingruntime.serving.kserve.io/kserve-huggingfaceserver-multinode serverside-applied
clusterservingruntime.serving.kserve.io/kserve-lgbserver serverside-applied
clusterservingruntime.serving.kserve.io/kserve-mlserver serverside-applied
clusterservingruntime.serving.kserve.io/kserve-paddleserver serverside-applied
clusterservingruntime.serving.kserve.io/kserve-pmmlserver serverside-applied
clusterservingruntime.serving.kserve.io/kserve-sklearnserver serverside-applied
clusterservingruntime.serving.kserve.io/kserve-tensorflow-serving serverside-applied
clusterservingruntime.serving.kserve.io/kserve-torchserve serverside-applied
clusterservingruntime.serving.kserve.io/kserve-tritonserver serverside-applied
clusterservingruntime.serving.kserve.io/kserve-xgbserver serverside-applied
clusterstoragecontainer.serving.kserve.io/default serverside-applied
mutatingwebhookconfiguration.admissionregistration.k8s.io/inferenceservice.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/clusterservingruntime.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/inferencegraph.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/inferenceservice.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/localmodelcache.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/servingruntime.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/trainedmodel.serving.kserve.io serverside-applied

## kustomize build applications/kserve/models-web-app/overlays/kubeflow | kubectl apply -f -
serviceaccount/kserve-models-web-app created
clusterrole.rbac.authorization.k8s.io/kserve-models-web-app-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/kserve-models-web-app-binding created
configmap/kserve-models-web-app-config created
service/kserve-models-web-app created
deployment.apps/kserve-models-web-app created
virtualservice.networking.istio.io/kserve-models-web-app created
authorizationpolicy.security.istio.io/kserve-models-web-app created

Katib

Hyperparameter tuning.

kustomize build applications/katib/upstream/installs/katib-with-kubeflow | kubectl apply -f -

Output:

customresourcedefinition.apiextensions.k8s.io/experiments.kubeflow.org created
customresourcedefinition.apiextensions.k8s.io/suggestions.kubeflow.org created
customresourcedefinition.apiextensions.k8s.io/trials.kubeflow.org created
serviceaccount/katib-controller created
serviceaccount/katib-ui created
clusterrole.rbac.authorization.k8s.io/katib-controller created
clusterrole.rbac.authorization.k8s.io/katib-ui created
clusterrole.rbac.authorization.k8s.io/kubeflow-katib-admin created
clusterrole.rbac.authorization.k8s.io/kubeflow-katib-edit created
clusterrole.rbac.authorization.k8s.io/kubeflow-katib-view created
clusterrolebinding.rbac.authorization.k8s.io/katib-controller created
clusterrolebinding.rbac.authorization.k8s.io/katib-ui created
configmap/katib-config created
configmap/trial-templates created
secret/katib-mysql-secrets created
service/katib-controller created
service/katib-db-manager created
service/katib-mysql created
service/katib-ui created
persistentvolumeclaim/katib-mysql created
deployment.apps/katib-controller created
deployment.apps/katib-db-manager created
deployment.apps/katib-mysql created
deployment.apps/katib-ui created
certificate.cert-manager.io/katib-webhook-cert created
issuer.cert-manager.io/katib-selfsigned-issuer created
virtualservice.networking.istio.io/katib-ui created
authorizationpolicy.security.istio.io/katib-ui created
mutatingwebhookconfiguration.admissionregistration.k8s.io/katib.kubeflow.org created
validatingwebhookconfiguration.admissionregistration.k8s.io/katib.kubeflow.org created

Central Dashboard

Install the Central Dashboard official Kubeflow component:

kustomize build applications/centraldashboard/overlays/oauth2-proxy | kubectl apply -f -

Output:

serviceaccount/centraldashboard created
role.rbac.authorization.k8s.io/centraldashboard created
clusterrole.rbac.authorization.k8s.io/centraldashboard created
rolebinding.rbac.authorization.k8s.io/centraldashboard created
clusterrolebinding.rbac.authorization.k8s.io/centraldashboard created
configmap/centraldashboard-config created
configmap/centraldashboard-parameters created
service/centraldashboard created
deployment.apps/centraldashboard created
virtualservice.networking.istio.io/centraldashboard created
authorizationpolicy.security.istio.io/central-dashboard created

Admission Webhook

Install the Admission Webhook for PodDefaults:

kustomize build applications/admission-webhook/upstream/overlays/cert-manager | kubectl apply -f -

Output:

$ kustomize build applications/admission-webhook/upstream/overlays/cert-manager | kubectl apply -f -
# Warning: 'bases' is deprecated. Please use 'resources' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'commonLabels' is deprecated. Please use 'labels' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'vars' is deprecated. Please use 'replacements' instead. [EXPERIMENTAL] Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'commonLabels' is deprecated. Please use 'labels' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'vars' is deprecated. Please use 'replacements' instead. [EXPERIMENTAL] Run 'kustomize edit fix' to update your Kustomization automatically.
customresourcedefinition.apiextensions.k8s.io/poddefaults.kubeflow.org created
serviceaccount/admission-webhook-service-account created
clusterrole.rbac.authorization.k8s.io/admission-webhook-cluster-role created
clusterrole.rbac.authorization.k8s.io/admission-webhook-kubeflow-poddefaults-admin created
clusterrole.rbac.authorization.k8s.io/admission-webhook-kubeflow-poddefaults-edit created
clusterrole.rbac.authorization.k8s.io/admission-webhook-kubeflow-poddefaults-view created
clusterrolebinding.rbac.authorization.k8s.io/admission-webhook-cluster-role-binding created
service/admission-webhook-service created
deployment.apps/admission-webhook-deployment created
certificate.cert-manager.io/admission-webhook-cert created
issuer.cert-manager.io/admission-webhook-selfsigned-issuer created
mutatingwebhookconfiguration.admissionregistration.k8s.io/admission-webhook-mutating-webhook-configuration created

Notebooks 1.0

Install the Notebook Controller official Kubeflow component.

kustomize build applications/jupyter/notebook-controller/upstream/overlays/kubeflow | kubectl apply -f -

Output:

# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'bases' is deprecated. Please use 'resources' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'commonLabels' is deprecated. Please use 'labels' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesJson6902' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
customresourcedefinition.apiextensions.k8s.io/notebooks.kubeflow.org created
serviceaccount/notebook-controller-service-account created
role.rbac.authorization.k8s.io/notebook-controller-leader-election-role created
clusterrole.rbac.authorization.k8s.io/notebook-controller-kubeflow-notebooks-admin created
clusterrole.rbac.authorization.k8s.io/notebook-controller-kubeflow-notebooks-edit created
clusterrole.rbac.authorization.k8s.io/notebook-controller-kubeflow-notebooks-view created
clusterrole.rbac.authorization.k8s.io/notebook-controller-role created
rolebinding.rbac.authorization.k8s.io/notebook-controller-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/notebook-controller-role-binding created
configmap/notebook-controller-config-fhm9f7tdt5 created
service/notebook-controller-service created
deployment.apps/notebook-controller-deployment created

Install the official Kubeflow Jupyter web application component.

kustomize build applications/jupyter/jupyter-web-app/upstream/overlays/istio | kubectl apply -f -

Output:

# Warning: 'commonLabels' is deprecated. Please use 'labels' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'commonLabels' is deprecated. Please use 'labels' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'vars' is deprecated. Please use 'replacements' instead. [EXPERIMENTAL] Run 'kustomize edit fix' to update your Kustomization automatically.
serviceaccount/jupyter-web-app-service-account created
role.rbac.authorization.k8s.io/jupyter-web-app-jupyter-notebook-role created
clusterrole.rbac.authorization.k8s.io/jupyter-web-app-cluster-role created
clusterrole.rbac.authorization.k8s.io/jupyter-web-app-kubeflow-notebook-ui-admin created
clusterrole.rbac.authorization.k8s.io/jupyter-web-app-kubeflow-notebook-ui-edit created
clusterrole.rbac.authorization.k8s.io/jupyter-web-app-kubeflow-notebook-ui-view created
rolebinding.rbac.authorization.k8s.io/jupyter-web-app-jupyter-notebook-role-binding created
clusterrolebinding.rbac.authorization.k8s.io/jupyter-web-app-cluster-role-binding created
configmap/jupyter-web-app-config-9c2fbg2gdc created
configmap/jupyter-web-app-logos created
configmap/jupyter-web-app-parameters-48gf6bbhmk created
service/jupyter-web-app-service created
deployment.apps/jupyter-web-app-deployment created
destinationrule.networking.istio.io/jupyter-web-app created
virtualservice.networking.istio.io/jupyter-web-app-jupyter-web-app created
authorizationpolicy.security.istio.io/jupyter-web-app created

Workspaces (Notebooks 2.0)

This feature is still in development. (At least as of v1.11.0 it is still reported as under development.)

PVC Viewer

Install the PVC Viewer Controller official Kubeflow component.

kustomize build applications/pvcviewer-controller/upstream/base | kubectl apply -f -

Output:

# Warning: 'commonLabels' is deprecated. Please use 'labels' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'vars' is deprecated. Please use 'replacements' instead. [EXPERIMENTAL] Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
customresourcedefinition.apiextensions.k8s.io/pvcviewers.kubeflow.org created
serviceaccount/pvcviewer-controller-manager created
role.rbac.authorization.k8s.io/pvcviewer-leader-election-role created
clusterrole.rbac.authorization.k8s.io/pvcviewer-metrics-reader created
clusterrole.rbac.authorization.k8s.io/pvcviewer-proxy-role created
clusterrole.rbac.authorization.k8s.io/pvcviewer-role created
rolebinding.rbac.authorization.k8s.io/pvcviewer-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/pvcviewer-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/pvcviewer-proxy-rolebinding created
service/pvcviewer-controller-manager-metrics-service created
service/pvcviewer-webhook-service created
deployment.apps/pvcviewer-controller-manager created
certificate.cert-manager.io/pvcviewer-serving-cert created
issuer.cert-manager.io/pvcviewer-selfsigned-issuer created
mutatingwebhookconfiguration.admissionregistration.k8s.io/pvcviewer-mutating-webhook-configuration created
validatingwebhookconfiguration.admissionregistration.k8s.io/pvcviewer-validating-webhook-configuration created

Profiles + KFAM

Install the Profile Controller and Kubeflow Access-Management (KFAM) official Kubeflow components.

kustomize build applications/profiles/upstream/overlays/kubeflow | kubectl apply -f -

Output:

# Warning: 'commonLabels' is deprecated. Please use 'labels' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'vars' is deprecated. Please use 'replacements' instead. [EXPERIMENTAL] Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'bases' is deprecated. Please use 'resources' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'commonLabels' is deprecated. Please use 'labels' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
Warning: Detected changes to resource profiles.kubeflow.org which is currently being deleted.
customresourcedefinition.apiextensions.k8s.io/profiles.kubeflow.org configured
serviceaccount/profiles-controller-service-account created
role.rbac.authorization.k8s.io/profiles-leader-election-role created
rolebinding.rbac.authorization.k8s.io/profiles-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/profiles-cluster-rolebinding created
configmap/namespace-labels-data-4df5t8mdgf created
configmap/profiles-config-5h9m86f79f created
service/profiles-kfam created
deployment.apps/profiles-deployment created
virtualservice.networking.istio.io/profiles-kfam created
authorizationpolicy.security.istio.io/profiles-kfam created

WARNING

"customresourcedefinition.apiextensions.k8s.io/profiles.kubeflow.org" 가 "created" 가 출력되어야 한다. 그 때 까지 계속 호출~ 안그럼 나중에 #User Namespaces 에서 에러나더라

Volumes Web Application

Install the Volumes web application official Kubeflow component.

kustomize build applications/volumes-web-app/upstream/overlays/istio | kubectl apply -f -

Output:

# Warning: 'commonLabels' is deprecated. Please use 'labels' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'commonLabels' is deprecated. Please use 'labels' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'vars' is deprecated. Please use 'replacements' instead. [EXPERIMENTAL] Run 'kustomize edit fix' to update your Kustomization automatically.
serviceaccount/volumes-web-app-service-account created
clusterrole.rbac.authorization.k8s.io/volumes-web-app-cluster-role created
clusterrole.rbac.authorization.k8s.io/volumes-web-app-kubeflow-volume-ui-admin created
clusterrole.rbac.authorization.k8s.io/volumes-web-app-kubeflow-volume-ui-edit created
clusterrole.rbac.authorization.k8s.io/volumes-web-app-kubeflow-volume-ui-view created
clusterrolebinding.rbac.authorization.k8s.io/volumes-web-app-cluster-role-binding created
configmap/volumes-web-app-parameters-mbftc78hbk created
configmap/volumes-web-app-viewer-spec-gm954c98h6 created
service/volumes-web-app-service created
deployment.apps/volumes-web-app-deployment created
destinationrule.networking.istio.io/volumes-web-app created
virtualservice.networking.istio.io/volumes-web-app-volumes-web-app created
authorizationpolicy.security.istio.io/volumes-web-app created

Tensorboard Web Application

Install the Tensorboards web application official Kubeflow component.

kustomize build applications/tensorboard/tensorboards-web-app/upstream/overlays/istio | kubectl apply -f -

Output:

# Warning: 'commonLabels' is deprecated. Please use 'labels' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'commonLabels' is deprecated. Please use 'labels' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'vars' is deprecated. Please use 'replacements' instead. [EXPERIMENTAL] Run 'kustomize edit fix' to update your Kustomization automatically.
serviceaccount/tensorboards-web-app-service-account created
clusterrole.rbac.authorization.k8s.io/tensorboards-web-app-cluster-role created
clusterrole.rbac.authorization.k8s.io/tensorboards-web-app-kubeflow-tensorboard-ui-admin created
clusterrole.rbac.authorization.k8s.io/tensorboards-web-app-kubeflow-tensorboard-ui-edit created
clusterrole.rbac.authorization.k8s.io/tensorboards-web-app-kubeflow-tensorboard-ui-view created
clusterrolebinding.rbac.authorization.k8s.io/tensorboards-web-app-cluster-role-binding created
configmap/tensorboards-web-app-parameters-6ffg2tt572 created
service/tensorboards-web-app-service created
deployment.apps/tensorboards-web-app-deployment created
destinationrule.networking.istio.io/tensorboards-web-app created
virtualservice.networking.istio.io/tensorboards-web-app-tensorboards-web-app created
authorizationpolicy.security.istio.io/tensorboards-web-app created

Tensorboard Controller

Install the Tensorboard Controller official Kubeflow component.

kustomize build applications/tensorboard/tensorboard-controller/upstream/overlays/kubeflow | kubectl apply -f -

Output:

# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'bases' is deprecated. Please use 'resources' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'commonLabels' is deprecated. Please use 'labels' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
customresourcedefinition.apiextensions.k8s.io/tensorboards.tensorboard.kubeflow.org created
serviceaccount/tensorboard-controller-controller-manager created
role.rbac.authorization.k8s.io/tensorboard-controller-leader-election-role created
clusterrole.rbac.authorization.k8s.io/tensorboard-controller-manager-role created
clusterrole.rbac.authorization.k8s.io/tensorboard-controller-metrics-reader created
clusterrole.rbac.authorization.k8s.io/tensorboard-controller-proxy-role created
rolebinding.rbac.authorization.k8s.io/tensorboard-controller-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/tensorboard-controller-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/tensorboard-controller-proxy-rolebinding created
configmap/tensorboard-controller-config-7hd244gf2d created
service/tensorboard-controller-controller-manager-metrics-service created
deployment.apps/tensorboard-controller-deployment created

Training Operator

Training Operator v1 (v1.10.2)

Install the Training Operator official Kubeflow component.

kustomize build applications/training-operator/upstream/overlays/kubeflow | kubectl apply --server-side --force-conflicts -f -

Output:

customresourcedefinition.apiextensions.k8s.io/jaxjobs.kubeflow.org serverside-applied
customresourcedefinition.apiextensions.k8s.io/mpijobs.kubeflow.org serverside-applied
customresourcedefinition.apiextensions.k8s.io/paddlejobs.kubeflow.org serverside-applied
customresourcedefinition.apiextensions.k8s.io/pytorchjobs.kubeflow.org serverside-applied
customresourcedefinition.apiextensions.k8s.io/tfjobs.kubeflow.org serverside-applied
customresourcedefinition.apiextensions.k8s.io/xgboostjobs.kubeflow.org serverside-applied
serviceaccount/training-operator serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-training-admin serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-training-edit serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-training-view serverside-applied
clusterrole.rbac.authorization.k8s.io/training-operator serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/training-operator serverside-applied
secret/training-operator-webhook-cert serverside-applied
service/training-operator serverside-applied
deployment.apps/training-operator serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/validator.training-operator.kubeflow.org serverside-applied

Trainer (Training Operator v2; v1.11.0)

Install the Trainer (Training Operator v2) official Kubeflow component.

kustomize build applications/trainer/upstream/overlays/kubeflow-platform | kubectl apply --server-side --force-conflicts -f -
# kustomize build applications/training-operator/upstream/overlays/kubeflow | kubectl apply --server-side --force-conflicts -f -

"ensure CRDs are installed first" shows up again... you know the drill: re-apply once the CRDs are established (see #ensure CRDs are installed first).

First run output:

customresourcedefinition.apiextensions.k8s.io/clustertrainingruntimes.trainer.kubeflow.org serverside-applied
customresourcedefinition.apiextensions.k8s.io/jobsets.jobset.x-k8s.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/trainingruntimes.trainer.kubeflow.org serverside-applied
customresourcedefinition.apiextensions.k8s.io/trainjobs.trainer.kubeflow.org serverside-applied
serviceaccount/jobset-controller-manager serverside-applied
serviceaccount/kubeflow-trainer-controller-manager serverside-applied
role.rbac.authorization.k8s.io/jobset-leader-election-role serverside-applied
clusterrole.rbac.authorization.k8s.io/jobset-manager-role serverside-applied
clusterrole.rbac.authorization.k8s.io/jobset-metrics-reader serverside-applied
clusterrole.rbac.authorization.k8s.io/jobset-proxy-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-trainer-admin serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-trainer-controller-manager serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-trainer-edit serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-trainer-view serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-trainer-view-cluster-runtimes serverside-applied
rolebinding.rbac.authorization.k8s.io/jobset-leader-election-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/jobset-manager-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/jobset-metrics-reader-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/jobset-proxy-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kubeflow-trainer-controller-manager serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kubeflow-trainer-view-cluster-runtimes serverside-applied
configmap/jobset-manager-config serverside-applied
configmap/kubeflow-trainer-config serverside-applied
secret/jobset-webhook-server-cert serverside-applied
secret/kubeflow-trainer-webhook-cert serverside-applied
service/jobset-controller-manager-metrics-service serverside-applied
service/jobset-webhook-service serverside-applied
service/kubeflow-trainer-controller-manager serverside-applied
deployment.apps/jobset-controller-manager serverside-applied
deployment.apps/kubeflow-trainer-controller-manager serverside-applied
mutatingwebhookconfiguration.admissionregistration.k8s.io/jobset-mutating-webhook-configuration serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/jobset-validating-webhook-configuration serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/validator.trainer.kubeflow.org serverside-applied
resource mapping not found for name: "deepspeed-distributed" namespace: "kubeflow" from "STDIN": no matches for kind "ClusterTrainingRuntime" in version "trainer.kubeflow.org/v1alpha1"
ensure CRDs are installed first
resource mapping not found for name: "mlx-distributed" namespace: "kubeflow" from "STDIN": no matches for kind "ClusterTrainingRuntime" in version "trainer.kubeflow.org/v1alpha1"
ensure CRDs are installed first
resource mapping not found for name: "torch-distributed" namespace: "kubeflow" from "STDIN": no matches for kind "ClusterTrainingRuntime" in version "trainer.kubeflow.org/v1alpha1"
ensure CRDs are installed first
resource mapping not found for name: "torchtune-llama3.2-1b" namespace: "kubeflow" from "STDIN": no matches for kind "ClusterTrainingRuntime" in version "trainer.kubeflow.org/v1alpha1"
ensure CRDs are installed first
resource mapping not found for name: "torchtune-llama3.2-3b" namespace: "kubeflow" from "STDIN": no matches for kind "ClusterTrainingRuntime" in version "trainer.kubeflow.org/v1alpha1"
ensure CRDs are installed first
resource mapping not found for name: "torchtune-qwen2.5-1.5b" namespace: "kubeflow" from "STDIN": no matches for kind "ClusterTrainingRuntime" in version "trainer.kubeflow.org/v1alpha1"
ensure CRDs are installed first

The #failed calling webhook "validator.clustertrainingruntime.trainer.kubeflow.org" issue came up. Final output after resolving it and re-running:

customresourcedefinition.apiextensions.k8s.io/clustertrainingruntimes.trainer.kubeflow.org serverside-applied
customresourcedefinition.apiextensions.k8s.io/jobsets.jobset.x-k8s.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/trainingruntimes.trainer.kubeflow.org serverside-applied
customresourcedefinition.apiextensions.k8s.io/trainjobs.trainer.kubeflow.org serverside-applied
serviceaccount/jobset-controller-manager serverside-applied
serviceaccount/kubeflow-trainer-controller-manager serverside-applied
role.rbac.authorization.k8s.io/jobset-leader-election-role serverside-applied
clusterrole.rbac.authorization.k8s.io/jobset-manager-role serverside-applied
clusterrole.rbac.authorization.k8s.io/jobset-metrics-reader serverside-applied
clusterrole.rbac.authorization.k8s.io/jobset-proxy-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-trainer-admin serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-trainer-controller-manager serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-trainer-edit serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-trainer-view serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-trainer-view-cluster-runtimes serverside-applied
rolebinding.rbac.authorization.k8s.io/jobset-leader-election-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/jobset-manager-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/jobset-metrics-reader-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/jobset-proxy-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kubeflow-trainer-controller-manager serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kubeflow-trainer-view-cluster-runtimes serverside-applied
configmap/jobset-manager-config serverside-applied
configmap/kubeflow-trainer-config serverside-applied
secret/jobset-webhook-server-cert serverside-applied
secret/kubeflow-trainer-webhook-cert serverside-applied
service/jobset-controller-manager-metrics-service serverside-applied
service/jobset-webhook-service serverside-applied
service/kubeflow-trainer-controller-manager serverside-applied
deployment.apps/jobset-controller-manager serverside-applied
deployment.apps/kubeflow-trainer-controller-manager serverside-applied
clustertrainingruntime.trainer.kubeflow.org/deepspeed-distributed serverside-applied
clustertrainingruntime.trainer.kubeflow.org/mlx-distributed serverside-applied
clustertrainingruntime.trainer.kubeflow.org/torch-distributed serverside-applied
clustertrainingruntime.trainer.kubeflow.org/torchtune-llama3.2-1b serverside-applied
clustertrainingruntime.trainer.kubeflow.org/torchtune-llama3.2-3b serverside-applied
clustertrainingruntime.trainer.kubeflow.org/torchtune-qwen2.5-1.5b serverside-applied
mutatingwebhookconfiguration.admissionregistration.k8s.io/jobset-mutating-webhook-configuration serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/jobset-validating-webhook-configuration serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/validator.trainer.kubeflow.org serverside-applied

Spark Operator

Install the Spark Operator:

INFORMATION

The Ray component in the experimental folder is configured to disable Istio sidecar injection for the head and worker pods, to ensure compatibility with Istio CNI.

kustomize build applications/spark/spark-operator/overlays/kubeflow | kubectl apply --server-side --force-conflicts -f -

Output:

customresourcedefinition.apiextensions.k8s.io/scheduledsparkapplications.sparkoperator.k8s.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/sparkapplications.sparkoperator.k8s.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/sparkconnects.sparkoperator.k8s.io serverside-applied
serviceaccount/spark-operator-controller serverside-applied
serviceaccount/spark-operator-webhook serverside-applied
role.rbac.authorization.k8s.io/spark-operator-controller serverside-applied
role.rbac.authorization.k8s.io/spark-operator-webhook serverside-applied
clusterrole.rbac.authorization.k8s.io/spark-operator-controller serverside-applied
clusterrole.rbac.authorization.k8s.io/spark-operator-webhook serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-spark-admin serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-spark-edit serverside-applied
clusterrole.rbac.authorization.k8s.io/kubeflow-spark-view serverside-applied
rolebinding.rbac.authorization.k8s.io/spark-operator-controller serverside-applied
rolebinding.rbac.authorization.k8s.io/spark-operator-webhook serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/spark-operator-controller serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/spark-operator-webhook serverside-applied
service/spark-operator-webhook-svc serverside-applied
deployment.apps/spark-operator-controller serverside-applied
deployment.apps/spark-operator-webhook serverside-applied
mutatingwebhookconfiguration.admissionregistration.k8s.io/spark-operator-webhook serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/spark-operator-webhook serverside-applied

User Namespaces

Create a new namespace for the default user (named kubeflow-user-example-com).

kubectl apply -k common/user-namespace/base

## Running it as below instead raises an error like `Invalid value: "$(profile-name)"`.
## kustomize build common/user-namespace/base | kubectl apply -f -

Output:

configmap/default-install-config-9h2h2b6hbk unchanged
profile.kubeflow.org/kubeflow-user-example-com created

If an RFC 1123-related error is printed, see the #a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters section below.

Connect to Your Kubeflow Cluster

After installation it can take some time for all pods to become ready. Make sure all pods are ready before trying to connect, otherwise you may run into unexpected errors. Use the following commands to check that all Kubeflow-related pods are ready:

kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n oauth2-proxy
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com
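
If you would rather block until everything is up instead of re-running these checks by hand, a rough sketch (drop any namespaces you did not install; long-finished Job pods can make it time out):

for ns in cert-manager istio-system auth oauth2-proxy knative-serving kubeflow kubeflow-user-example-com; do
  kubectl wait --for=condition=Ready pods --all -n "$ns" --timeout=600s
done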

Port-Forward

The default way of accessing Kubeflow is port-forwarding. It lets you get started quickly without any special requirements on your environment. Run the following to port-forward Istio's Ingress-Gateway to local port 8080:

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

After running the command, access the Kubeflow Central Dashboard as follows:

  • Open your browser and go to http://localhost:8080/. You should see the Dex login screen.
  • Log in with the default user credentials. The default email address is user@example.com and the default password is 12341234.

Exposing via a k3s Traefik IngressRoute

Create a kubeflow-ingressroute.yaml file that exposes Kubeflow's Istio gateway service through a Traefik IngressRoute.

First, check the available apiVersion with:

kubectl api-resources | grep -E 'Middleware|IngressRoute'

To register the host name kubeflow.mydomain.com as a Rule:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: kubeflow-ingress
  namespace: istio-system
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - match: Host(`kubeflow.mydomain.com`)
      kind: Rule
      services:
        - name: istio-ingressgateway
          port: 80
  tls:
    certResolver: default  # when using Let's Encrypt

Check the Istio gateway service:

kubectl get svc -n istio-system istio-ingressgateway

Output:

NAME                   TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                    AGE
istio-ingressgateway   ClusterIP   10.43.51.45   <none>        15021/TCP,80/TCP,443/TCP   6h14m

Check that the Traefik deployment is up:

kubectl get deploy -n kube-system traefik

Output:

NAME      READY   UP-TO-DATE   AVAILABLE   AGE
traefik   1/1     1            1           6h41m

Check the traefik service:

kubectl get svc -n kube-system traefik

Output:

NAME      TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE
traefik   LoadBalancer   10.43.176.104   192.168.88.103   80:32235/TCP,443:31309/TCP   6h41m

Add the domain (kubeflow.mydomain.com) to /etc/hosts or a DNS record, then apply the route:

kubectl apply -f kubeflow-ingressroute.yaml

The external IP is 192.168.88.103, so open https://kubeflow.mydomain.com/ via that interface. (This k3s setup is configured to use HTTPS.)

Additional checks, if needed:

# Check the IngressRoute
kubectl get ingressroute -n istio-system

# Check the Traefik routes (requires the Traefik dashboard)
kubectl port-forward -n kube-system deployment/traefik 9000:9000

HTTPS redirect Middleware (optional)

# https-redirect-middleware.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: redirect-https
  namespace: default
spec:
  redirectScheme:
    scheme: https
    permanent: true
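
A Middleware only takes effect once a route references it. A hedged sketch of wiring it into the IngressRoute above without re-editing the YAML (route index 0 is assumed; cross-namespace middleware references may need to be enabled in Traefik, or simply create the Middleware in istio-system instead):

kubectl apply -f https-redirect-middleware.yaml

## Attach the middleware to the first route of the IngressRoute created earlier
kubectl patch ingressroute kubeflow-ingress -n istio-system --type=json \
  -p '[{"op":"add","path":"/spec/routes/0/middlewares","value":[{"name":"redirect-https","namespace":"default"}]}]'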

Upgrading and Extending

A brief overview of modifying and upgrading a Kubeflow platform in place, aimed at advanced users.

  • Never edit the manifest files directly. Use Kustomize overlays and components on top of the example.yaml file.
  • That way an upgrade is just a matter of referencing the new manifests, building with Kustomize, and running kubectl apply again.
  • You may need to adjust your overlays and components along the way.
  • You may also need to prune old resources. For that, add labels to all your resources from the start.
  • With labels in place, kubectl apply --prune --dry-run can be used to list the resources that would be pruned (see the sketch after this list).
  • From time to time there are bigger changes:
    • For example, the 1.9 release switched to oauth2-proxy, which needs extra attention (istio-system has to be cleaned up once).
    • Or for 1.9.1 -> 1.10: kubectl delete clusterrolebinding meta-controller-cluster-role-binding
  • But with basic Kubernetes knowledge you should be able to work through an upgrade.
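
A small sketch of the label-based prune check mentioned above; the label selector is illustrative and must match whatever label you actually attached to your resources:

## Preview which labelled resources the next apply would prune (--dry-run means nothing is deleted)
kustomize build example | kubectl apply -f - --prune -l my-org/managed-by=kubeflow-manifests --dry-run=server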

Troubleshooting

ensure CRDs are installed first

error: resource mapping not found for name: "<RESOURCE_NAME>" namespace: "<SOME_NAMESPACE>" from "STDIN": no matches for kind "<CRD_NAME>" in version "<CRD_FULL_NAME>"
ensure CRDs are installed first

This happens because the kustomization applies CRDs and CRs in quick succession, before the CRDs have become Established. See kubernetes/kubectl#1117 and helm/helm#4925 for details.

If you hit this error, simply re-apply the component's manifests.
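
A small retry sketch, assuming that re-applying eventually converges once the CRDs are Established (substitute the build path of the component that failed):

## Keep re-applying until kubectl no longer fails on missing CRDs
until kustomize build applications/kserve/kserve | kubectl apply --server-side --force-conflicts -f -; do
  echo "CRDs not established yet, retrying in 10s..."
  sleep 10
done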

metadata.annotations: Too long: may not be more than xxx bytes

Error from server (Invalid): error when creating "applications/trainer/overlays": CustomResourceDefinition.apiextensions.k8s.io "clustertrainingruntimes.trainer.kubeflow.org" is invalid: metadata.annotations: Too long: may not be more than 262144 bytes

Using --server-side fixes this.
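
For example, re-running the Trainer apply with server-side apply (the same command used in the Trainer section above):

kustomize build applications/trainer/upstream/overlays/kubeflow-platform | kubectl apply --server-side --force-conflicts -f -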

Apply failed with 1 conflict

error: Apply failed with 1 conflict: conflict with "clusterrole-aggregation-controller": .rules
Please review the fields above--they currently have other managers. Here
are the ways you can resolve this warning:
* If you intend to manage all of these fields, please re-run the apply
  command with the `--force-conflicts` flag.
* If you do not intend to manage all of the fields, please edit your
  manifest to remove references to the fields that should keep their
  current managers.
* You may co-own fields by updating your manifest to match the existing
  value; in this case, you'll become the manager if the other manager(s)
  stop managing the field (remove it from their configuration).
See https://kubernetes.io/docs/reference/using-api/server-side-apply/#conflicts

As the message says, use --force-conflicts.

Warning: resource namespaces/kubeflow is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply

Warning: resource namespaces/kubeflow is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.

The warning means that the namespaces/kubeflow resource was not created declaratively (with kubectl create --save-config or kubectl apply), so the kubectl.kubernetes.io/last-applied-configuration annotation is missing.

Since the missing annotation is patched automatically, it can be ignored.

Error from server (InternalError): Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator"

During the #KServe installation the following error is printed:

Error from server (InternalError): Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": failed to call webhook: Post "https://kserve-webhook-server-service.kubeflow.svc:443/validate-serving-kserve-io-v1alpha1-clusterservingruntime?timeout=10s": no endpoints available for service "kserve-webhook-server-service"

The CRDs and webhook configuration were applied successfully, but the endpoints for kserve-webhook-server-service (the actual pod) are not up yet, so validation fails when the ClusterServingRuntime resources are created.

# Check the status of the kserve-controller-manager pod
kubectl get pods -n kubeflow -l control-plane=kserve-controller-manager

# Wait until the pod becomes Ready
kubectl wait --for=condition=Ready pod -l control-plane=kserve-controller-manager -n kubeflow --timeout=300s

# Check that the webhook service endpoints are ready
kubectl get endpoints kserve-webhook-server-service -n kubeflow

Several pods were stuck in Pending. kserve-controller-manager had not started, so there was no webhook endpoint. The cause of the Pending state has to be tracked down.

# Check why kserve-controller-manager is Pending
kubectl describe pod kserve-controller-manager-5fbbbcdd64-wgc66 -n kubeflow

# Check the other Pending pods as well
kubectl describe pod metacontroller-0 -n kubeflow
kubectl describe pod minio-6d486b66cd-czfw5 -n kubeflow

You will usually see one of the following messages:

  • Insufficient cpu/memory - not enough resources
  • no nodes available to schedule pods - not enough nodes
  • persistentvolumeclaim "xxx" not found - PVC problem
  • node(s) didn't match Pod's node affinity/selector - node selection problem
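
For the resource-related cases, a quick way to see how much of each node is already committed (no metrics-server needed):

## Show per-node CPU/memory requests and limits already allocated
kubectl describe nodes | grep -A 8 "Allocated resources"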

a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters

The following error is printed in #User Namespaces:

Error from server (Invalid): Profile.kubeflow.org "$(profile-name)" is invalid: metadata.name: Invalid value: "$(profile-name)": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
Retrying to apply resources

If the command you ran was:

kustomize build common/user-namespace/base | kubectl apply -f -

use the -k form instead (run against the same kustomization directory, as in #User Namespaces):

kubectl apply -k .

Written this way, it worked.

Error from server (NotFound): the server could not find the requested resource (patch profiles.kubeflow.org $(profile-name))

Error from server (NotFound): the server could not find the requested resource (patch profiles.kubeflow.org $(profile-name))

Problems in v1.10.2

When running the final status check with kubectl get pod -n kubeflow, a couple of problems show up.

minio-6d486b66cd-5hn6w                                   1/2     ImagePullBackOff   0             22m
ml-pipeline-65ff55599d-fmbjv                             1/2     CrashLoopBackOff   9 (30s ago)   22m

The ml-pipeline-65ff55599d-fmbjv pod is in CrashLoopBackOff. Check its events with:

kubectl describe pod ml-pipeline-65ff55599d-fmbjv -n kubeflow

The Events section at the end looks like this:

  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  44m                    default-scheduler  Successfully assigned kubeflow/ml-pipeline-65ff55599d-fmbjv to zer0-z97n-gaming-5
  Normal   Pulled     44m (x2 over 44m)      kubelet            Container image "gcr.io/istio-release/proxyv2:1.26.1" already present on machine
  Normal   Created    44m (x2 over 44m)      kubelet            Created container: istio-validation
  Normal   Started    44m (x2 over 44m)      kubelet            Started container istio-validation
  Normal   Pulled     44m                    kubelet            Container image "gcr.io/istio-release/proxyv2:1.26.1" already present on machine
  Normal   Created    44m                    kubelet            Created container: istio-proxy
  Normal   Started    44m                    kubelet            Started container istio-proxy
  Warning  Unhealthy  44m (x3 over 44m)      kubelet            Startup probe failed:
  Normal   Created    42m (x5 over 44m)      kubelet            Created container: ml-pipeline-api-server
  Normal   Started    42m (x5 over 44m)      kubelet            Started container ml-pipeline-api-server
  Warning  BackOff    4m25s (x202 over 44m)  kubelet            Back-off restarting failed container ml-pipeline-api-server in pod ml-pipeline-65ff55599d-fmbjv_kubeflow(543df7c7-3a50-46d1-821d-f40bd9713112)
  Normal   Pulled     2m41s (x14 over 44m)   kubelet            Container image "ghcr.io/kubeflow/kfp-api-server:2.5.0" already present on machine

"Startup probe failed" 정도만 보인다.. 로그를 확인해보자:

kubectl logs ml-pipeline-65ff55599d-fmbjv -n kubeflow -c ml-pipeline-api-server

다음과 같이 출력됨:

I1223 07:11:44.161982       7 client_manager.go:190] Initializing controller client...
I1223 07:11:44.162246       7 client_manager.go:204] Controller client initialized successfully.
I1223 07:11:44.162258       7 client_manager.go:210] Initializing client manager
I1223 07:11:44.162261       7 client_manager.go:211] Initializing DB client...
I1223 07:11:44.162288       7 config.go:58] Config DBConfig.MySQLConfig.ExtraParams not specified, skipping
I1223 07:11:44.363267       7 client_manager.go:214] DB client initialized successfully
I1223 07:11:44.365553       7 client_manager.go:229] Initializing Object store client...
F1223 07:11:44.366346       7 client_manager.go:550] Failed to check if object store bucket exists. Error: 503 Service Unavailable

So the DB connection succeeds, but the object store bucket check fails: object storage is not reachable (503 error).

Given that minio-6d486b66cd-5hn6w above is in ImagePullBackOff (the image cannot be pulled), MinIO is the culprit.

Edit the corresponding Deployment:

kubectl edit deployment minio -n kubeflow

Find the old MinIO image (gcr.io/ml-pipeline/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance) and replace it with a newer image (minio/minio:RELEASE.2023-09-04T19-57-37Z).
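
The same change can be made non-interactively; a sketch assuming the container inside the Deployment is named minio:

## Swap the MinIO image in place (container name "minio" is an assumption)
kubectl set image deployment/minio -n kubeflow minio=minio/minio:RELEASE.2023-09-04T19-57-37Z
kubectl rollout status deployment/minio -n kubeflow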

Insufficient cpu

Check the Events section with kubectl describe pod ...:

  Warning  FailedScheduling  4m51s  default-scheduler  0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.

If this shows up, there simply is not enough CPU; no amount of retrying will help, so either add cores or give up.

Data Dictionary initialization failed

During the #Katib installation a pod went into CrashLoopBackOff, so I checked the logs:

2025-12-24 01:49:47+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.44-1.el9 started.
'/var/lib/mysql/mysql.sock' -> '/var/run/mysqld/mysqld.sock'
2025-12-24T01:49:49.514808Z 0 [Warning] [MY-011068] [Server] The syntax '--skip-host-cache' is deprecated and will be removed in a future release. Please use SET GLOBAL host_cache_size=0 instead.
2025-12-24T01:49:49.516480Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.44) starting as process 1
2025-12-24T01:49:49.579215Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2025-12-24T01:49:49.801600Z 1 [ERROR] [MY-012960] [InnoDB] Cannot create redo log files because data files are corrupt or the database was not shut down cleanly after creating the data files.
2025-12-24T01:49:49.801737Z 1 [ERROR] [MY-012930] [InnoDB] Plugin initialization aborted with error Generic error.
2025-12-24T01:49:50.305004Z 1 [ERROR] [MY-010334] [Server] Failed to initialize DD Storage Engine
2025-12-24T01:49:50.315382Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2025-12-24T01:49:50.315419Z 0 [ERROR] [MY-010119] [Server] Aborting
2025-12-24T01:49:50.319269Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.44)  MySQL Community Server - GPL.

Trying a reinstall:

kustomize build applications/katib/upstream/installs/katib-with-kubeflow | kubectl delete -f -
kustomize build applications/katib/upstream/installs/katib-with-kubeflow | kubectl apply -f -

Still no luck...

Trying to delete and recreate the PVC:

# Temporarily stop the Katib MySQL Deployment (scale down)
kubectl scale deployment katib-mysql -n kubeflow --replicas=0

# Check the PVC name
kubectl get pvc -n kubeflow | grep katib-mysql

# Delete the PVC (it is usually named katib-mysql)
kubectl delete pvc katib-mysql -n kubeflow

# Bring it back up (scale up)
kubectl scale deployment katib-mysql -n kubeflow --replicas=1

Checking the events with describe:

persistentvolumeclaim "katib-mysql" not found. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

Find the katib-mysql PVC manifest and apply it:

kubectl apply -f applications/katib/upstream/components/mysql/pvc.yaml

For reference, its contents are:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: katib-mysql
  namespace: kubeflow
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Still no luck...

Comparing against the previous release (v1.10.2), the mysql image had changed from 8.0.29 to 8.0.

So I pinned the image back to 8.0.29 and re-ran it -> still failing -> checked the logs:

$ kubectl logs -n kubeflow katib-mysql-7766f7694f-mqmf9
2025-12-24 04:07:42+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.29-1.el8 started.
2025-12-24 04:07:43+00:00 [Note] [Entrypoint]: Initializing database files
2025-12-24T04:07:43.277091Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.29) initializing of server in progress as process 23
2025-12-24T04:07:43.402023Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2025-12-24T04:07:56.963758Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2025-12-24T04:08:29.260370Z 6 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.
2025-12-24 04:09:27+00:00 [Note] [Entrypoint]: Database files initialized
2025-12-24 04:09:27+00:00 [Note] [Entrypoint]: Starting temporary server
2025-12-24T04:09:27.983274Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.29) starting as process 226
2025-12-24T04:09:28.149113Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2025-12-24T04:09:30.278561Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2025-12-24T04:09:33.026860Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2025-12-24T04:09:33.026910Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2025-12-24T04:09:33.141083Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
2025-12-24T04:09:33.323942Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: /var/run/mysqld/mysqlx.sock
2025-12-24T04:09:33.346168Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.29'  socket: '/var/run/mysqld/mysqld.sock'  port: 0  MySQL Community Server - GPL.
2025-12-24 04:09:33+00:00 [Note] [Entrypoint]: Temporary server started.
'/var/lib/mysql/mysql.sock' -> '/var/run/mysqld/mysqld.sock'
Warning: Unable to load '/usr/share/zoneinfo/iso3166.tab' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/leapseconds' as time zone. Skipping it.
  • 04:07:42 - initialization started
  • 04:09:27 - database files initialized (took about 1 minute 45 seconds)
  • 04:09:33 - temporary server started (on port 0)
  • currently loading timezone data - still in progress...

That's the summary. The MySQL 8.0 initialization stages are as follows:

  • ✅ Database files initialized
  • ✅ Temporary server started
  • 🔄 Initial setup (timezone data, root password, etc.) ← currently here
  • ⏳ Temporary server shutdown
  • ⏳ Actual MySQL server start (port 3306)

The probe timings configured at this point are shorter than the initialization time:

  • Readiness probe: initialDelaySeconds: 30 ← too short!
  • Liveness probe: initialDelaySeconds: 60 ← still not enough!

Bumping both to 180 seconds (3 minutes) finally made it work. The changes made on top of v1.11.0 are as follows:

diff --git a/applications/katib/upstream/components/mysql/mysql.yaml b/applications/katib/upstream/components/mysql/mysql.yaml
index 1dbd3d4e..35cc7547 100644
--- a/applications/katib/upstream/components/mysql/mysql.yaml
+++ b/applications/katib/upstream/components/mysql/mysql.yaml
@@ -24,7 +24,7 @@ spec:
         fsGroupChangePolicy: OnRootMismatch
       containers:
         - name: katib-mysql
-          image: mysql:8.0
+          image: mysql:8.0.29
           args:
             - --datadir
             - /var/lib/mysql/datadir
@@ -47,7 +47,7 @@ spec:
                 - "/bin/bash"
                 - "-c"
                 - "mysql -h 127.0.0.1 -D ${MYSQL_DATABASE} -u root -p${MYSQL_ROOT_PASSWORD} -e 'SELECT 1'"
-            initialDelaySeconds: 10
+            initialDelaySeconds: 180
             periodSeconds: 5
             failureThreshold: 10
           livenessProbe:
@@ -56,7 +56,7 @@ spec:
               - "/bin/bash"
               - "-c"
               - "mysql -h 127.0.0.1 -D ${MYSQL_DATABASE} -u root -p${MYSQL_ROOT_PASSWORD} -e 'SELECT 1'"
-            initialDelaySeconds: 10
+            initialDelaySeconds: 180
             periodSeconds: 5
             failureThreshold: 10
           volumeMounts:
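
After patching the manifest, re-apply the same Katib overlay and give the pod the full three minutes before judging the probes:

kustomize build applications/katib/upstream/installs/katib-with-kubeflow | kubectl apply -f -
kubectl get pods -n kubeflow | grep katib-mysql   # re-run until Running and Ready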

failed calling webhook "validator.clustertrainingruntime.trainer.kubeflow.org"

While installing #Trainer (Training Operator v2; v1.11.0), running the following command:

kustomize build applications/trainer/upstream/overlays/kubeflow-platform | kubectl apply --server-side --force-conflicts -f -

kept failing with a webhook error (EOF) that would not go away:

Error from server (InternalError): Internal error occurred: failed calling webhook "validator.clustertrainingruntime.trainer.kubeflow.org": failed to call webhook: Post "https://kubeflow-trainer-controller-manager.kubeflow.svc:443/validate-trainer-kubeflow-org-v1alpha1-clustertrainingruntime?timeout=10s": EOF

First, check the Service status ... OK:

$ kubectl get svc -n kubeflow kubeflow-trainer-controller-manager
NAME                                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)            AGE
kubeflow-trainer-controller-manager   ClusterIP   10.43.189.218   <none>        8080/TCP,443/TCP   3h1m

Check the Endpoints:

$ kubectl get endpoints -n kubeflow kubeflow-trainer-controller-manager
Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
NAME                                  ENDPOINTS                         AGE
kubeflow-trainer-controller-manager   10.42.0.73:9443,10.42.0.73:8080   3h2m
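
The deprecation warning suggests the EndpointSlice API instead; the equivalent check looks roughly like this:

kubectl get endpointslices -n kubeflow | grep trainer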

Check the Training Operator (Trainer) Pod status:

kubectl get pods -n kubeflow | grep trainer

Checking the logs shows:

{"level":"error","ts":"2025-12-24T01:37:10.40758628Z","logger":"cert-rotation","caller":"rotator/rotator.go:336","msg":"could not refresh CA and server certs","error":"Operation cannot be fulfilled on secrets \"kubeflow-trainer-webhook-cert\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"..."}
{"level":"info","ts":"2025-12-24T01:37:10.426126643Z","logger":"cert-rotation","caller":"rotator/rotator.go:361","msg":"no cert refresh needed"}
{"level":"error","ts":"2025-12-24T01:37:10.426197365Z","logger":"cert-rotation","caller":"rotator/rotator.go:809","msg":"secret is not well-formed, cannot update webhook configurations","error":"Cert secret is not well-formed, missing ca.crt","errorVerbose":"Cert secret is not well-formed, missing ca.crt\n...","stacktrace":"..."}
...
2025/12/24 01:41:30 http: TLS handshake error from 127.0.0.6:52269: EOF
2025/12/24 01:41:30 http: TLS handshake error from 127.0.0.6:43371: EOF

Core problem: the kubeflow-trainer-webhook-cert Secret is missing ca.crt, so the webhook's TLS certificate is not set up correctly.

Check that the Secret's data field contains all of ca.crt, tls.crt, and tls.key:

kubectl get secret -n kubeflow kubeflow-trainer-webhook-cert -o yaml
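
If you only want the key names rather than the full YAML, a go-template query is enough; a sketch:

# Print just the data keys of the webhook cert Secret
kubectl get secret -n kubeflow kubeflow-trainer-webhook-cert \
  -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'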

... ??? Huh?? They're all there? ... Just to be safe, delete and reinstall:

kustomize build applications/trainer/upstream/overlays/kubeflow-platform | kubectl delete -f -
kustomize build applications/trainer/upstream/overlays/kubeflow-platform | kubectl apply --server-side --force-conflicts -f -

... ??? Why does it work now?
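
For the record, the webhook and controller can be sanity-checked after the reinstall roughly like this (the ValidatingWebhookConfiguration name is assumed to contain "trainer", and the Deployment name is assumed to match the Service):

kubectl get validatingwebhookconfigurations | grep -i trainer
kubectl get pods -n kubeflow | grep trainer
kubectl logs -n kubeflow deploy/kubeflow-trainer-controller-manager --tail=20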

See also

Favorite site