Orchestrating TFX pipelines locally with Kubeflow

Problem description

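Hey, I'm working on a package that generates TFX pipelines for training GPT-2 (see https://github.com/steven-mi/tfx-gpt2).

I'd like to know how to deploy the pipeline to Kubeflow locally. Is there an in-depth guide for doing this?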

Solution

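I was working on this a few months ago but got pulled away by other things. Using the recipe below (it is not a script), I got KFP, TFX, and JupyterLab running on a Google Cloud VM, and IIRC I was able to deploy a TFX pipeline and run it. I'm using microk8s for the Kubernetes cluster. It's a work in progress, but for what it's worth, maybe it helps: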

sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
sudo groupadd docker
sudo usermod -aG docker ${USER}

# K8s 1.14 is currently recommended for KFP
sudo snap install microk8s --channel=1.14 --classic
sudo snap alias microk8s.kubectl kubectl
sudo usermod -a -G microk8s $USER

(log out and back in so the new docker and microk8s group memberships take effect)

docker run -d -p 5000:5000 --restart=always --name registry registry:2

microk8s.enable dns dashboard storage
microk8s.enable kubeflow
export PIPELINE_VERSION=0.2.5
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/base/crds?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=$PIPELINE_VERSION"
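Before moving on, it can help to confirm that the Kubeflow Pipelines pods have actually come up. The following is a minimal sketch, assuming `kubectl` is on the PATH (via the snap alias above) and that KFP runs in the `kubeflow` namespace; the helpers `pods_ready` and `wait_for_pods` are hypothetical, not part of any Kubeflow tooling. It polls `kubectl get pods -o json` and treats the namespace as ready once every pod reports a `Ready` condition of `True` (pods in phase `Succeeded`, such as finished jobs, are also accepted). It uses `stdout=subprocess.PIPE` rather than `capture_output` so it still runs on the Python 3.6 selected later in the recipe.

```python
import json
import subprocess
import time


def pods_ready(pod_list: dict) -> bool:
    """Return True if every pod in a `kubectl get pods -o json` result is ready."""
    items = pod_list.get("items", [])
    if not items:
        # No pods yet: the manifests may still be rolling out.
        return False
    for pod in items:
        status = pod.get("status", {})
        if status.get("phase") == "Succeeded":
            # Completed pods (e.g. from Jobs) never become Ready; count them as done.
            continue
        conditions = status.get("conditions", [])
        if not any(c.get("type") == "Ready" and c.get("status") == "True"
                   for c in conditions):
            return False
    return True


def wait_for_pods(namespace: str = "kubeflow", timeout_s: int = 900,
                  poll_s: int = 15) -> bool:
    """Poll kubectl until all pods in `namespace` are ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            universal_newlines=True, check=False,
        )
        # Only parse the output when kubectl succeeded.
        if result.returncode == 0 and pods_ready(json.loads(result.stdout)):
            return True
        time.sleep(poll_s)
    return False
```

With the default 900-second timeout, calling `wait_for_pods()` blocks for up to 15 minutes, which roughly matches the 5-15 minute wait the recipe mentions before looking up the UI address.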

sudo apt-get install python3-pip
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.6 1
sudo update-alternatives  --set python /usr/bin/python3.6
sudo update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1
sudo update-alternatives  --set pip /usr/bin/pip3
pip install --upgrade pip

export PATH=$PATH:~/.local/bin
pip install notebook
pip install jupyterlab

(Make the VM's external IP address static)

jupyter notebook --generate-config
(Optional) set a password:
python -c "from notebook.auth import passwd; print(passwd())"
(remember the password, and save the generated hash)

vi ~/.jupyter/jupyter_notebook_config.py
Enable (uncomment and set):
    c.NotebookApp.ip = '*'
    c.NotebookApp.open_browser = False
    c.NotebookApp.port = 3389  # for Pantheon (normally 8888)
    c.NotebookApp.password = '<full hash printed by passwd() above, including its algorithm prefix>'
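Instead of editing the file by hand in `vi`, the same four settings can be written with a small script. This is a minimal sketch, not part of Jupyter itself: `set_notebook_options` and `update_config_file` are hypothetical helpers, they assume the classic Notebook server that reads `c.NotebookApp.*` options from `~/.jupyter/jupyter_notebook_config.py`, and the password value is a placeholder for the string returned by `passwd()`. Uncommented lines that already assign one of the options are dropped and fresh assignments are appended, so commented-out defaults stay in place and re-running the script does not duplicate settings.

```python
import os

# Settings from the recipe above. The password value is a placeholder:
# replace it with the full string printed by passwd().
OPTIONS = {
    "ip": "*",
    "open_browser": False,
    "port": 3389,  # for Pantheon (normally 8888)
    "password": "<hash printed by passwd()>",
}


def set_notebook_options(config_text: str, options: dict) -> str:
    """Return config_text with the given c.NotebookApp options set.

    Any uncommented line that already assigns one of the options is removed,
    and a `c.NotebookApp.<key> = <value>` line is appended for each option.
    Comments and all other lines are kept unchanged.
    """
    # Match both "c.NotebookApp.port = ..." and "c.NotebookApp.port=...".
    prefixes = tuple(
        "c.NotebookApp.%s%s" % (key, sep) for key in options for sep in (" ", "=")
    )
    kept = [line for line in config_text.splitlines()
            if not line.strip().startswith(prefixes)]
    # %r writes strings quoted and booleans/ints as Python literals.
    added = ["c.NotebookApp.%s = %r" % (key, value) for key, value in options.items()]
    return "\n".join(kept + added) + "\n"


def update_config_file(options: dict,
                       path: str = "~/.jupyter/jupyter_notebook_config.py") -> None:
    """Apply set_notebook_options to the config file in place.

    Assumes ~/.jupyter already exists, e.g. after
    `jupyter notebook --generate-config`.
    """
    path = os.path.expanduser(path)
    text = ""
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    with open(path, "w") as f:
        f.write(set_notebook_options(text, options))
```

Once the placeholder hash is filled in, calling `update_config_file(OPTIONS)` rewrites the file; running it a second time still leaves exactly one line per option.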

pip install --no-cache-dir --upgrade tfx
git clone https://github.com/tensorflow/tfx.git
mkdir AIHub
cp tfx/docs/tutorials/tfx/template.ipynb AIHub
cd AIHub

(wait about 5-15 minutes for the Kubeflow pods to come up)
kubectl describe configmap inverse-proxy-config -n kubeflow | grep googleusercontent.com
jupyter lab &
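The `grep` on the `inverse-proxy-config` ConfigMap is there to find the public URL of the Kubeflow Pipelines UI, which the inverse proxy exposes under a `googleusercontent.com` hostname once it has registered. The same lookup can be scripted. This is a sketch that assumes the hostname appears verbatim in the `kubectl describe` output (as the `grep` above implies) and that it may be missing while the proxy is still starting; `find_proxy_hostname` and `get_proxy_hostname` are hypothetical helpers, and the hostname pattern is an assumption rather than a documented format.

```python
import re
import subprocess
from typing import Optional

# Assumed pattern: any DNS-style name ending in .googleusercontent.com.
_HOST_RE = re.compile(r"[A-Za-z0-9.-]+\.googleusercontent\.com")


def find_proxy_hostname(describe_output: str) -> Optional[str]:
    """Return the first googleusercontent.com hostname in the text, or None."""
    match = _HOST_RE.search(describe_output)
    return match.group(0) if match else None


def get_proxy_hostname(namespace: str = "kubeflow") -> Optional[str]:
    """Run `kubectl describe configmap inverse-proxy-config` and extract the hostname."""
    result = subprocess.run(
        ["kubectl", "describe", "configmap", "inverse-proxy-config", "-n", namespace],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True, check=False,
    )
    if result.returncode != 0:
        # ConfigMap not created yet, or the cluster is not reachable.
        return None
    return find_proxy_hostname(result.stdout)
```

A `None` result just means the proxy has not published its hostname yet, so it is worth retrying after a few minutes rather than treating it as a failure.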
