== Create a pod to access the file systems ==

After logging in and adjusting your kubeconfig to the new cluster and your user namespace, you should be able to start your first pod. Create a work directory on your machine and, inside it, a file "ubuntu-test-pod.yaml" with the following content, replacing <your-username> with your own username:

<syntaxhighlight lang="yaml">
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-test-pod
spec:
  containers:
  - name: ubuntu
    image: ubuntu:20.04
    command: ["sleep", "1d"]
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 1
        memory: 1Gi
    volumeMounts:
      - mountPath: /abyss/home
        name: cephfs-home
        readOnly: false
      - mountPath: /abyss/shared
        name: cephfs-shared
        readOnly: false
      - mountPath: /abyss/datasets
        name: cephfs-datasets
        readOnly: true
  volumes:
    - name: cephfs-home
      hostPath:
        path: "/cephfs/abyss/home/<your-username>"
        type: Directory
    - name: cephfs-shared
      hostPath:
        path: "/cephfs/abyss/shared"
        type: Directory
    - name: cephfs-datasets
      hostPath:
        path: "/cephfs/abyss/datasets"
        type: Directory
</syntaxhighlight>

When you apply this file to the cluster, it creates a pod running a single container based on the Ubuntu 20.04 image, with the Ceph filesystems mounted into it. Use the following commands to create the pod and check its status:

<syntaxhighlight lang="bash">
  > kubectl apply -f ubuntu-test-pod.yaml
  > kubectl get pods
  > kubectl describe pod ubuntu-test-pod
</syntaxhighlight>

Pay close attention to the event messages at the end of the "describe pod" output. They give hints about what might be wrong if the pod does not start up.

When the pod finally reaches the status "Running", you can open a shell inside the container, much like logging into a remote server. Do this and verify that the filesystems have been mounted successfully:

<syntaxhighlight lang="bash">
> kubectl exec -it ubuntu-test-pod -- /bin/bash
# cd /abyss/home/
# ls
<might already contain files which were automatically copied from volumes on the old cluster>
#
</syntaxhighlight>

From within the container, you have access to the internet, can install any packages which are still missing, and can copy over your code and data via rsync or by pulling it with e.g. git or svn. You can also push files into the container from your local machine using kubectl:

<syntaxhighlight lang="bash">
> kubectl cp <my-files> ubuntu-test-pod:/abyss/home/
</syntaxhighlight>

This also works in the other direction to get files out of the pod. kubectl is a powerful and complex tool; for more ideas on what you can do with it, please refer to the basic [https://kubernetes.io/docs/reference/kubectl/cheatsheet/ kubectl cheat sheet] or a more [https://github.com/dennyzhang/cheatsheet-kubernetes-A4 advanced version here].
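
For instance, copying in the reverse direction could look like the following sketch, where <my-files> and <local-destination> are placeholders for a path inside the pod and a path on your machine. Note that kubectl cp relies on the tar binary being present in the container image.

<syntaxhighlight lang="bash">
> kubectl cp ubuntu-test-pod:/abyss/home/<my-files> <local-destination>
</syntaxhighlight>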

The file systems you are mounting into the pod are available on every node in the cluster. The following directories can be used by anyone:

* '''/cephfs/abyss/home/<your-username>''': this is your personal home directory which you can use any way you like.
* '''/cephfs/abyss/shared''': a shared directory where every user has read/write access, so your data here is not protected from manipulation or deletion. To avoid total anarchy in this filesystem, please choose sensible names and organize your files in subdirectories. For example, put personal files which you want to make accessible to everyone in "/abyss/shared/users/<username>". Be considerate towards other users. I will monitor how it works out and whether we need more rules here. If you need more private storage shared only between the members of a trusted work group, please contact me.
* '''/cephfs/abyss/datasets''': directory for static datasets, mounted read-only. These are large general-interest datasets for which we only want to store one copy on the filesystem (no separate imagenets for everyone, please). So whenever you have a well-known public dataset in your shared directory which you think would be useful to have in the static tree, please contact me and I will move it to the read-only region.

In addition, you can use a directory local to each host. Depending on your workload, this might be much faster than cephfs, but it also ties you to a specific machine:

* '''/raid/local-data/<your-username>''': your personal directory on the local SSD raid of the machine. Make sure to set "type: DirectoryOrCreate" in the hostPath volume definition, as the directory is not guaranteed to exist yet.
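
As a minimal sketch, the local directory could be added to the pod definition from the beginning of this section roughly as follows. The volume name "local-data" and the mount path "/local-data" are arbitrary choices made for illustration, not names prescribed by the cluster.

<syntaxhighlight lang="yaml">
# Fragment to merge into the pod spec of ubuntu-test-pod.yaml.
# "local-data" and "/local-data" are illustrative names (assumptions).
spec:
  containers:
  - name: ubuntu
    # ... image, command and resources as before ...
    volumeMounts:
      - mountPath: /local-data
        name: local-data
        readOnly: false
  volumes:
    - name: local-data
      hostPath:
        path: "/raid/local-data/<your-username>"
        # create the directory on the node if it does not exist yet
        type: DirectoryOrCreate
</syntaxhighlight>

Keep in mind that the directory lives on whichever node the pod is scheduled to, so data written on one machine is not visible from pods running on another.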

Please refer to [[CCU:Perstistent storage on the Kubernetes cluster|the persistent storage documentation]] for more details.