Tutorials:Run the example container on the cluster
== Set up a Kubernetes job script ==

Download the [[File:Kubernetes_samples.zip|Kubernetes samples]] and look at the kubernetes subdirectory in example_1. Check out "make_configs.sh" and run it after you have set the bash environment variable "KUBERNETES_USER" to your cluster username:

<syntaxhighlight lang="bash">
> export KUBERNETES_USER=your.username
> ./make_configs.sh
</syntaxhighlight>

This creates a number of yaml files (Kubernetes configuration files) from the templates in the "template" subdirectory. Check out the first example, "job-script.yaml":

<syntaxhighlight lang="yaml">
apiVersion: batch/v1
kind: Job
metadata:
  # name of the job
  name: your-username-tf-mnist
spec:
  template:
    spec:
      # List of containers belonging to the job starts here
      containers:
      # container name used for pod creation
      - name: your-username-tf-mnist-container
        # container image from the registry
        image: ccu.uni-konstanz.de:5000/your.username/tf_mnist:0.1

        # container resources requested from the node
        resources:
          # requests are minimum resource requirements
          requests:
            # this gives us a minimum of 2 GiB of main memory to work with.
            memory: "2Gi"
            # you should allocate at least 1 CPU for machine learning jobs,
            # usually more if you, for example, have separate threads for reading data.
            # 1 CPU unit is 1 CPU core or hyperthread, depending on CPU architecture.
            # Note that CPUs are typically not a scarce resource on our GPU servers,
            # so you can be a bit generous.
            cpu: 1
          # limits are maximum resource allocations
          limits:
            # this gives an absolute limit of 3 GiB of main memory.
            # exceeding it will mean the container exits immediately with an error.
            memory: "3Gi"
            # CPU limit, but the pod will usually not be killed for excessive CPU use
            cpu: 1
            # this requests a number of GPUs. GPUs will be allocated to the container
            # exclusively. No fractional GPUs can be requested.
            # When executing nvidia-smi in the container, it should show exactly this
            # number of GPUs.
            #
            # PLEASE DO NOT SET THE NUMBER TO ZERO, EVER, AND ALWAYS INCLUDE THIS LINE.
            # ALWAYS PUT IT IN THE SECTION "limits", NOT "requests".
            #
            # It is a known limitation of NVIDIA's runtime that if zero GPUs are requested,
            # then actually *all* GPUs are exposed in the container.
            # We are looking for a fix to this.
            #
            nvidia.com/gpu: "1"

        # the command which is executed after container creation
        command: ["/application/run.sh"]

      # login credentials to the docker registry.
      # for convenience, a read-only credential is provided as a secret in each namespace.
      imagePullSecrets:
      - name: registry-ro-login

      # containers will never restart
      restartPolicy: Never

  # number of retries after failure.
  # since we typically have to fix something in this case, set to zero by default.
  backoffLimit: 0
</syntaxhighlight>

When we start this job, it creates a single container, based on the image we previously uploaded to the registry, on a suitable node which serves the selected namespace of the cluster.

<syntaxhighlight lang="bash">
> kubectl apply -f job-script.yaml
</syntaxhighlight>
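Once the job has been applied, you will probably want to see whether it is running and what it prints. The following is a minimal sketch, not part of the samples archive: "inspect_job" is a hypothetical helper name, and the job name is the placeholder from the yaml file above. It relies only on standard kubectl subcommands and on the "job-name" label which Kubernetes attaches to the pods of a job.

```shell
#!/usr/bin/env bash
# Sketch only: inspect_job is a hypothetical helper, not shipped with the samples.
# KUBECTL can be overridden (for example with "echo") to preview the commands
# without contacting the cluster.
KUBECTL="${KUBECTL:-kubectl}"

inspect_job() {
  local job_name="$1"
  # overall state of the job (completions, duration, age)
  $KUBECTL get job "$job_name"
  # pods created for the job; Kubernetes labels them with job-name=<job name>
  $KUBECTL get pods --selector="job-name=$job_name"
  # output written by the container which the job started
  $KUBECTL logs "job/$job_name"
}

# Usage, after "kubectl apply -f job-script.yaml":
#   inspect_job your-username-tf-mnist
# Remove the finished job and its pod again with:
#   kubectl delete -f job-script.yaml
```

Since backoffLimit is zero, a failed job is not retried, so the log output is usually the first place to look before fixing the image or the script and applying the job again.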