CCU:GPU Cluster Quick Start
== Overview ==

The GPU cluster runs on Kubernetes, a container orchestrator. This means that users can run Docker containers, which are essentially lightweight virtual machines without the overhead of a separate operating system: they mostly make use of the OS of the host machine. The additional layer in between allows containers to bring their own libraries with them, and shields the host OS from interference by the container. Containers are assigned to host machines automatically, but users have some options to specify which machine, or which kind of machine, they want to end up on.

A global file system running on a Ceph cluster is mounted on every host. The details are not important for you, but it means that there is plenty of fast NVMe storage available for your code and datasets. You have to mount the directories you want to use inside the container.

The typical workflow for running your own applications is as follows:

# Log in to the cluster and configure kubectl, the command line tool for talking to Kubernetes, to use your login credentials and namespace.
# Create a persistent container to access the file systems, and mount the Ceph volumes inside it. Use this container to transfer code and data to and from the cluster.
# (Optional) Create your own custom container image with any special libraries you need to run your code.
# Create a GPU-enabled container based on your own image or on one of the ready-made images with Deep Learning toolkits, depending on the workload you want to run.
# Start your workloads either by logging into the container and running your code manually (only suitable for debugging), or by defining a job script which automatically runs a specified command inside the container until it completes successfully (recommended).

We will cover these points in more detail below.
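
To make the recommended job-script route from the workflow more concrete, here is a minimal sketch in Python that builds a Kubernetes Job manifest and prints it as JSON, which kubectl accepts alongside YAML. It is an illustration under assumptions, not the cluster's actual configuration: the job name, image, command and claim name are hypothetical placeholders, the Ceph storage is assumed to be exposed through a persistent volume claim, and GPUs are assumed to be requested through the <code>nvidia.com/gpu</code> resource name.

```python
import json


def build_job_manifest(name, image, command, claim_name, gpus=1, mount_path="/data"):
    """Return a Kubernetes batch/v1 Job manifest as a plain dict.

    All argument values are supplied by the caller; nothing here is
    specific to this cluster.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Give up after a few failed attempts instead of retrying forever.
            "backoffLimit": 2,
            "template": {
                "spec": {
                    # Restart the container on failure until the command succeeds.
                    "restartPolicy": "OnFailure",
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "command": command,
                            # Assumption: GPUs are exposed as "nvidia.com/gpu".
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                            "volumeMounts": [
                                {"name": "storage", "mountPath": mount_path}
                            ],
                        }
                    ],
                    "volumes": [
                        {
                            "name": "storage",
                            # Assumption: storage is reached via a volume claim.
                            "persistentVolumeClaim": {"claimName": claim_name},
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical example values; replace them with your own.
    manifest = build_job_manifest(
        name="example-training-job",
        image="example-registry/example-image:latest",
        command=["python", "/data/train.py"],
        claim_name="example-claim",
    )
    print(json.dumps(manifest, indent=2))
```

Assuming the script is saved as <code>make_job.py</code> (a hypothetical file name), its output could be written to a file with <code>python make_job.py > job.json</code> and then submitted with <code>kubectl apply -f job.json</code>.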