Cluster:Changelog
=== 10.02.2022 ===
* Zariel has been repaired (twice) and should now be operational again; please test. Taints are active for the node, so make sure to add tolerations, [[Cluster:Compute_nodes|as shown here]].

=== 20.01.2022 ===
* Two new nodes added (Imp and Dretch). They are somewhat outdated, but should still be fine for testing and less compute-intensive projects.
* One new, very powerful node added (Asmodeus, 4x A100 with 80 GB each). It is currently configured as 8 virtual GPUs with 40 GB each; if you really need 80 GB, contact me.
* Zariel has been removed because of a hardware failure. I am working with NVIDIA support; no ETA at the moment.
* Some of the more powerful nodes (Asmodeus and Vecna) are now "tainted", so the pod scheduler does not use them by default. A pod has to explicitly "tolerate" the taint in its configuration to be scheduled on these nodes. Please refer to [[Cluster:Compute_nodes|the list of compute nodes]] for explanations and examples.
* Taints will also be added to other nodes, so that by default your pods can only be scheduled on the least powerful nodes in the cluster. Please start updating your pod configurations if you have preferred nodes.

=== 28.12.2021 ===
* Kubernetes has been updated to version 1.23.1. Please update your kubectl accordingly.
* The pod security infrastructure has been migrated from the deprecated PodSecurityPolicy to OPA/Gatekeeper. No changes should be required on your side if everything was configured as intended, but please inform me if there are things you should be allowed to do and cannot, or things you can do which should be forbidden.
* All GPU drivers have been updated to the most recent versions available for the respective machines. You might have to migrate to more recent versions of GPU containers. The GPU driver and CUDA version of each compute node are now shown on the cluster status page.
* Node Zariel is currently not available: the system update broke something and the node no longer boots. I need physical access to the server room, so the earliest date for a fix is January 10th. Please be considerate with the number of GPUs you reserve.

=== 01.02.2021 ===
* Full cluster rebuild with Kubernetes 1.20.0.
* Hostpath volumes for Ceph home directories, shared and dataset storage, and local node data.

=== 30.11.2020 ===
* Node Zariel has been added to the cluster.

=== 15.07.2020 ===
* Ceph persistent storage cluster added.
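
The entries from 20.01.2022 and 10.02.2022 require pods to explicitly tolerate the taints on the more powerful nodes. As a rough illustration only, the following Python sketch builds a minimal pod manifest with one toleration and prints it as JSON. The pod name, image, taint key, and taint value used here are hypothetical placeholders, not the values configured on this cluster; the real ones are listed on [[Cluster:Compute_nodes|the list of compute nodes]].

```python
import json


def pod_with_toleration(name, image, taint_key, taint_value, effect="NoSchedule"):
    """Return a minimal Pod manifest (as a dict) that tolerates a single taint."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{"name": name, "image": image}],
            # A toleration matches a taint with the same key, value, and effect.
            "tolerations": [
                {
                    "key": taint_key,
                    "operator": "Equal",
                    "value": taint_value,
                    "effect": effect,
                }
            ],
        },
    }


# Hypothetical placeholders: replace them with the taint key and value
# documented for the node you want to use.
manifest = pod_with_toleration(
    name="toleration-demo",
    image="ubuntu:20.04",
    taint_key="example-taint-key",
    taint_value="example-taint-value",
)

manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The printed JSON can be saved to a file and passed to <code>kubectl apply -f</code>, which accepts JSON as well as YAML. Note that a toleration only permits scheduling on a tainted node; it does not force the pod onto that node.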