CCU:Persistent storage on the Kubernetes cluster
== The CephFS file system ==

As explained in the [[CCU:GPU Cluster Quick Start|quick start tutorial]], every user can mount certain local host paths inside their pods, which refer to a global distributed Ceph file system. As a reminder, the primary home directory is

<syntaxhighlight lang="bash">
/cephfs/abyss/home/<your-username>
</syntaxhighlight>

This file system is usually quite fast, but only for the workloads it is designed for. It is distributed storage: the file system metadata is stored in databases on one set of servers, and the actual file contents on others. Metadata access (such as reading file attributes, or finding out which server holds a specific file) can therefore become a bottleneck. In effect, reading the metadata of a small file is orders of magnitude more expensive than reading the contents of the file itself, so performance breaks down dramatically when writing or accessing many small files. In particular, having many small files in a single directory (say >10k) makes even simple file system operations such as directory listings take ages, and automated backup jobs might run into problems.

'''TL;DR, and this is very important: when using CephFS, make sure to organize your dataset in a few large files (e.g. HDF5), and not many small ones! If you really need individual files, make sure they are stored in subdirectories which do not become too large.'''

For example, if you have a million images named like abcdef.jpg in a single directory, you should distribute them over a directory tree a/b/c/def.jpg, so that there are only 1000 files per directory. If your dataset consists of many small files, another interesting option is to keep it in a tar archive and mount that archive using [https://github.com/mxmlnkn/ratarmount ratarmount].

If none of this is possible for you, then you need to use the local SSD storage on a single node. For small files it is orders of magnitude faster, but you are then bound to that particular node (or have to duplicate the data across several local file systems). See below for details on local file systems.
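
The directory layout recommended above (abcdef.jpg stored as a/b/c/def.jpg) can be produced with a short script. The following is only a minimal sketch: the function names <code>sharded_path</code> and <code>shard_directory</code> and the <code>levels</code> parameter are made up for this illustration, and it assumes that every file name has more characters before the extension than there are directory levels.

```python
import os
import shutil
from pathlib import Path


def sharded_path(root: Path, filename: str, levels: int = 3) -> Path:
    """Map e.g. 'abcdef.jpg' to root/a/b/c/def.jpg (hypothetical helper)."""
    stem = Path(filename).stem
    if len(stem) <= levels:
        # Assumption of this sketch: names are long enough to be split.
        raise ValueError(f"file name too short to shard: {filename!r}")
    # The first `levels` characters become one directory each,
    # the remainder (including the extension) is the new file name.
    return root.joinpath(*filename[:levels], filename[levels:])


def shard_directory(src: Path, dst: Path, levels: int = 3) -> int:
    """Move all regular files from the flat directory src into a tree below dst.

    Returns the number of files moved.
    """
    # Collect the names first so the directory is not modified while scanning it.
    with os.scandir(src) as entries:
        names = [entry.name for entry in entries if entry.is_file()]
    for name in names:
        target = sharded_path(dst, name, levels)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src / name), str(target))
    return len(names)
```

A call such as <code>shard_directory(Path("flat"), Path("tree"))</code> would move <code>flat/abcdef.jpg</code> to <code>tree/a/b/c/def.jpg</code>. How evenly the files end up spread over the subdirectories depends on how the leading characters of the file names are distributed, so it is worth checking the resulting directory sizes.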