Data, research and otherwise
Data storage in the school is organised as a hierarchy, trading off cost against security.
- /home -- on reliable storage, and additionally backed up carefully. This is the most expensive storage, so we make periodic attempts to monitor how much space individuals use here, and chat with those using too much (where ‘too much’ is a deliberately vague quantity).
- /data -- on reliable storage, but backed up only on a case-by-case basis (because this tends to be larger volume).
- /scratch -- spare space on machines, primarily to support high IO bandwidth for the benefit of programs running on the same machine. The expectation is that data is staged to a compute machine for input, the output is written locally, for speed, and the data is then retrieved afterwards. This space is not configured for reliability, so is vulnerable to disk failures (though these should be unlikely in practice). We do not at present routinely delete data in /scratch, but you should act as if we did.
Here ‘reliable storage’ means storage configured to survive disk failures without losing data (i.e., this is a type of RAID storage). There are catastrophic circumstances where such data can be lost – for example if a whole machine were destroyed, or physically stolen – but disk failure, the most common cause of data loss, should not generally result in lost data.
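The staging pattern described for /scratch can be sketched in a few lines of Python. This is only an illustration, not a tool the school provides: the function name `run_staged`, its parameters, and the `in`/`out` subdirectory layout are all assumptions made for the example, and `compute` stands in for whatever program you actually run.

```python
import shutil
import tempfile
from pathlib import Path


def run_staged(input_file: Path, scratch_root: Path, results_dir: Path, compute) -> Path:
    """Stage input to scratch, compute there, then copy the output back.

    `compute` is a hypothetical callable taking (staged_input, output_path).
    """
    scratch_root.mkdir(parents=True, exist_ok=True)
    results_dir.mkdir(parents=True, exist_ok=True)

    # Private working directory under scratch; it is deleted when the
    # `with` block ends, so nothing is left behind in scratch.
    with tempfile.TemporaryDirectory(dir=scratch_root) as work:
        in_dir = Path(work) / "in"
        out_dir = Path(work) / "out"
        in_dir.mkdir()
        out_dir.mkdir()

        # 1. Stage the input onto the local scratch disk.
        staged_input = in_dir / input_file.name
        shutil.copy2(input_file, staged_input)

        # 2. Run the computation, reading and writing locally for speed.
        local_output = out_dir / (input_file.stem + ".out")
        compute(staged_input, local_output)

        # 3. Retrieve the result to reliable storage before cleanup.
        final_output = results_dir / local_output.name
        shutil.copy2(local_output, final_output)

    return final_output
```

A call might look like `run_staged(Path("/data/pNNN/input.dat"), Path("/scratch/myuser"), Path("/data/pNNN/results"), my_program)`, where the paths and `my_program` are placeholders. Copying the result back before the working directory is removed matches the advice to act as if data in /scratch could be deleted at any time.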
Availability
/home
: Everyone with a school SSO account will have /home space.
This is where you store papers, theses, software, and all the most
high-value material. Because this is the most expensive category of
storage, we tend to keep a fairly close eye on it.
/data
: We can create /data storage for particular projects, for
clusters of people, or sometimes for individuals with large storage
needs. We refer to all of these as ‘projects’, and create space
/data/pNNN with a numeric ID. We can if necessary create unix
user-groups to control access. Our /data spaces range from one to
tens of TB (we're not really set up for much larger storage volumes).
If you need such space, contact P&A IT support.
/scratch
: Every machine should end up with ‘scratch’ space, as part of the
process of commissioning the machine, with the expectation that this
is all of the otherwise-unused space available on the hardware. For
convenience, most machines are additionally configured so that the
scratch space is visible to other local machines, as
/scratch/<machinename> – contact P&A IT support if this appears not
to be the case for your machine.
Scratch space is usually set up with a directory owned by each regular user of a machine, but the space is managed cooperatively by the users on that machine (i.e., talk to each other!).
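Because /home usage is watched and scratch space is shared, it can be useful to check how much space your own directories occupy. The following is a minimal sketch using only the Python standard library; the function names `directory_usage` and `largest_subdirs` are made up for this example. It sums apparent file sizes, so the totals can differ from what the filesystem actually allocates, and hard-linked files are counted once per link.

```python
import os
import stat
from pathlib import Path


def directory_usage(root: Path) -> int:
    """Total apparent size in bytes of regular files under root.

    Symbolic links are not followed, and unreadable entries are skipped.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is not accessible
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total


def largest_subdirs(root: Path, count: int = 5) -> list[tuple[str, int]]:
    """Names and sizes of the largest immediate subdirectories of root."""
    sizes = [
        (entry.name, directory_usage(Path(entry.path)))
        for entry in os.scandir(root)
        if entry.is_dir(follow_symlinks=False)
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:count]
```

For example, `largest_subdirs(Path.home())` lists the biggest top-level directories in your home space, which is usually enough to see where the bulk of the data sits.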
See also:
- Research Data Management (RDM)