Python
Python is a very widely-used scripting language: see python.org or the Wikipedia article.
Python is popular because it is reasonably easy to read (for some value of ‘easy to read’), because it has a large library (so whatever software problem you want to address, there is a good chance there is a Python library for it), and because it is popular (ie, ‘everyone uses it’).
Python is a free/open-source language.
The notes below are not an introduction to Python, but intended as a collection of fragments of local advice about it. There is lots of ‘getting started’ advice on the web.
Jupyter §
Within the school, we support a JupyterHub server, which provides Python notebooks. This is a flexible and fairly easy way to get started, but it has a few limitations, which you may start to run into if you use Python extensively.
Installation, and versions §
Depending on the OS on your machine, Python may or may not be already installed. If you need to install it, or if the pre-installed version is too old (see below), there are multiple ways of doing so. For various reasons, we currently semi-recommend Anaconda as the simplest way of doing so.
Anaconda installs a complete Python distribution, separately from any pre-installed version (ie, not replacing it), and installs a tool for managing Python packages, which are the bundles which contain libraries and other additional Python functionality. Using Anaconda will probably work OK for you, but it's a bit of a blunt instrument.
Alternatively, you can use a system Python, and install packages into this (see below for the right and wrong ways of doing this). This gives you more control, but it is possible to mess this up and hobble your installed Python.
If you do not have a pre-installed Python, and do not want to use Anaconda, you can either download it from python.org, or use your system's package manager to install it (eg, yum install python34 on CentOS).
You should use Python 3 for all new code unless you have a very good reason for using Python 2. A system's pre-installed Python may be version 2, for legacy reasons; you can check with python --version. The system Python 3 might be invoked using the command python3 rather than plain python.
It doesn't much matter which version of Python 3 you use, within reason. At the time of writing, Python 3.11 is stable, but Python 3.6 or even 3.4 might be the most recent available in a more conservative Linux distribution, and you'll probably get away with that (though some packages do specify an ‘oldest supported’ Python version). Newer is generally better, but it's best to stay away from bleeding-edge versions unless you have a strong need for them, and then only with your eyes wide open.
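The version check described above can also be made from inside a script, which is handy when you are not sure which interpreter is running your code. The following is only a minimal sketch: the function name python_is_recent_enough and the default minimum of 3.6 are illustrative assumptions, not a local requirement.

```python
import sys


def python_is_recent_enough(minimum=(3, 6)):
    """Return True if the running interpreter is at least `minimum`.

    The default of (3, 6) is only an illustrative threshold.
    """
    # sys.version_info compares element-wise against a plain tuple.
    return sys.version_info[:2] >= tuple(minimum)


# Report the same information that `python --version` gives.
print("Python %d.%d.%d" % sys.version_info[:3])
print("recent enough:", python_is_recent_enough())
```

If a project depends on a package with a stated oldest supported version, that version would be the natural value to pass as minimum.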
Python packages, and virtual environments §
The following discussion presumes that you are aiming for command-line usage of Python. It is possible to use virtual environments on JupyterHub, but doing so is a little more involved.
There is a huge range of Python packages: some are actively developed, some abandoned; sometimes functionality will be supplied by more than one competing package; sometimes packages are incompatible with each other. It's possible, and indeed relatively easy if you're not very careful, to mess up your collection of Python packages by installing mutually incompatible ones.
The usual Python package installer is pip.
Here is some (rather opinionated) advice about using pip:
-
Never use pip to install system packages (ie, sudo pip ...). Once you install packages it's hard to uninstall them (which you might want to do if you find you have created an incompatibility) without breaking things. Pip does support uninstallation, in principle, but it never works quite as reliably as one might wish, and you can end up with a system which is subtly broken in a hard-to-fix way.
A nightmare scenario here is when your OS uses Python to manage some system tasks (both macOS and Linux do this to varying extent) and you inadvertently update a crucial library, causing something in your OS to stop working. This sort of thing is not supposed to happen, but it does.
-
Probably don't use pip ‘user’ installs. Even a pip ‘user’ install can cause problems, if you end up with a big sticky mess of packages as above. This is easy to fix, in a crude way, by simply deleting all of the ‘user’ packages and then reinstalling the ones you discover you still need, but this will end up changing the packages available for all of your Python code. Ie, here you're solving the problem by throwing it away and starting again.
-
Instead use (one or more) ‘virtual environments’.
-
When you do use pip within a virtual environment, use it as python -m pip install ..., rather than pip install ..., since in the former case you can be sure which version of Python you're using (check with which python), amongst possibly multiple installed versions.
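The point of the python -m pip form is that pip runs under whichever interpreter you name. The same idea carries over to code that needs to invoke pip: the sketch below builds the command from sys.executable, the path of the interpreter currently running. The helper name pip_command is hypothetical, introduced only for illustration.

```python
import sys


def pip_command(*args):
    """Build a pip command line bound to the running interpreter."""
    # sys.executable is the full path of the Python running this code,
    # so the resulting command cannot pick up a different Python's pip.
    return [sys.executable, "-m", "pip", *args]


# Show which interpreter is in use, and the command that would be run.
print(sys.executable)
print(pip_command("install", "matplotlib"))
```

The returned list can be passed to subprocess.run(..., check=True) if you do want to perform the installation; the sketch deliberately only prints it.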
A Python ‘virtual environment’ is a collection of Python packages which can be added or removed from command-line visibility. You maintain a virtual environment using the Python venv module.
The advantages of using venvs are:
-
Different projects can use disjoint sets of packages which would otherwise be mutually incompatible.
-
You can install stable and bleeding-edge versions of packages without them conflicting (so that you can test what happens to your monster Python code with the new version, while still having a route back to safety).
-
The set of Python packages, and their versions, are effectively part of your Python program, and you should be aware of, and record, them, as you would record or log other aspects of your scientific work.
If you need to install a collection of packages for a particular project, say foo, then do so as follows. Note that the venv module works only with Python 3, so you may need to choose that version with a particular command, python3 in this example.
% cd path/to/project
% python3 -m venv foo
This will create a directory foo which will contain the packages for this project. You can then ‘activate’ that environment with:
% source foo/bin/activate
(foo)% which python
.../foo/bin/python
(foo)% python --version
Python 3.x.x
Note that the prompt changes to remind you that you're using the venv.
After you source the activate script, the command python refers to a version of Python in your virtual environment rather than any system one. This will be true until you close the current shell (or terminal window). When you open a new shell (or terminal), the default python command will refer to the original, unadulterated, one, until you activate the venv version using source .../foo/bin/activate.
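Since activation only lasts for the current shell, it is easy to run a script with the wrong Python by accident. A script can check this for itself. The following is a minimal sketch, assuming an environment created with the venv module; the function name in_virtual_environment is illustrative.

```python
import sys


def in_virtual_environment():
    """Return True if this interpreter is running inside a venv."""
    # In a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("in a venv:  ", in_virtual_environment())
```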
If you subsequently use pip to install packages:
(foo)% python -m pip install matplotlib
then this uses the venv's Python, and installs the packages within the venv. Note that there is no --user flag here.
You can have multiple such venv structures, containing different versions of packages, depending on your requirements. If you mess up the collection of packages, you can delete the whole lot by simply deleting this foo directory and starting again.
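The ‘delete and start again’ step can also be done from Python, using the standard library's venv module directly. This is a sketch rather than a recommended workflow: the function name rebuild_venv is hypothetical, and it assumes you really do want to discard whatever is currently in the environment directory.

```python
import venv
from pathlib import Path


def rebuild_venv(env_dir, with_pip=True):
    """Create a fresh venv at env_dir, discarding any existing contents."""
    env_dir = Path(env_dir)
    # clear=True empties an existing environment directory before
    # creating the new one; with_pip=True bootstraps pip into the venv.
    venv.create(env_dir, clear=True, with_pip=with_pip)
    return env_dir
```

Calling rebuild_venv("foo") from the project directory corresponds roughly to removing foo and running python3 -m venv foo again; any packages you still need then have to be reinstalled.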
If your work is at all sensitive to the collection of packages you use, then you should take care to record that collection of packages as part of the documentation of your scientific work. Do that using pip:
(foo)% python -m pip freeze >requirements.txt
If you preserve this requirements.txt file, then you can recreate the identical set of packages in a future venv using:
(foo)% python -m pip install -r requirements.txt
That is, the requirements.txt file should be checked in to your project code repository as part of your source code (you do use version control, don't you?).
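It can also be useful to record the package set from inside a running program, for example at the top of a log file. The sketch below uses the standard library's importlib.metadata (Python 3.8 or later) to list what the current interpreter can see. It is not a replacement for pip freeze, which handles cases such as editable installs that this does not, and the function name installed_packages is illustrative.

```python
from importlib import metadata


def installed_packages():
    """Return sorted 'name==version' strings for visible distributions."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        # Skip entries with incomplete metadata rather than guessing.
        if name and version:
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


# Print one line per package, in the same style as requirements.txt.
for line in installed_packages():
    print(line)
```

Run inside an activated venv, this reports that venv's packages, since it reflects whichever interpreter is executing it.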