Python on Sherlock
Python is well supported on Sherlock. Its implementation may vary based on user preferences and requirements. Options for using Python on Sherlock include
- Sherlock modules
- Sherlock modules + virtual environments
- Ana-, Mini-, or some other Conda
- Build your own!
Note also that, because of the “base” version of Python used by Sherlock’s CentOS operating system, and in order to back-support some older codes, the “default” version of Python is v2.7. Even when Python modules are loaded, the command python will often refer to /usr/bin/python, which is the system’s “base” python@2.7. In order to run the desired “module loaded” Python, use the python3 command, eg:
[sh02-01n58 ~] (job 57166857) $ module load python/3.12
[sh02-01n58 ~] (job 57166857) $ which python
/usr/bin/python
[sh02-01n58 ~] (job 57166857) $ which python3
/share/software/user/open/python/3.12.1/bin/python3
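The same check can be made from inside the interpreter, which can be handy at the top of a job script. The following is a minimal sketch using only the standard library; the helper name describe_interpreter is hypothetical, and the code is written to run under both Python 2 and Python 3 so that it can report either one.

```python
import sys

def describe_interpreter():
    """Return the path and (major, minor) version of the running Python."""
    return sys.executable, tuple(sys.version_info[:2])

path, version = describe_interpreter()
print("running %s (Python %d.%d)" % (path, version[0], version[1]))

# If this reports 2.x, the script was probably started with `python`
# rather than `python3`.
if version[0] < 3:
    sys.stderr.write("warning: this is the system Python, not the module-loaded one\n")
```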
Python versions
Software packages will often specify a version requirement for Python, among other dependencies. Generally speaking, and perhaps especially for Python, these requirements should be considered subjectively and, in most cases, not taken too literally. Remember that, especially in the domains of scientific and research computing, software is often written and maintained by people who are better described as domain experts than software engineers, and whose time and priorities might be more focused on scientific objectives, publications, and grant proposals than on refining and maintaining software. This is even more true with respect to documentation, and often with respect to version specification. As often as not, a version “requirement” is simply a statement that, “This code worked with v.x.y.z at least once.”
When considering a specified version requirement, consider:
- When was the documentation written? Was v.x.y.z the most current at the time of documentation?
- Are there known, significant new or deprecated features in a newer version of the dependency SW (eg, Python)?
- Are there known syntax changes?
- Generally, how strictly should this version requirement be considered?
- What kind of codes are you running? Do they tend to defer to older, “stable” versions of SW, or do they tend to be on the bleeding edge?
Opinions on the matter will vary, but there are strong arguments in favor of trying to stay on the leading, not trailing, edge of software versioning. Especially for software with complex dependency graphs, older versions of software can quickly go “stale.” Even when named or known dependencies are explicitly satisfied, it is easy to miss second order dependencies (“dependencies of dependencies”), which can cause problems. These issues can often be mitigated by controlling the factors that are in our control: updating our codes, and trying to keep pace and up to date with changing (usually improving…) code bases. Generally speaking, a good practice is often to start with the most current version of any software (python …), and work your way back to lower versions, if necessary.
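One way to treat a version “requirement” as a record of what was tested, rather than a hard limit, is a soft check that warns instead of failing. This is only a sketch: the TESTED_WITH value and the check_python helper are hypothetical names chosen for illustration.

```python
import sys
import warnings

# Hypothetical value for illustration: the version this code was last tested with.
TESTED_WITH = (3, 9)

def check_python(tested_with=TESTED_WITH, current=None):
    """Warn (rather than fail) if the interpreter is older than `tested_with`.

    Returns True when the running version is at least `tested_with`.
    """
    if current is None:
        current = tuple(sys.version_info[:2])
    if tuple(current) < tuple(tested_with):
        warnings.warn(
            "tested with Python %d.%d, running %d.%d"
            % (tuple(tested_with) + tuple(current))
        )
        return False
    return True

# Check the interpreter that is running this script.
check_python()
```

Newer interpreters pass silently, which matches the advice to start with the most current version and only step back if something actually breaks.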
Sherlock modules
Sherlock supports several versions of Python, which are periodically updated to add newer and remove older versions. As usual, a good start is,
module spider python/
This is certainly the quickest way to access a basic, stripped down version of Python on Sherlock. For relatively simple implementations of Python – ie, jobs that require only a few python libraries, this is by itself likely a good option.
Loading python library modules
Supporting Python modules, like numpy or pandas, are prefixed with py- and then suffixed with the appropriate Python version. Again, module spider is a great place to start, eg.
module spider py-numpy
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
py-numpy:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Description:
NumPy is the fundamental package for scientific computing with Python.
Versions:
py-numpy/1.14.3_py27 (math)
py-numpy/1.14.3_py36 (math)
py-numpy/1.17.2_py36 (math)
py-numpy/1.18.1_py36 (math)
py-numpy/1.19.2_py36 (math)
py-numpy/1.20.3_py39 (math)
py-numpy/1.24.2_py39 (math)
py-numpy/1.26.3_py312 (math)
Note that, for example, py-numpy/1.26.3_py312 is the correct module to use with python/3.12. Note that these modules are written with an upstream hierarchy: the LMOD script for py-numpy/1.26.3_py312 includes a depends_on('python/3.12') requirement, so:
- You can skip to the numpy load; module load py-numpy/1.26.3_py312 will load python/3.12 as an upstream dependency.
- Be careful to load the correct py- module(s), since they might rearrange some other versioning choices you might have made. module load py-numpy will load the default version of py-numpy (whatever that is…), including the python/ and other supporting modules.
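After loading a py- module, it can be worth confirming from Python which interpreter and package versions were actually picked up. The sketch below uses the standard library's importlib.metadata (available since Python 3.8); the helper name module_report is hypothetical, and it assumes the loaded module exposes its package metadata on the import path.

```python
import sys
from importlib import metadata  # standard library since Python 3.8

def module_report(package="numpy"):
    """Return (python_version, package_version), with None if the package is not found."""
    py = "%d.%d" % tuple(sys.version_info[:2])
    try:
        pkg = metadata.version(package)
    except metadata.PackageNotFoundError:
        pkg = None
    return py, pkg

py, pkg = module_report("numpy")
print("Python %s, numpy %s" % (py, pkg if pkg else "not found"))
```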
Anaconda
Anaconda, Miniconda, etc. can be installed on Sherlock. The process is identical to installing on a personal machine (laptop, lab server, etc.), except you will need to specify an install location. Since Conda installations can get quite large, especially for machine learning (ML) applications, we recommend that you not install into your $HOME space, since this space on Sherlock is limited to only 15 GB. Instead, install to your $GROUP_HOME space, which has a 1 TB quota.
Note that you will need to provide the full path to where you want Anaconda, Miniconda, etc. to install; the installer will not create a directory called, for example, anaconda in the path you provide it. A common approach is to nest user subdirectories in the $GROUP_HOME space. In this case, a common install prefix for Anaconda would be,
/home/groups/$GROUP/$USER/anaconda
Note that one issue with Anaconda is that it has a tendency to produce large numbers of small files and also to be “greedy” in the way it manages software: it will install many very similar versions of a package to match requirements, and so it can take up a lot of space. These factors can be problematic in an HPC environment, and can also adversely affect performance, so it is generally recommended that Python modules (see above) and virtual environments (see below) be considered in lieu of Anaconda.
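To see how many files and how much space an existing installation actually uses, a short standard-library script is enough. This is a rough sketch: the footprint helper is a hypothetical name, and the path in the usage line simply follows the install prefix convention described above.

```python
import os

def footprint(root):
    """Return (file_count, total_bytes) for everything under `root`."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat reports the size of a symlink itself, not of its target
                total += os.lstat(os.path.join(dirpath, name)).st_size
                count += 1
            except OSError:
                # the file vanished or is unreadable; skip it
                continue
    return count, total

if __name__ == "__main__":
    # Assumed location; adjust to wherever the installation actually lives.
    root = os.path.expandvars("$GROUP_HOME/$USER/anaconda")
    files, size = footprint(root)
    print("%s: %d files, %.1f GB" % (root, files, size / 1e9))
```

A path that does not exist simply reports zero files, so the script is safe to run before anything is installed.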
Python
It is easy enough to just install Python yourself, “the regular way.” An excellent trick for compiling Python to work well on Sherlock is to load an existing Python module (plus a good compiler), eg module load gcc/12.4 python/3.12, to build Python. This will ensure that certain libraries, for example libffi/ and libressl/, are available. Ideally then, you might write an LMOD module script, based on the python/3.12 module in this example, to load that version of Python.
As with many other tasks, the best way to get started here is to web query (or “Google”) for “install python,” or similar prompt. A few things to remember up front:
- Specify a --prefix location to install the SW, during the configure phase, eg. ./configure --prefix=$HOME/local/python
- Review other ./configure options, in the documentation and by using ./configure --help
- The ./configure script might recommend some options at the end of its process; you might want to re-run with those options
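When choosing configure options, it can help to see how an existing, working Python (for example, one loaded from a Sherlock module) was itself configured. The sketch below reads this from the standard library's sysconfig module; the helper name build_info is hypothetical, and CONFIG_ARGS is only populated on Unix-style builds, so it may be None elsewhere.

```python
import sys
import sysconfig

def build_info():
    """Collect the install prefix and build settings of the running Python."""
    return {
        "prefix": sys.prefix,
        # Arguments passed to ./configure when this Python was built (Unix builds)
        "configure_args": sysconfig.get_config_var("CONFIG_ARGS"),
        # Compiler command recorded at build time
        "compiler": sysconfig.get_config_var("CC"),
    }

for key, value in build_info().items():
    print("%s: %s" % (key, value))
```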
Virtual Environments
Virtual environments are an easy way to build Python applications or workflows with complicated dependency graphs without interfering or conflicting with other Python applications or workflows that also have complicated dependency graphs. Several software packages provide virtual environment functionality, including conda. In this section, we focus on virtualenv and its standard-library counterpart venv, which are used in much the same way.
Install Virtualenv
Virtualenv is installed on most, if not all, Sherlock Python installations; if you have compiled your own Python, it might be necessary to install it yourself. This is done the standard way,
pip install virtualenv
or
pip install --user virtualenv
if Python is installed to a non-writeable (by you) disk space.
Creating and activating an environment
The general syntax to create an environment called myenv is,
python3 -m venv myenv
Note first that python3, not the more familiar python, may be necessary on Sherlock, as discussed above. Note also that this creates the environment in the current, local path. To create a collection of environments, for example in your $GROUP_HOME space, consider something like:
mkdir -p ${GROUP_HOME}/${USER}/python_envs
python3 -m venv ${GROUP_HOME}/${USER}/python_envs/venv_1
python3 -m venv ${GROUP_HOME}/${USER}/python_envs/venv_2
...
An environment is then activated by “source” running the activate script in the environment’s .../bin directory, eg.
. ${GROUP_HOME}/${USER}/python_envs/venv_1/bin/activate
or
source ${GROUP_HOME}/${USER}/python_envs/venv_2/bin/activate
To deactivate the environment, and fall back to the base (not-)environment, use the deactivate command. To remove the environment, simply delete the directory, eg,
rm -rf ${GROUP_HOME}/${USER}/python_envs/venv_1
Note also, be VERY CAREFUL with rm -rf. The -r flag tells the rm command to be ‘recursive’ (walk down the directory tree), and -f means “force,” which means that it will delete files and directories with “read-only” flags, or similar weak protections. In short, do not point rm -rf at a path you do not want to delete entirely.
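Environments can also be created and inspected from Python itself, using the standard library's venv module. The sketch below is illustrative only: make_env and in_virtualenv are hypothetical helper names, the environment is created in a temporary directory rather than in $GROUP_HOME, and pip is left out (with_pip=False) to keep the demonstration quick, whereas python3 -m venv installs pip by default.

```python
import sys
import tempfile
import venv
from pathlib import Path

def in_virtualenv():
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix

def make_env(env_dir, with_pip=False):
    """Create a virtual environment at `env_dir` and return its path."""
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir)

if __name__ == "__main__":
    print("inside an environment:", in_virtualenv())
    with tempfile.TemporaryDirectory() as tmp:
        env = make_env(Path(tmp) / "venv_demo")
        # Every environment has a pyvenv.cfg file at its top level
        print("created:", (env / "pyvenv.cfg").is_file())
```

Running in_virtualenv() at the start of a job is a cheap way to confirm that the activate step actually took effect.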
Using venv in Jupyter
Virtual environments can be used in Jupyter Notebook or Jupyter Lab in two ways. For the first method, simply activate the environment, then launch Jupyter,
. ${ENVS_PATH}/myenv/bin/activate
jupyter notebook
Unfortunately, this method cannot be used with OnDemand. It is also possible that Jupyter, now or in a future version, will by default execute a deactivate command during instantiation, so it is worth understanding a more explicit approach.
The second approach is to ‘register’ the environment with IPython Kernel. If necessary, install ipykernel:
pip install ipykernel
If you are using a Sherlock Python module,
module spider py-ipython
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
py-ipython:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Description:
IPython is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language.
Versions:
py-ipython/5.4.1_py27 (devel)
py-ipython/6.1.0_py36 (devel)
py-ipython/8.3.0_py39 (devel)
py-ipython/8.22.2_py312 (devel)
Then select the appropriate version, for example, for Python 3.9:
module load py-ipython/8.3.0_py39
module load py-jupyter/1.0.0_py39
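Then, with the environment activated so that python3 refers to it, add the environment to ipython under a short kernel name. Kernel names may contain only letters, digits, periods, hyphens, and underscores, so use a name such as myenv rather than a filesystem path,
python3 -m ipykernel install --user --name=myenv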
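To confirm that a registered kernel points at the intended environment, the kernel specification files can be read directly. The sketch below assumes the default per-user location on Linux, ~/.local/share/jupyter/kernels, which differs if JUPYTER_DATA_DIR is set; list_kernels is a hypothetical helper name.

```python
import json
from pathlib import Path

# Assumed default per-user kernel location on Linux.
DEFAULT_KERNEL_DIR = Path.home() / ".local" / "share" / "jupyter" / "kernels"

def list_kernels(kernel_dir=DEFAULT_KERNEL_DIR):
    """Map each registered kernel name to the interpreter its kernel.json launches."""
    kernel_dir = Path(kernel_dir)
    if not kernel_dir.is_dir():
        return {}
    kernels = {}
    for spec in sorted(kernel_dir.glob("*/kernel.json")):
        with open(spec) as handle:
            data = json.load(handle)
        # argv[0] is the executable Jupyter starts for this kernel
        argv = data.get("argv") or [None]
        kernels[spec.parent.name] = argv[0]
    return kernels

for name, python in list_kernels().items():
    print("%s -> %s" % (name, python))
```

If the interpreter listed for a kernel is not inside the environment's bin directory, the kernel was registered with a different python3 than intended.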