Since Python supports a wide range of additional libraries in machine learning or data science research, it is not always possible to install every package on HPC. Also, users sometimes need to use a specific version of Python or its libraries to conduct their research. Therefore, in that case, users can build their own Python version along with a specific library. One of the ways to accomplish this is to use Conda.
Conda¶
Conda as a package manager helps you find and install packages. If you need a package that requires a different version of Python, you do not need to switch to a different environment manager, because conda is also an environment manager.
Availability¶
Software | Version | Dependent Toolchain | Module Load Command |
---|---|---|---|
Miniforge3 | 24.1.2-0 | - | module load Miniforge3/24.1.2-0 |
Anaconda3 | 2023.09-0 | - | module load Anaconda3/2023.09-0 |
Anaconda3 | 5.3.0 | - | module load Anaconda3/5.3.0 |
Warning
Users can use Conda after loading any of the modules mentioned above. However, please note that the Anaconda module may be removed in the future, as Anaconda is no longer free for non-profit academic research at institutions with more than 200 employees. While we have not received any official communication from Anaconda yet, please use the Miniforge3 module instead of Anaconda. You do not need to reinstall or recreate any environment if you have already created a Conda environment with Anaconda. Simply load Miniforge3, and the remaining steps are the same.
Create and Activate a Conda Virtual Environment¶
Load the Miniforge3 Module
module load Miniforge3
Create Environment with conda
¶
To create an environment use the conda create
command. Once the environment is created, you need to use source conda.sh
to activate the path.
To create an environment use conda create --name ENV python=3.9
where ENV
is the name of the environment. You can choose any environment name of your choice.
Activate and Deactivate Conda Environment¶
Once you create an environment, you need to activate the environment to install python packages
Use conda activate ENV
to activate the Conda environment (ENV
is the name of the environment). Following the activation of the conda environment, the name of the environment appears at the left of the hostname in the terminal.
login1-41 ~ >: module load Miniforge3
login1-41 ~ >: source conda.sh
login1-41 ~ >: conda create --name ENV python=3.9
login1-41 ~ >: conda activate ENV
(ENV) login-41 ~ >:
Once you finish the installation of Python packages, deactivate the conda environment using conda deactivate ENV
.
Warning
Please note that you may need to create multiple Conda environments, as some packages may not work in a single environment. For example, if you want to install PyTorch and TensorFlow, it's advisable to create separate environments as sometimes both packages in a single environment can cause errors. To create another environment make sure to deactivate the previous environment by using the conda deactivate
command.
Install Python Packages Via Conda¶
Once Conda environment is activated, you can install packages via conda install package_name
command. For example, if you want to install matplotlib
, you need to use
(ENV) login-41 ~ >: conda install matplotlib
Conda Channel¶
Conda Channel refers to a repository or collection of software packages that are available for installation using Conda. Conda Channels are used to organize and distribute packages, and they play a crucial role in the Conda ecosystem. Channels can be specified using the --channel
or -c
option with the conda install command i.e.
conda install -c channel_name package_name
. In the above example, if you want to specify the channel name to install matplotlib
, you need to use
Note
Since memory and CPU usage are limited, it's better to start an interactive session with the compute node whenever you are installing Python packages via Conda.
(ENV) login-41 ~ >: conda install -c conda-forge matplotlib
matplotlib
from conda-forge
channel which is a community-maintained collection of Conda packages where a wide range of packages contributed by the community are available.
Users can prioritize channels by listing them in a specific order, so that Conda searches channels in the order they are listed, installing the first version of a package that it finds. To list the channels, create a file .condarc
in the $HOME
directory and add the following
auto_activate_base: false
channels:
- conda-forge
.condarc
is that you don't have to mention the channel name every time you install a package. However, please note that you still need to use the channel name if you want to install Python packages that require a specific channel other than the conda-forge
channel.
Examples¶
Here, we provide some examples of how to use conda
to install application
Install TensorFlow with GPU¶
The following example will create a new conda environment based on Python 3.9 and install TensorFlow in the environment.
login1-41 ~ >: module load Miniforge3
login1-41 ~ >: conda create --name tf python=3.9
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /home/g/guest24/.conda/envs/tf
added / updated specs:
- python=3.9
The following packages will be downloaded:
<output snipped>
Proceed ([y]/n)?y
<output snipped>
#
# To activate this environment, use
#
# $ conda activate tf
#
# To deactivate an active environment, use
#
# $ conda deactivate
Activate the new 'tf' environment
login1-41 ~ >: source conda.sh
login1-41 ~ >: conda activate tf
(tf) login-41 ~ >:
(tf) node430-41 ~ >: conda install -c anaconda tensorflow-gpu
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /home/g/guest24/miniconda3/envs/tf
added / updated specs:
- tensorflow-gpu
<output snipped>
The following packages will be SUPERSEDED by a higher-priority channel:
ca-certificates pkgs/main --> anaconda
certifi pkgs/main --> anaconda
openssl pkgs/main --> anaconda
Proceed ([y]/n)?y
<output snipped>
mkl_fft-1.1.0 | 143 KB | ####################################################################################### | 100%
urllib3-1.25.9 | 98 KB | ####################################################################################### | 100%
cudatoolkit-10.1.243 | 513.2 MB | ####################################################################################### | 100%
protobuf-3.12.3 | 711 KB | ####################################################################################### | 100%
blinker-1.4 | 21 KB | ####################################################################################### | 100%
requests-2.24.0 | 54 KB | ####################################################################################### | 100%
werkzeug-1.0.1 | 243 KB | ####################################################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(tf) login1-41 ~ >: python
Python 3.9.13 (main, Oct 13 2022, 21:15:33)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
tf.gpu.test.py
import tensorflow as tf
if tf.test.gpu_device_name():
print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
print("Please install GPU version of TF")
Slurm script to submit the job
#!/bin/bash -l
#SBATCH --output=%x.%j.out # %x.%j expands to slurm JobName.JobID
#SBATCH --error=%x.%j.err # prints the error message
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --mem-per-cpu=4000M # Maximum allowable mempry per CPU 4G
#SBATCH --qos=standard
#SBATCH --account=PI_ucid # Replace PI_ucid which the NJIT UCID of PI
#SBATCH --time=71:59:59 # D-HH:MM:SS
# Purge any module loaded by default
module purge > /dev/null 2>&1
module load wulver # Load slurm, easybuild
module load Miniforge3
source conda.sh
conda activate tf
srun python tf.gpu.test.py
Result:
Starting /home/g/guest24/.bash_profile ... standard AFS bash profile
Home directory : /home/g/guest24 is not in AFS -- skipping quota check
On host node430 :
17:14:13 up 1 day, 1:17, 0 users, load average: 0.01, 0.07, 0.06
Your Kerberos ticket and AFS token status
klist: No credentials cache found (filename: /tmp/krb5cc_22967_HvCVvuvMMX)
Kerberos :
AFS :
Loading default modules ...
Create file : "/home/g/guest24/.modules" to customize.
No modules loaded
2020-07-29 17:14:19.047276: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-07-29 17:14:19.059941: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2200070000 Hz
2020-07-29 17:14:19.060093: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55ea8ebfdb90 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-29 17:14:19.060136: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-07-29 17:14:19.061484: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
<ouput snipped>
2020-07-29 17:14:19.817386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-29 17:14:19.817392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-07-29 17:14:19.817397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-07-29 17:14:19.819082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:0 with 15064 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:02:00.0, compute capability: 6.0)
Default GPU Device: /device:GPU:0
conda deactivate tf
command.
Install PyTorch with GPU¶
- To install PyTorch with GPU, load the
Miniforge3
module as described above and then use the following
conda create --name torch-cuda
source conda.sh
conda activate torch-cuda
conda install -c "nvidia/label/cuda-12.2.0" cuda-toolkit
conda install -c pytorch -c nvidia pytorch torchvision torchaudio pytorch-cuda -y
Note
In the example above, we mentioned the channel name as we intend to install PyTorch and PyTorch-CUDA from a specific channel. For the default channel please see Channels.
- Check the Torch version
python -c "import torch; print( torch.__version__)"
- Check the CUDA version
python -c "import torch; print(torch. version .cuda)"
- Check whether Torch is compiled with CUDA
python -c "import torch; print(torch.cuda. is_available())"
Important
While checking the CUDA version or PyTorch compilation using the commands mentioned above, make sure to start an interactive session on a GPU node; otherwise, the command will not recognize CUDA or the GPU.
- A simple PyTorch test program is given below to check whether PyTorch has been installed properly. Program is called
torch_tensor.py
# -*- coding: utf-8 -*-
import torch
import math
dtype = torch.float
#device = torch.device("cpu") # Uncomment this to run on CPU
device = torch.device("cuda:0") # Uncomment this to run on GPU
# Create random input and output data
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)
# Randomly initialize weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)
learning_rate = 1e-6
for t in range(2000):
# Forward pass: compute predicted y
y_pred = a + b * x + c * x ** 2 + d * x ** 3
# Compute and print loss
loss = (y_pred - y).pow(2).sum().item()
if t % 100 == 99:
print(t, loss)
# Backprop to compute gradients of a, b, c, d with respect to loss
grad_y_pred = 2.0 * (y_pred - y)
grad_a = grad_y_pred.sum()
grad_b = (grad_y_pred * x).sum()
grad_c = (grad_y_pred * x ** 2).sum()
grad_d = (grad_y_pred * x ** 3).sum()
# Update weights using gradient descent
a -= learning_rate * grad_a
b -= learning_rate * grad_b
c -= learning_rate * grad_c
d -= learning_rate * grad_d
print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
User can use the following job script to run the script.
torch-cuda.submit.sh
#!/bin/bash -l
#SBATCH --job-name=torch_test
#SBATCH --output=%x.%j.out # %x.%j expands to JobName.JobID
#SBATCH --error=%x.%j.err # prints the error message
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --mem-per-cpu=4000M # Maximum allowable mempry per CPU 4G
#SBATCH --qos=standard
#SBATCH --account=PI_ucid # Replace PI_ucid which the NJIT UCID of PI
#SBATCH --time=71:59:59 # D-HH:MM:SS
# Purge any module loaded by default
module purge > /dev/null 2>&1
module load wulver # Load slurm, easybuild
module load Miniforge3
source conda.sh
conda activate torch-cuda
srun python touch_tensor.py
Warning
When working with Python, it is generally advised to avoid mixing package management tools such as pip and conda within the same environment. Pip and Conda manage dependencies differently, and their conflict can lead to compatibility issues and unexpected behavior. Mixing the two can result in an environment where packages installed with one tool may not interact seamlessly with those installed using the other.
Mamba: The Conda Alternative¶
Mamba is a fast, robust, and cross-platform package manager and particularly useful for building complicated environments, where conda
is unable to 'solve' the required set of packages within a reasonable amount of time.
Users can install packages with mamba
in the same way as with conda
.
module load Miniforge3
# create new environment
mamba create --name env_name python numpy pandas
source conda.sh
# install a new package into an existing environment
conda activate env_name
mamba install scipy
Example of Installing PyTorch via mamba¶
module load Miniforge3
conda create --name torch-cuda
source conda.sh
conda activate torch-cuda
mamba install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
This will install pytorch in the torch-cuda
environment.
Warning
Ensure to unload the Mamba module, as when you use Anaconda3
and Mamba
together, Mamba installs the PyTorch packages within the Conda environment. Consequently, the next time you activate this environment, there's no need to load the 'Mamba' module. Only employ 'Mamba' when installing the packages.
Export and Import Conda Environment¶
Exporting and importing Conda environments allows users to capture and reproduce the exact set of dependencies for a project. With Conda, a popular package and environment management system, users can export an environment, including all installed packages, into a YAML file. This file can then be shared or version-controlled. Importing the environment from the YAML file on another system ensures consistent dependencies, making it easier to recreate the development or execution environment.
Tips
When installing Python packages via Conda, ensure that you perform the installation on the compute node rather than the login node. The CPU and memory resources on login nodes are limited, and installing Python packages on the login node can be time-consuming. To avoid this, initiate an tnteractive session with compute node.
Export Conda Environment¶
To export a conda environment to a new directory or a different machine, you need to activate the environment first that you intend to export. Please see Conda environment on how to activate the environment. Once your environment is activated, you can export it to a YAML file:
conda env export > my_environment.yml
name: my_env
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- blas=1.0=mkl
<ouput snipped>
#the last line is the path of the env
prefix: /home/a/abc3/.conda/envs/my_env.
my_environment.yml
file to make sure it has the correct environment name and other settings. The last line of the file specifies the path of the environment.
Once the YAML file is ready, you can transfer the my_environment.yml
file to the new machine or directory where you want to replicate the environment. See cluster file transfer for details on transferring the files to clusters.
Import Environment on New Machine¶
On the new machine, first load Anaconda and initialize conda as before. Then, create the environment from the YAML file:
conda env create -f my_environment.yml
Collecting package metadata (repodata.json): done
Solving environment: done
<ouput snipped>
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate my_env
#
# To deactivate an active environment, use
#
# $ conda deactivate
conda activate my_env
where my_env
is the environment name.
You can check your current environments using conda env list
.
Importing to a Different Location¶
If you want to import the conda environment to a different location, use the --prefix
or -p
option
conda env create -f my_environment.yml -p /project/hpcadmins/abc3/conda_env/my_env
conda activate /project/hpcadmins/abc3/conda_env/my_env
(/project/hpcadmins/abc3/conda_env/my_env) abc3@login01:~$ conda env list
# conda environments:
#
base /apps/easybuild/software/Miniforge3/2023.09-0
* /project/hpcadmins/abc3/conda_env/my_env
Warning
It is advisable to use the /project
directory to store the Conda environment rather than using the $HOME
directory. On Wulver, the storage space on $HOME
is limited (50G) and cannot be increased. See Wulver Filesystems for details.
Conda User Commands¶
Task | Command |
---|---|
Activate environment: | conda activate [environment_name] |
Deactivate environment: | conda deactivate [environment_name] |
Show the list of environments: | conda env list |
Delete environment: | conda remove --name [environment_name] --all |
Export environment: | conda env export > [environment_name].yml |
Import environment from YAML: | conda env create -f [environment_name].yml |
Import environment to different location: | conda env create -f [environment_name].yml -p [PATH] |