Preparing Your Own Executable
Compilation is done on the login node, whereas execution happens on the compute nodes via the scheduler (SLURM).
Note
Compilation and execution must be done with the same libraries and matching versions to avoid unexpected results.
Steps:
- Load required modules on the login node.
- Do the compilation.
- Open the job submission script and specify the same modules to be loaded as were used during compilation.
- Submit the script.
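To illustrate the steps above, a minimal SLURM job submission script might look like the following sketch. The job name, resource requests, and module lines are placeholders, not site defaults; load the same modules that were used at compile time and adjust the resources to your program:

```bash
#!/bin/bash
#SBATCH --job-name=sample_job        # placeholder job name
#SBATCH --nodes=1                    # adjust resources to your program
#SBATCH --ntasks=4
#SBATCH --time=00:10:00
#SBATCH --output=sample_job.%j.out

# Load the same modules that were used for compilation
spack load gcc@13.3.0 /3wdpf6l
spack load intel-oneapi-compilers@2024.2.1

# Run the executable (use mpirun only for MPI programs)
mpirun -n $SLURM_NTASKS ./a.out
```

The script is submitted with `sbatch <script_name>`, and the job status can be checked with `squeue`.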
The samples directory contains a few sample programs and their job submission scripts. Compilation and execution instructions are given at the beginning of the respective files.
Users can copy the directory to their home directory and try compiling and executing these sample codes. The command for copying is as follows:
cp -r /home/apps/Docs/samples/ ~/.
- mm.c - Serial Version of Matrix-Matrix Multiplication of two NxN matrices
- mm_omp.c - Basic OpenMP Version of Matrix-Matrix Multiplication of two NxN matrices
- mm_mpi.c - Basic MPI Version of Matrix-Matrix Multiplication of two NxN matrices
- mm_acc.c - OpenACC Version of Matrix-Matrix Multiplication of two NxN matrices
- mm_blas.cu - CUDA Matrix Multiplication program using the cuBLAS library.
- mm_mkl.c - MKL Matrix Multiplication program.
- laplace_acc.c - OpenACC version of the basic stencil problem.
It is recommended to use the Intel compilers, since they are better optimized for the hardware.
Compilers
| Compilers | Description | Versions Available |
|---|---|---|
| gcc/gfortran | GNU Compiler (C/C++/Fortran) | 8.5.0, 12.4.0, 13.3.0, 14.2.0 |
| icc/icpc/ifort | Intel Compilers (C/C++/Fortran) | 2021.5.0, 2021.11.0, 2021.10.0, oneapi@2025.0.1, oneapi@2024.2.1 |
| mpicc/mpicxx/mpif90 | Intel MPI with GNU compilers (C/C++/Fortran) | 2021.11.0 |
| mpiicc/mpiicpc/mpiifort | Intel MPI with Intel compilers (C/C++/Fortran) | 2021.11.0 |
| nvcc | CUDA C Compiler | 12.6.3 |
Optimization Flags
Optimization flags target single-processor performance: the compiler optimizes the program according to the chosen optimization level. Note that optimization flags may also change the precision of the output produced by the executable. The available flags are described in detail on the respective compiler pages. A few examples are given below.
Intel: -O3 -xHost
GNU: -O3
PGI: -fast
A brief description of how to compile and execute the various types of programs is given next. Note that larger applications may require loading additional dependency libraries.
C Program:
Setting up the environment:
spack load gcc@13.3.0 /3wdpf6l
spack load intel-oneapi-compilers@2024.2.1
Compilation: icc -O3 -xHost <<prog_name.c>>
Execution: ./a.out
C + OpenMP Program:
Setting up the environment:
spack load gcc@13.3.0 /3wdpf6l
spack load intel-oneapi-compilers@2024.2.1
Compilation: icc -O3 -xHost -qopenmp <<prog_name.c>>
Execution: ./a.out (set OMP_NUM_THREADS to control the number of threads)
C + MPI Program:
Setting up the environment:
spack load gcc@13.3.0 /3wdpf6l
spack load intel-oneapi-compilers@2024.2.1
Compilation: mpiicc -O3 -xHost <<prog_name.c>>
Execution: mpirun -n <<num_procs>> ./a.out
C + MKL Program:
Setting up the environment:
spack load gcc@13.3.0 /3wdpf6l
spack load intel-oneapi-compilers@2024.2.1
Compilation: icc -O3 -xHost -mkl <<prog_name.c>>
Execution: ./a.out
CUDA Program:
Setting up the environment:
spack load gcc /3wdpf6l
spack load cuda /ezhcbzk
Example (1)
Compilation: nvcc -arch=sm_80 <<prog_name.cu>>
Execution: ./a.out
Note: The switch -arch=sm_80 targets the Ampere-architecture A100 GPUs and is supported by CUDA 11 and later. Note also that older CUDA versions are compatible only with older GCC versions, so the appropriate GCC module must be loaded to match the CUDA version in use.
Example (2)
Compilation: nvcc -arch=sm_80 /home/apps/Docs/samples/mm_blas.cu -lcublas
Execution: ./a.out
CUDA + OpenMP Program:
Setting up the environment:
spack load gcc@13 /3wdpf6l
spack load cuda /ezhcbzk
Example (1)
Compilation: nvcc -arch=sm_80 -Xcompiler="-fopenmp" -lgomp /home/apps/Docs/samples/mm_blas_omp.cu -lcublas
Execution: ./a.out
Example (2)
Compilation: g++ -fopenmp /home/apps/Docs/samples/mm_blas_omp.c -I/home/apps/spack/opt/spack/linux-almalinux8-cascadelake/gcc-12.4.0/cuda-12.6.3-ezhcbzkhxdihcdrq6lp4df3stnmrza4b/include -L/home/apps/spack/opt/spack/linux-almalinux8-cascadelake/gcc-12.4.0/cuda-12.6.3-ezhcbzkhxdihcdrq6lp4df3stnmrza4b/lib64 -lcublas
Execution: ./a.out
OpenACC Program:
Setting up the environment:
spack load pgi@19.10 cuda@10.1
Compilation for GPU: pgcc -acc -fast -Minfo=all -ta=tesla:cc70,managed /home/apps/Docs/samples/laplace_acc.c
Execution: ./a.out
Compilation for CPU: pgcc -acc -fast -Minfo=all -ta=multicore -tp=skylake /home/apps/Docs/samples/laplace_acc.c
Execution: ./a.out