Preparing Your Own Executable

Compilation is done on the login node, whereas execution happens on the compute nodes via the scheduler (SLURM).

Note

Compilation and execution must be done with the same libraries and matching versions to avoid unexpected results.

Steps:

  • Load required modules on the login node.
  • Compile the program.
  • Open the job submission script and specify the same modules to be loaded as were used during compilation (a sample script is sketched after this list).
  • Submit the script.
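
For illustration, a minimal job submission script might look like the following. This is only a sketch: the job name, resource requests, and partition are placeholder values that must be adapted to the site's configuration, and the spack load line must match exactly what was loaded at compile time.

#!/bin/bash
#SBATCH --job-name=mm_mpi          # job name shown in the queue
#SBATCH --nodes=1                  # number of compute nodes
#SBATCH --ntasks-per-node=4        # MPI ranks per node
#SBATCH --time=00:10:00            # walltime limit
#SBATCH --partition=standard       # placeholder: use your site's partition name

# Load the same modules/packages that were used during compilation
spack load intel-oneapi-compilers@2024.0.0

mpirun -n 4 ./a.out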

The directory /home/apps/Docs/samples contains a few sample programs and their sample job submission scripts. Compilation and execution instructions are given at the beginning of the respective files.

Users can copy this directory to their home directory and then try compiling and executing the sample codes. The command for copying is as follows:

cp -r /home/apps/Docs/samples/ ~/.

  • mm.c - Serial version of matrix-matrix multiplication of two NxN matrices
  • mm_omp.c - Basic OpenMP version of matrix-matrix multiplication of two NxN matrices
  • mm_mpi.c - Basic MPI version of matrix-matrix multiplication of two NxN matrices
  • mm_acc.c - OpenACC version of matrix-matrix multiplication of two NxN matrices
  • mm_blas.cu - CUDA matrix-matrix multiplication program using the cuBLAS library
  • mm_mkl.c - MKL matrix-matrix multiplication program
  • laplace_acc.c - OpenACC version of the basic stencil problem

It is recommended to use the Intel compilers, since they are better optimized for the hardware.

Compilers

Compiler                  Description                                      Versions Available
gcc/gfortran              GNU Compiler (C/C++/Fortran)                     8.5.0, 12.4.0, 13.3.0, 14.2.0
icc/icpc/ifort            Intel Compilers (C/C++/Fortran)                  2021.5.0, 2021.10.0, 2021.11.0, oneapi@2024.2.1, oneapi@2025.0.1
mpicc/mpicxx/mpif90       Intel MPI with GNU compilers (C/C++/Fortran)     2021.11.0
mpiicc/mpiicpc/mpiifort   Intel MPI with Intel compilers (C/C++/Fortran)   2021.11.0
nvcc                      CUDA C Compiler                                  12.6.3

Optimization Flags

Optimization flags control uniprocessor optimization: the compiler tries to optimize the program according to the optimization level chosen. Note that optimization flags may also change the precision of the output produced by the executable. The available flags are documented in detail on the respective compiler pages; a few examples are given below.

Intel: -O3 -xHost
GNU: -O3
PGI: -fast

Given next is a brief description of how to compile and execute the various types of programs. Note that certain bigger applications may additionally require dependency libraries to be loaded.

C Program:

Setting up of environment: 
spack load gcc@13.3.0 /3wdpf6l
spack load intel-oneapi-compilers@2024.0.0
Compilation: icc -O3 -xHost <<prog_name.c>>
Execution: ./a.out
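
For reference, a serial matrix-matrix multiplication along the lines of the mm.c sample might look like the following. This is a minimal sketch, not the actual sample file; the dimension N is a placeholder.

/* Minimal sketch of a serial NxN matrix-matrix multiplication (cf. mm.c) */
#include <stdio.h>

#define N 512   /* placeholder matrix dimension */

static double a[N][N], b[N][N], c[N][N];

int main(void)
{
    /* initialize the input matrices */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = 1.0;
            b[i][j] = 2.0;
        }

    /* c = a * b, classic triple loop */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }

    printf("c[0][0] = %f\n", c[0][0]);
    return 0;
}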

C + OpenMP Program:

Setting up of environment: 
spack load gcc@13.3.0 /3wdpf6l
spack load intel-oneapi-compilers@2024.2.1
Compilation: icc -O3 -xHost -qopenmp <<prog_name.c>>
Execution: ./a.out (set the OMP_NUM_THREADS environment variable to control the number of threads)
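
The OpenMP variant differs from the serial version mainly in a single pragma that distributes the outer loop across threads. A minimal sketch along the lines of mm_omp.c (again, N is a placeholder):

/* Minimal sketch of an OpenMP NxN matrix-matrix multiplication (cf. mm_omp.c) */
#include <stdio.h>
#include <omp.h>

#define N 512   /* placeholder matrix dimension */

static double a[N][N], b[N][N], c[N][N];

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = 1.0;
            b[i][j] = 2.0;
        }

    /* distribute the rows of the result across OpenMP threads */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }

    printf("max threads: %d, c[0][0] = %f\n", omp_get_max_threads(), c[0][0]);
    return 0;
}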

C + MPI Program:

Setting up of environment:  
spack load gcc@13.3.0 /3wdpf6l
spack load intel-oneapi-compilers@2024.0.0
Compilation: mpiicc -O3 -xHost <<prog_name.c>>
Execution: mpirun -n <<num_procs>> ./a.out
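
In the MPI version, mpirun starts several processes and the work must be divided among them explicitly. The following is a minimal skeleton showing only the structure such a program (e.g. mm_mpi.c) is built around; the distribution of matrix blocks is indicated in the comment:

/* Minimal MPI skeleton: each rank would compute a block of rows (cf. mm_mpi.c) */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    /* In a full implementation, rank r would compute rows
       [r * N/size, (r+1) * N/size) of the result, and the blocks
       would be collected on rank 0 with MPI_Gather. */
    printf("rank %d of %d ready\n", rank, size);

    MPI_Finalize();
    return 0;
}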

C + MKL Program:

Setting up of environment:
spack load gcc@13.3.0 /3wdpf6l
spack load intel-oneapi-compilers@2024.0.0
Compilation: icc -O3 -xHost -mkl <<prog_name.c>>
Execution: ./a.out
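
With MKL, the multiplication itself is done by the library's BLAS routine, so the program reduces to allocating the matrices and one cblas_dgemm call. A minimal sketch along the lines of mm_mkl.c (N is a placeholder):

/* Minimal sketch of a matrix multiplication via MKL's cblas_dgemm (cf. mm_mkl.c) */
#include <stdio.h>
#include <mkl.h>

#define N 512   /* placeholder matrix dimension */

int main(void)
{
    /* 64-byte aligned allocations, as recommended for MKL */
    double *a = (double *)mkl_malloc((size_t)N * N * sizeof(double), 64);
    double *b = (double *)mkl_malloc((size_t)N * N * sizeof(double), 64);
    double *c = (double *)mkl_malloc((size_t)N * N * sizeof(double), 64);

    for (int i = 0; i < N * N; i++) {
        a[i] = 1.0;
        b[i] = 2.0;
        c[i] = 0.0;
    }

    /* c = 1.0 * a * b + 0.0 * c, all matrices stored row-major */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                N, N, N, 1.0, a, N, b, N, 0.0, c, N);

    printf("c[0] = %f\n", c[0]);

    mkl_free(a);
    mkl_free(b);
    mkl_free(c);
    return 0;
}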

CUDA Program:

Setting up of environment: 
spack load gcc /3wdpf6l
spack load cuda /ezhcbzk

Example (1)
Compilation: nvcc -arch=sm_80 <<prog_name.cu>>
Execution: ./a.out 


Note: The flag -arch=sm_80 targets the Ampere-series A100 GPUs and is supported by CUDA 11.0 and later. Also note that older versions of CUDA are compatible only with correspondingly older versions of GCC, so an appropriate GCC module must be loaded.

Example (2)
Compilation: nvcc -arch=sm_80 /home/apps/Docs/samples/mm_blas.cu -lcublas
Execution: ./a.out
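
In the cuBLAS version, the host code allocates device memory, copies the inputs to the GPU, and lets the library do the multiplication. A minimal sketch of what mm_blas.cu might resemble (N is a placeholder and error checking is omitted for brevity; note that cuBLAS assumes column-major storage):

/* Minimal sketch of a cuBLAS matrix multiplication (cf. mm_blas.cu) */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

#define N 512   /* placeholder matrix dimension */

int main(void)
{
    size_t bytes = (size_t)N * N * sizeof(double);
    double *h_a = (double *)malloc(bytes);
    double *h_c = (double *)malloc(bytes);
    for (int i = 0; i < N * N; i++)
        h_a[i] = 1.0;

    /* device buffers for the two inputs and the result */
    double *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_a, bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    /* d_c = alpha * d_a * d_b + beta * d_c */
    const double alpha = 1.0, beta = 0.0;
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N, N, N, &alpha, d_a, N, d_b, N, &beta, d_c, N);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cublasDestroy(handle);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_c);
    return 0;
}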

CUDA + OpenMP Program:

Setting up of environment: 
spack load gcc@13 /3wdpf6l
spack load cuda /ezhcbzk


Example (1)
Compilation: nvcc -arch=sm_80 -Xcompiler="-fopenmp" -lgomp /home/apps/Docs/samples/mm_blas_omp.cu -lcublas
Execution: ./a.out

Example (2)
Compilation: g++ -fopenmp /home/apps/Docs/samples/mm_blas_omp.c -I/home/apps/spack/opt/spack/linux-almalinux8-cascadelake/gcc-12.4.0/cuda-12.6.3-ezhcbzkhxdihcdrq6lp4df3stnmrza4b/include  -L/home/apps/spack/opt/spack/linux-almalinux8-cascadelake/gcc-12.4.0/cuda-12.6.3-ezhcbzkhxdihcdrq6lp4df3stnmrza4b/lib64 -lcublas
Execution: ./a.out
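
When OpenMP and CUDA are mixed, each OpenMP thread has to be bound to a CUDA device before it makes any GPU calls. The following is a minimal sketch of that pattern only (it is not the actual mm_blas_omp sample):

/* Minimal sketch: binding OpenMP threads to CUDA devices */
#include <stdio.h>
#include <omp.h>
#include <cuda_runtime.h>

int main(void)
{
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev == 0) {
        printf("no CUDA device found\n");
        return 1;
    }

    #pragma omp parallel
    {
        /* map threads to devices round-robin; with a single GPU,
           all threads share device 0 */
        int dev = omp_get_thread_num() % ndev;
        cudaSetDevice(dev);
        printf("thread %d using device %d\n", omp_get_thread_num(), dev);
    }
    return 0;
}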

OpenACC Program:

Setting up of environment: 
spack load pgi@19.10 cuda@10.1

Compilation for GPU: pgcc -acc -fast -Minfo=all -ta=tesla:cc70,managed /home/apps/Docs/samples/laplace_acc.c
Execution: ./a.out


Compilation for CPU: pgcc -acc -fast -Minfo=all -ta=multicore -tp=skylake /home/apps/Docs/samples/laplace_acc.c
Execution: ./a.out
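
The laplace_acc.c sample solves the basic stencil problem; the essential OpenACC pattern is a data region around the iteration loop, with the stencil updates parallelized inside it. A minimal Jacobi-style sketch under those assumptions (grid size, tolerance, and iteration limit are placeholders):

/* Minimal sketch of an OpenACC Jacobi/Laplace stencil (cf. laplace_acc.c) */
#include <stdio.h>
#include <math.h>

#define NX 512   /* placeholder grid dimensions */
#define NY 512

static double A[NX][NY], Anew[NX][NY];

int main(void)
{
    /* boundary condition: hold the top edge at 1.0 */
    for (int j = 0; j < NY; j++)
        A[0][j] = 1.0;

    double err = 1.0;
    int iter = 0;

    /* keep both arrays resident on the accelerator across iterations */
    #pragma acc data copy(A) create(Anew)
    while (err > 1.0e-4 && iter < 1000) {
        err = 0.0;

        /* each interior point becomes the average of its four neighbours */
        #pragma acc parallel loop reduction(max:err)
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++) {
                Anew[i][j] = 0.25 * (A[i+1][j] + A[i-1][j]
                                   + A[i][j+1] + A[i][j-1]);
                err = fmax(err, fabs(Anew[i][j] - A[i][j]));
            }

        #pragma acc parallel loop
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++)
                A[i][j] = Anew[i][j];

        iter++;
    }

    printf("iterations: %d, final error: %f\n", iter, err);
    return 0;
}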