Preparing Your Own Executable

Compilation is done on the login node, whereas execution happens on the compute nodes via the scheduler (SLURM).

Note

Compilation and execution must be done with the same libraries and matching versions to avoid unexpected results.

Steps:

  • Load required modules on the login node.
  • Compile the program.
  • Open the job submission script and specify the same modules to be loaded as were used during compilation (a sample script is sketched after this list).
  • Submit the script.
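
For illustration, a minimal job submission script might look like the following. This is only a sketch: the job name, resource requests, and partition are placeholder values that must be adapted to the site's configuration, and the spack load line must match exactly what was loaded at compile time.

#!/bin/bash
#SBATCH --job-name=mm_mpi          # job name shown in the queue
#SBATCH --nodes=1                  # number of compute nodes
#SBATCH --ntasks-per-node=4        # MPI ranks per node
#SBATCH --time=00:10:00            # walltime limit
#SBATCH --partition=standard       # placeholder: use your site's partition name

# Load the same modules/packages that were used during compilation
spack load intel-oneapi-compilers@2024.0.0

mpirun -n 4 ./a.out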

The directory /home/apps/Docs/samples contains a few sample programs and their sample job submission scripts. Compilation and execution instructions are given at the beginning of the respective files.

Users can copy this directory to their home directory and then try compiling and executing the sample codes. The command for copying is as follows:

cp -r /home/apps/Docs/samples/ ~/.

  • mm.c - Serial version of matrix-matrix multiplication of two NxN matrices
  • mm_omp.c - Basic OpenMP version of matrix-matrix multiplication of two NxN matrices
  • mm_mpi.c - Basic MPI version of matrix-matrix multiplication of two NxN matrices
  • mm_acc.c - OpenACC version of matrix-matrix multiplication of two NxN matrices
  • mm_blas.cu - CUDA matrix-matrix multiplication program using the cuBLAS library
  • mm_mkl.c - MKL matrix-matrix multiplication program
  • laplace_acc.c - OpenACC version of the basic stencil problem

It is recommended to use the Intel compilers, since they are better optimized for the hardware.

Compilers

Compiler                  Description                                      Versions Available
gcc/gfortran              GNU Compiler (C/C++/Fortran)                     8.5.0, 12.4.0, 13.3.0, 14.2.0
icc/icpc/ifort            Intel Compilers (C/C++/Fortran)                  2021.5.0, 2021.10.0, 2021.11.0, oneapi@2024.2.1, oneapi@2025.0.1
mpicc/mpicxx/mpif90       Intel MPI with GNU compilers (C/C++/Fortran)     2021.11.0
mpiicc/mpiicpc/mpiifort   Intel MPI with Intel compilers (C/C++/Fortran)   2021.11.0
nvcc                      CUDA C Compiler                                  12.6.3

Optimization Flags

Optimization flags control uniprocessor optimization: the compiler tries to optimize the program according to the optimization level chosen. Note that optimization flags may also change the precision of the output produced by the executable. The available flags are documented in detail on the respective compiler pages; a few examples are given below.

Intel: -O3 -xHost
GNU: -O3
PGI: -fast

Given next is a brief description of how to compile and execute the various types of programs. Note that certain bigger applications may additionally require dependency libraries to be loaded.

C Program:

Setting up of environment: 
spack load gcc@13.3.0 /3wdpf6l
spack load intel-oneapi-compilers@2024.0.0
Compilation: icc -O3 -xHost <<prog_name.c>>
Execution: ./a.out
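
For reference, a serial matrix-matrix multiplication along the lines of the mm.c sample might look like the following. This is a minimal sketch, not the actual sample file; the dimension N is a placeholder.

/* Minimal sketch of a serial NxN matrix-matrix multiplication (cf. mm.c) */
#include <stdio.h>

#define N 512   /* placeholder matrix dimension */

static double a[N][N], b[N][N], c[N][N];

int main(void)
{
    /* initialize the input matrices */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = 1.0;
            b[i][j] = 2.0;
        }

    /* c = a * b, classic triple loop */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }

    printf("c[0][0] = %f\n", c[0][0]);
    return 0;
}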

C + OpenMP Program:

Setting up of environment: 
spack load gcc@13.3.0 /3wdpf6l
spack load intel-oneapi-compilers@2024.2.1
Compilation: icc -O3 -xHost -qopenmp <<prog_name.c>>
Execution: ./a.out (set the OMP_NUM_THREADS environment variable to control the number of threads)
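
The OpenMP variant differs from the serial version mainly in a single pragma that distributes the outer loop across threads. A minimal sketch along the lines of mm_omp.c (again, N is a placeholder):

/* Minimal sketch of an OpenMP NxN matrix-matrix multiplication (cf. mm_omp.c) */
#include <stdio.h>
#include <omp.h>

#define N 512   /* placeholder matrix dimension */

static double a[N][N], b[N][N], c[N][N];

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = 1.0;
            b[i][j] = 2.0;
        }

    /* distribute the rows of the result across OpenMP threads */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }

    printf("max threads: %d, c[0][0] = %f\n", omp_get_max_threads(), c[0][0]);
    return 0;
}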

C + MPI Program:

Setting up of environment:  
spack load gcc@13.3.0 /3wdpf6l
spack load intel-oneapi-compilers@2024.0.0
Compilation: mpiicc -O3 -xHost <<prog_name.c>>
Execution: mpirun -n <<num_procs>> ./a.out
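
In the MPI version, mpirun starts several processes and the work must be divided among them explicitly. The following is a minimal skeleton showing only the structure such a program (e.g. mm_mpi.c) is built around; the distribution of matrix blocks is indicated in the comment:

/* Minimal MPI skeleton: each rank would compute a block of rows (cf. mm_mpi.c) */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    /* In a full implementation, rank r would compute rows
       [r * N/size, (r+1) * N/size) of the result, and the blocks
       would be collected on rank 0 with MPI_Gather. */
    printf("rank %d of %d ready\n", rank, size);

    MPI_Finalize();
    return 0;
}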

C + MKL Program:

Setting up of environment:
spack load gcc@13.3.0 /3wdpf6l
spack load intel-oneapi-compilers@2024.0.0
Compilation: icc -O3 -xHost -mkl <<prog_name.c>>
Execution: ./a.out
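
With MKL, the multiplication itself is done by the library's BLAS routine, so the program reduces to allocating the matrices and one cblas_dgemm call. A minimal sketch along the lines of mm_mkl.c (N is a placeholder):

/* Minimal sketch of a matrix multiplication via MKL's cblas_dgemm (cf. mm_mkl.c) */
#include <stdio.h>
#include <mkl.h>

#define N 512   /* placeholder matrix dimension */

int main(void)
{
    /* 64-byte aligned allocations, as recommended for MKL */
    double *a = (double *)mkl_malloc((size_t)N * N * sizeof(double), 64);
    double *b = (double *)mkl_malloc((size_t)N * N * sizeof(double), 64);
    double *c = (double *)mkl_malloc((size_t)N * N * sizeof(double), 64);

    for (int i = 0; i < N * N; i++) {
        a[i] = 1.0;
        b[i] = 2.0;
        c[i] = 0.0;
    }

    /* c = 1.0 * a * b + 0.0 * c, all matrices stored row-major */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                N, N, N, 1.0, a, N, b, N, 0.0, c, N);

    printf("c[0] = %f\n", c[0]);

    mkl_free(a);
    mkl_free(b);
    mkl_free(c);
    return 0;
}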

CUDA Program:

Setting up of environment: 
spack load gcc /3wdpf6l
spack load cuda /ezhcbzk

Example (1)
Compilation: nvcc -arch=sm_80 <<prog_name.cu>>
Execution: ./a.out 


Note: The flag -arch=sm_80 targets the Ampere-series A100 GPUs and is supported by CUDA 11.0 and later. Also note that older versions of CUDA are compatible only with correspondingly older versions of GCC, so an appropriate GCC module must be loaded.

Example (2)
Compilation: nvcc -arch=sm_80 /home/apps/Docs/samples/mm_blas.cu -lcublas
Execution: ./a.out
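
In the cuBLAS version, the host code allocates device memory, copies the inputs to the GPU, and lets the library do the multiplication. A minimal sketch of what mm_blas.cu might resemble (N is a placeholder and error checking is omitted for brevity; note that cuBLAS assumes column-major storage):

/* Minimal sketch of a cuBLAS matrix multiplication (cf. mm_blas.cu) */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

#define N 512   /* placeholder matrix dimension */

int main(void)
{
    size_t bytes = (size_t)N * N * sizeof(double);
    double *h_a = (double *)malloc(bytes);
    double *h_c = (double *)malloc(bytes);
    for (int i = 0; i < N * N; i++)
        h_a[i] = 1.0;

    /* device buffers for the two inputs and the result */
    double *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_a, bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    /* d_c = alpha * d_a * d_b + beta * d_c */
    const double alpha = 1.0, beta = 0.0;
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N, N, N, &alpha, d_a, N, d_b, N, &beta, d_c, N);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cublasDestroy(handle);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_c);
    return 0;
}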

CUDA + OpenMP Program:

Setting up of environment: 
spack load gcc@13 /3wdpf6l
spack load cuda /ezhcbzk


Example (1)
Compilation: nvcc -arch=sm_80 -Xcompiler="-fopenmp" -lgomp /home/apps/Docs/samples/mm_blas_omp.cu -lcublas
Execution: ./a.out

Example (2)
Compilation: g++ -fopenmp /home/apps/Docs/samples/mm_blas_omp.c -I/home/apps/spack/opt/spack/linux-almalinux8-cascadelake/gcc-12.4.0/cuda-12.6.3-ezhcbzkhxdihcdrq6lp4df3stnmrza4b/include  -L/home/apps/spack/opt/spack/linux-almalinux8-cascadelake/gcc-12.4.0/cuda-12.6.3-ezhcbzkhxdihcdrq6lp4df3stnmrza4b/lib64 -lcublas
Execution: ./a.out
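
When OpenMP and CUDA are mixed, each OpenMP thread has to be bound to a CUDA device before it makes any GPU calls. The following is a minimal sketch of that pattern only (it is not the actual mm_blas_omp sample):

/* Minimal sketch: binding OpenMP threads to CUDA devices */
#include <stdio.h>
#include <omp.h>
#include <cuda_runtime.h>

int main(void)
{
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev == 0) {
        printf("no CUDA device found\n");
        return 1;
    }

    #pragma omp parallel
    {
        /* map threads to devices round-robin; with a single GPU,
           all threads share device 0 */
        int dev = omp_get_thread_num() % ndev;
        cudaSetDevice(dev);
        printf("thread %d using device %d\n", omp_get_thread_num(), dev);
    }
    return 0;
}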

OpenACC Program:

Setting up of environment: 
spack load pgi@19.10 cuda@10.1

Compilation for GPU: pgcc -acc -fast -Minfo=all -ta=tesla:cc70,managed /home/apps/Docs/samples/laplace_acc.c
Execution: ./a.out


Compilation for CPU: pgcc -acc -fast -Minfo=all -ta=multicore -tp=skylake /home/apps/Docs/samples/laplace_acc.c
Execution: ./a.out
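
The laplace_acc.c sample solves the basic stencil problem; the essential OpenACC pattern is a data region around the iteration loop, with the stencil updates parallelized inside it. A minimal Jacobi-style sketch under those assumptions (grid size, tolerance, and iteration limit are placeholders):

/* Minimal sketch of an OpenACC Jacobi/Laplace stencil (cf. laplace_acc.c) */
#include <stdio.h>
#include <math.h>

#define NX 512   /* placeholder grid dimensions */
#define NY 512

static double A[NX][NY], Anew[NX][NY];

int main(void)
{
    /* boundary condition: hold the top edge at 1.0 */
    for (int j = 0; j < NY; j++)
        A[0][j] = 1.0;

    double err = 1.0;
    int iter = 0;

    /* keep both arrays resident on the accelerator across iterations */
    #pragma acc data copy(A) create(Anew)
    while (err > 1.0e-4 && iter < 1000) {
        err = 0.0;

        /* each interior point becomes the average of its four neighbours */
        #pragma acc parallel loop reduction(max:err)
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++) {
                Anew[i][j] = 0.25 * (A[i+1][j] + A[i-1][j]
                                   + A[i][j+1] + A[i][j-1]);
                err = fmax(err, fabs(Anew[i][j] - A[i][j]));
            }

        #pragma acc parallel loop
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++)
                A[i][j] = Anew[i][j];

        iter++;
    }

    printf("iterations: %d, final error: %f\n", iter, err);
    return 0;
}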