Generating Profiling Datasets
HPCToolkit
HPCToolkit can be installed using Spack or manually. Instructions to build HPCToolkit manually can be found at http://hpctoolkit.org/software-instructions.html.
You can see a basic example of how to use HPCToolkit and generate performance data below.
$ mpirun -np <num_ranks> hpcrun <hpcrun_args> ./program.exe <program_args>
This command generates a “measurements” directory. Hatchet cannot read this natively and requires another step to generate a “database” directory using hpcprof-mpi, as described below.
$ hpcstruct ./program.exe
$ mpirun -np 1 hpcprof-mpi --metric-db=yes -S ./program.exe.struct -I <path_to_src> <measurements-directory>
The first command generates a struct file for the executable program.exe. This file is passed as one of the arguments to the second command, along with pointers to the source code and the generated measurements directory. You must add the --metric-db=yes option to hpcprof-mpi to generate the database directory in a format that Hatchet can read.
You can specify the events you want to record as arguments to hpcrun. For example: -e CPUTIME@5000 or -e PAPI_TOT_CYC@5000000 -e PAPI_TOT_INS -e PAPI_L2_TCM -e PAPI_BR_INS.
If you want to record data only for the main thread (0) and not for other helper threads, you can set this environment variable: export HPCRUN_IGNORE_THREAD=1,2,..
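The launch line and the event flags described above can also be assembled from a script. The following is a minimal sketch, not part of HPCToolkit or Hatchet: the helper names build_hpcrun_command and ignore_threads_env are hypothetical, and the code only builds the argument list and environment without running anything.

```python
import os
import shlex


def build_hpcrun_command(num_ranks, program, events, program_args=()):
    """Return the argv list for an MPI launch wrapped in hpcrun.

    Hypothetical helper: each entry in `events` becomes one `-e <event>`
    pair, mirroring the examples in the text above.
    """
    cmd = ["mpirun", "-np", str(num_ranks), "hpcrun"]
    for event in events:
        cmd += ["-e", event]
    cmd.append(program)
    cmd += list(program_args)
    return cmd


def ignore_threads_env(thread_ids, base_env=None):
    """Return a copy of the environment with HPCRUN_IGNORE_THREAD set.

    `thread_ids` are joined with commas, as in the export line above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HPCRUN_IGNORE_THREAD"] = ",".join(str(t) for t in thread_ids)
    return env


# Example values only; the rank count and thread IDs are illustrative.
cmd = build_hpcrun_command(4, "./program.exe", ["CPUTIME@5000"])
env = ignore_threads_env([1, 2, 3], base_env={})
print(shlex.join(cmd))
```

On a machine where HPCToolkit is installed, the list and environment could be handed to subprocess.run(cmd, env=env, check=True). Passing base_env={} here is only to keep the example small; in practice you would usually start from the current environment.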
More information about HPCToolkit can be found at HPCToolkit’s documentation page.
Caliper
Caliper can be installed using Spack or manually from its GitHub repository. Instructions to build Caliper manually can be found in its documentation.
To record performance profiles using Caliper, you need to include cali.h and call the cali_init() function in your source code. You also need to link the Caliper library in your executable or load it using LD_PRELOAD.
Information about basic Caliper usage can be found in the Caliper
documentation.
To generate profiling data, you can use Caliper’s built-in profiling configurations customized for Hatchet: hatchet-region-profile or hatchet-sample-profile. The former generates a profile based on user annotations in the code while the latter generates a call path profile (similar to HPCToolkit’s output). If you want to use one of the built-in configurations, you should set the CALI_CONFIG environment variable (e.g. CALI_CONFIG=hatchet-sample-profile).
Alternatively, you can use a custom Caliper .config file (default: caliper.config). If you create your own .config file, you can set the CALI_CONFIG_FILE environment variable to point to it. Two sample caliper.config files are presented below, each beginning with a CALI_SERVICES_ENABLE line: the first produces an annotation-based profile and the second a sampled call path profile. Other example configuration files can be found in the Caliper GitHub repository.
CALI_SERVICES_ENABLE=aggregate,event,mpi,mpireport,timestamp
CALI_EVENT_TRIGGER=annotation,function,loop,mpi.function
CALI_TIMER_SNAPSHOT_DURATION=true
CALI_AGGREGATE_KEY=prop:nested,mpi.rank
CALI_MPI_WHITELIST=MPI_Send,MPI_Recv,MPI_Isend,MPI_Irecv,MPI_Wait,MPI_Waitall,MPI_Bcast,MPI_Reduce,MPI_Allreduce,MPI_Barrier
CALI_MPIREPORT_CONFIG="SELECT annotation,function,loop,mpi.function,mpi.rank,sum(sum#time.duration),inclusive_sum(sum#time.duration) group by mpi.rank,prop:nested format json-split"
CALI_MPIREPORT_FILENAME="lulesh-annotation-profile.json"
CALI_SERVICES_ENABLE=aggregate,callpath,mpi,mpireport,sampler,symbollookup,timestamp
CALI_SYMBOLLOOKUP_LOOKUP_MODULE=true
CALI_TIMER_SNAPSHOT_DURATION=true
CALI_CALIPER_FLUSH_ON_EXIT=false
CALI_SAMPLER_FREQUENCY=200
CALI_CALLPATH_SKIP_FRAMES=4
CALI_AGGREGATE_KEY=callpath.address,cali.sampler.pc,mpi.rank
CALI_MPIREPORT_CONFIG="select source.function#callpath.address,sourceloc#cali.sampler.pc,mpi.rank,sum(sum#time.duration),sum(count),module#cali.sampler.pc group by source.function#callpath.address,sourceloc#cali.sampler.pc,mpi.rank,module#cali.sampler.pc format json-split"
CALI_MPIREPORT_FILENAME="cpi-sample-callpathprofile.json"
You can read more about Caliper services in the Caliper documentation. Hatchet can read two Caliper outputs: the native .cali files and the split-JSON format (.json files).
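As a rough sketch of the custom-configuration route, the script below writes the first sample configuration shown above to a file and prepares an environment in which CALI_CONFIG_FILE points to it. The helper names write_caliper_config and caliper_env and the use of a temporary directory are assumptions made for illustration; Caliper itself only needs the file and the environment variable.

```python
import os
import tempfile

# Settings copied from the first sample caliper.config shown above.
ANNOTATION_PROFILE = {
    "CALI_SERVICES_ENABLE": "aggregate,event,mpi,mpireport,timestamp",
    "CALI_EVENT_TRIGGER": "annotation,function,loop,mpi.function",
    "CALI_TIMER_SNAPSHOT_DURATION": "true",
    "CALI_AGGREGATE_KEY": "prop:nested,mpi.rank",
    "CALI_MPI_WHITELIST": (
        "MPI_Send,MPI_Recv,MPI_Isend,MPI_Irecv,MPI_Wait,MPI_Waitall,"
        "MPI_Bcast,MPI_Reduce,MPI_Allreduce,MPI_Barrier"
    ),
    "CALI_MPIREPORT_CONFIG": (
        '"SELECT annotation,function,loop,mpi.function,mpi.rank,'
        "sum(sum#time.duration),inclusive_sum(sum#time.duration) "
        'group by mpi.rank,prop:nested format json-split"'
    ),
    "CALI_MPIREPORT_FILENAME": '"lulesh-annotation-profile.json"',
}


def write_caliper_config(path, settings):
    """Write one KEY=VALUE line per setting to `path` (hypothetical helper)."""
    with open(path, "w") as f:
        for key, value in settings.items():
            f.write(f"{key}={value}\n")


def caliper_env(config_path, base_env=None):
    """Return a copy of the environment with CALI_CONFIG_FILE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CALI_CONFIG_FILE"] = config_path
    return env


# A temporary directory is used here only so the example leaves no files behind
# in the working directory.
config_path = os.path.join(tempfile.mkdtemp(), "caliper.config")
write_caliper_config(config_path, ANNOTATION_PROFILE)
launch_env = caliper_env(config_path, base_env={})
```

The resulting launch_env could then be passed as the env argument when starting the Caliper-enabled program, for example through subprocess.run.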
TAU
TAU can be installed using Spack or manually via instructions in its install guide.
You can instrument and/or sample your program using TAU. To instrument your program, you can compile it with tau_cc.sh or tau_cxx.sh, which are used like any other compiler. To sample your program, you can run it with tau_exec.
Below, you can find the required environment variables to sample your program and get call path data using TAU. You can both instrument and sample your program by using these environment variables and tau_exec after compiling your program with tau_cc.sh or tau_cxx.sh.
TAU_PROFILE=1
TAU_CALLPATH=1
TAU_SAMPLING=1
TAU_CALLPATH_DEPTH=100
TAU_EBS_UNWIND=1
(optional) TAU_METRICS=<TAU/PAPI_metrics>
(optional) PROFILEDIR=<directory_name_for_profile_data>
After setting these environment variables, you can run your program as:
$ mpirun -np <num_ranks> tau_exec -T mpi,openmp -ebs ./program.exe <program_args>
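The environment variables and launch line above can be collected in one place when runs are scripted. This is a minimal sketch under that assumption: tau_env and tau_command are hypothetical helper names, the directory name tau_profiles is an example value, and nothing is executed.

```python
import os

# Required sampling and call path settings, as listed above.
TAU_SAMPLING_ENV = {
    "TAU_PROFILE": "1",
    "TAU_CALLPATH": "1",
    "TAU_SAMPLING": "1",
    "TAU_CALLPATH_DEPTH": "100",
    "TAU_EBS_UNWIND": "1",
}


def tau_env(profile_dir=None, metrics=None, base_env=None):
    """Return an environment with the TAU variables set.

    TAU_METRICS and PROFILEDIR are optional and only set when given.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(TAU_SAMPLING_ENV)
    if metrics is not None:
        env["TAU_METRICS"] = metrics
    if profile_dir is not None:
        env["PROFILEDIR"] = profile_dir
    return env


def tau_command(num_ranks, program, program_args=()):
    """Return the argv list for the tau_exec launch line shown above."""
    return [
        "mpirun", "-np", str(num_ranks),
        "tau_exec", "-T", "mpi,openmp", "-ebs",
        program, *program_args,
    ]


# Example values only.
env = tau_env(profile_dir="tau_profiles", base_env={})
cmd = tau_command(4, "./program.exe")
```

Where TAU is installed, subprocess.run(cmd, env=env, check=True) would start the sampled run with these settings.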
More information about using TAU can be found in its user guide.
timemory
Timemory can be installed using Spack or manually as suggested in its documentation.
Timemory can perform both runtime instrumentation and binary rewriting, but recommends using binary rewriting for distributed memory parallelism. To use binary rewriting, you need to first generate an instrumented executable and then run that instrumented executable as below.
$ timemory-run <timemory-run_options> -o <instrumented_executable> --mpi -- <executable>
$ mpirun -np <num_ranks> ./<instrumented_executable>
More information about how to use timemory can be found at https://timemory.readthedocs.io/en/develop/index.html.
pyinstrument
Hatchet can read pyinstrument JSON files which can be generated
by using its Python API or using the command line:
Command line
$ pyinstrument -r json -o <output.json> ./program.py
Python API
from pyinstrument import Profiler
from pyinstrument.renderers import JSONRenderer

profiler = Profiler()
profiler.start()
# do some work
profiler.stop()
# Render the recorded session as JSON; save this output to a file for Hatchet to read.
print(JSONRenderer().render(profiler.last_session))