SST/macro
As noted in the introduction, SST/macro is primarily intended to be an on-line simulator. Real application code runs, but SST/macro intercepts calls to communication (MPI) and computation functions to simulate time passing. However, SST/macro can also run off-line, replaying application traces collected from real production runs. This trace collection and trace replay library is called DUMPI.
Although DUMPI is automatically included as a subproject in the SST/macro download, trace collection can be easier if DUMPI is built independently from SST/macro. The code can be downloaded from https://bitbucket.org/sst-ca/dumpi. If downloaded through Mercurial, one must initialize the build system and create the configure script.
DUMPI must be built with an MPI compiler.
The --enable-libdumpi flag is needed to configure the trace collection library. After compiling and installing, a libdumpi library will be added to $DUMPI_PATH/lib.
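A sketch of the build, assuming the repository's bootstrap script and an installation prefix of $DUMPI_PATH (both the script name and the prefix variable are assumptions here):

    ./bootstrap.sh    # only needed for a Mercurial checkout, to create the configure script
    ./configure CC=mpicc CXX=mpicxx --enable-libdumpi --prefix=$DUMPI_PATH
    make
    make install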
Collecting application traces requires only a trivial modification to the standard MPI build. Using the same compiler, simply add the DUMPI library path and library name to your project's LDFLAGS.
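For example, in a Makefile-based build this might look like the following (paths are illustrative):

    CC       = mpicc
    LDFLAGS += -L$(DUMPI_PATH)/lib -ldumpi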
DUMPI works by overriding weak symbols in the MPI library. In all MPI libraries, functions such as MPI_Send are only weak symbol wrappers to the actual function PMPI_Send. DUMPI overrides the weak symbols by providing its own regular (strong) definitions of functions such as MPI_Send; when a linker encounters a weak symbol and a regular symbol with the same name, it ignores the weak symbol. A DUMPI function is therefore a thin wrapper that collects profile information and then directly calls the corresponding PMPI function.
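A minimal sketch of the pattern, using MPI_Send for concreteness (this is not DUMPI's actual source; the profiling steps are placeholders):

    #include <mpi.h>

    /* Strong definition of MPI_Send: it shadows the weak symbol in the MPI
     * library, records profiling data, then forwards to PMPI_Send. */
    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
      /* ... record entry time and call arguments to the trace ... */
      int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
      /* ... record return time ... */
      return rc;
    }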
We examine DUMPI using a very basic example program.
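For concreteness, a minimal sketch of such a program (the actual test program is not reproduced here; any simple MPI code will do):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
      int rank, data = 42;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0)        /* rank 0 sends a single integer */
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      else if (rank == 1) { /* rank 1 receives it */
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", data);
      }
      MPI_Finalize();
      return 0;
    }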
After compiling the program named test with DUMPI, we run MPI in the standard way.
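For example, with two ranks (the source file name and launcher are illustrative):

    mpicc test.c -o test -L$DUMPI_PATH/lib -ldumpi
    mpirun -np 2 ./test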
After running, there are now three new files in the directory.
DUMPI automatically assigns a unique name to the files from a timestamp. The first two files are the DUMPI binary files storing separate traces for MPI rank 0 and rank 1. The contents of the binary files can be displayed in human-readable form by running the dumpi2ascii program, which should have been installed in $DUMPI_PATH/bin.
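For example, on rank 0's binary trace (the file name below is a placeholder for the timestamped name DUMPI chose):

    dumpi2ascii dumpi-<timestamp>-0000.bin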
This produces the output
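The exact listing depends on the program and the DUMPI version; illustratively, each MPI call appears with its entry and exit times, along these lines:

    MPI_Send entering at walltime 8153.0666, cputime 0.0106 seconds in thread 0.
    ...
    MPI_Send returning at walltime 8153.0667, cputime 0.0107 seconds in thread 0.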
The third file is just a small metadata file DUMPI used to configure trace replay.
To replay a trace in the simulator, a small modification is required to the example input file from SST/macro's Parameter files section. We have two choices for the trace replay. First, we can attempt to exactly replay the trace as it ran on the host machine. Second, we could replay the trace on a new machine or with a different layout.
For exact replay, the key issue is specifying the machine topology. For some architectures, topology information can be directly encoded into the trace. This is generally true on Blue Gene, but not Cray. When topology information is recorded, trace replay is much easier. The parameter file then becomes, e.g.
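A sketch of the relevant portion of such a parameter file (launch_app1_type and the dumpi value come from the discussion below; parsedumpi, launch_indexing, launch_allocation, launch_dumpi_metaname, and the metafile name are assumptions):

    launch_app1_type      = dumpi
    launch_app1           = parsedumpi
    launch_indexing       = dumpi
    launch_allocation     = dumpi
    launch_dumpi_metaname = testbgp.meta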
We have a new parameter launch_app1_type set to dumpi. This was implicit before, taking the default value of skeleton. We also set indexing and allocation parameters to read from the DUMPI trace. The application name in launch_app1 is a special app that parses the DUMPI trace. Finally, we direct SST/macro to the DUMPI metafile produced when the trace was collected.

To extract the topology information, locate the .bin file corresponding to MPI rank 0. To print topology info, run dumpi2ascii on that file:
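A sketch of the command (the -H header-only option is an assumption about dumpi2ascii's flags; the file name is a placeholder):

    dumpi2ascii -H dumpi-<timestamp>-0000.bin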
which produces the output
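Illustratively, the header includes the Cartesian mesh information; for the machine discussed here it would contain something along these lines (field names are assumptions):

    meshdim=3
    meshsize=[4, 2, 2]
    meshcrd=[0, 0, 0]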
Here we see that the topology is 3D with extent 4,2,2 in the X,Y,Z directions. At present, the user must still specify the topology in the parameter file. Even though SST/macro can read the topology dimensions from the trace file, it cannot read the topology type. It could be a torus, dragonfly, or fat tree. The parameter file therefore needs
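For example, for a 3D torus matching this trace (the parameter names topology_name and topology_geometry, and the hdtorus value, are assumptions about the input format):

    topology_name     = hdtorus
    topology_geometry = 4 2 2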
Beyond the topology, the user must also specify the machine model with bandwidth and latency parameters. Again, this is information that cannot be automatically encoded in the trace. It must be determined via small benchmarks like ping-pong. An example file can be found in the test suite in tests/test_configs/testdumpibgp.ini.
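Illustratively, such settings might look like the following (the parameter names and values here are assumptions; consult the example file for the real ones):

    network_bandwidth   = 1.0GB/s
    network_hop_latency = 100ns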
If no topology info could be recorded in the trace, more work is needed. The only information recorded in the trace is the hostname of each MPI rank. The parameters are almost the same, but with allocation now set to hostname. Since no topology info is contained in the trace, a hostname map must be put into a text file that maps each hostname to its topology coordinates. The new parameter file, for a fictional machine called deep thought, becomes:
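A sketch of such a file (parsedumpi, launch_allocation, launch_dumpi_metaname, the topology parameters, and the file names are assumptions; launch_dumpi_mapname is explained next):

    launch_app1_type      = dumpi
    launch_app1           = parsedumpi
    launch_dumpi_metaname = deepthought.meta
    launch_allocation     = hostname
    launch_dumpi_mapname  = deepthought.map
    topology_name         = hdtorus
    topology_geometry     = 2 2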
In this case, we assume a 2D torus with four nodes. Again, DUMPI records the hostname of each MPI rank during trace collection. In order to replay the trace, the mapping of hostname to coordinates must be given in a node map file, specified by the parameter launch_dumpi_mapname. The node map file has the format
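For the four-node, two-coordinate example above, a sketch (the hostnames are fictional):

    4 2
    dt001 0 0
    dt002 0 1
    dt003 1 0
    dt004 1 1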
where the first line gives the number of nodes and number of coordinates, respectively. Each hostname and its topology coordinates must then be specified. More details on building hostname maps are given below.
We can also use the trace to experiment with new topologies to see performance changes. Suppose we want to test a crossbar topology.
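A sketch of the changed parameters (the crossbar value and the non-DUMPI allocation and indexing names are assumptions; the point is simply that the dumpi allocation/indexing and the hostname map are gone):

    launch_app1_type      = dumpi
    launch_app1           = parsedumpi
    launch_dumpi_metaname = deepthought.meta
    topology_name         = crossbar
    launch_allocation     = first_available
    launch_indexing       = block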
We no longer use the DUMPI allocation and indexing, and we no longer require a hostname map. The trace is used only to generate MPI events; no topology or hostname data is taken from it. The MPI ranks are mapped to physical nodes entirely independently of the trace.
Not all HPC machines support topology queries. The current scheme is only valid for Cray machines, which support topology queries via xtdb2proc. NOTE: As of 01/15/2014, this command seems to be broken at NERSC. SST/macro comes with a script in the bin folder, xt2nodemap.pl, that parses the Cray file into the DUMPI format. We first run xtdb2proc:
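The exact invocation may vary; the -f output-file flag below is an assumption:

    xtdb2proc -f db.txt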
This generates a Cray-formatted file, db.txt. Next we run the conversion script:
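For example (the -t topology option and the output file name are assumptions):

    xt2nodemap.pl -t hdtorus < db.txt > nodemap.txt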
This generates the hostname map.