Build and Installation of SST/macro

Downloading

SST/macro is available at http://bitbucket.org/sst-ca/sstmacro. You can get SST/macro in the following ways:

Download a .tar of a release on the downloads page \ (bitbucket.org/sst-ca/sstmacro/downloads)
Download a .tar of the repository on the main overview page
Clone the repository with Mercurial.

If you're using Mercurial, you can run the command:

1 $ hg clone http://bitbucket.org/sst-ca/sstmacro -r default

The -r default only downloads the current development branch and can be omitted if you want to bring in the entire history. The download can take a very long time on some systems to generate the "deltas" in the revision history. You can save yourself a lot of waiting by only downloading the default revision. If you're behind a firewall, make sure the http proxy is set in your ~/.hgrc file:

 [http_proxy]
 host=path-to-proxy:prox-port 
 [https_proxy] 
 host=path-to-proxy:prox-port 

If you'd like to use ssh for convenience, you'll have to modify your clone slightly by adding the "hg" username:

1 $ hg clone ssh://hg@bitbucket.org/sst-ca/sstmacro

and also add your public key to your bitbucket user account. Also, SST/macro uses subrepos, so for using ssh you should add the following to your ~/.hgrc

 [subpaths]
 https://bitbucket.org/sst-ca/dumpi = \
         ssh://hg@bitbucket.org/sst-ca/dumpi
 https://bitbucket.org/sst-ca/sstmacro-pth = \
         ssh://hg@bitbucket.org/sst-ca/sstmacro-pth

so that the http requests are converted to ssh.

Dependencies

A C/C++ compiler is required. gcc 4.2 and onward is known to work.
(optional, recommended) Qt libraries and build system (qmake) are needed to build the GUI input configuration tool. Qt 5.0 and above are suggested, although 4.9 has been observed in the wild to work (Section Building the GUI).
(optional) Mercurial is needed in order to clone the source code repository, but you can also download a tar (Section Downloading).
(optional, recommended) Autoconf and related tools are needed unless you are using an unmodified release or snapshot tar archive.
- Autoconf: 2.64 or later should work and 2.68 is known to work
- Automake: 1.11 or later should work and 1.11.1 is known to work
- Libtool: 2.2.6 or later should work and 2.4 is known to work
(optional) Doxygen and Graphviz are needed to build the documentation.
(optional) Graphviz is needed to collect call graphs.
(optional) VTK is needed for advanced vis features.
(optional) Python with the argparse module installed is required to run UPC skeletons. Python 2.7 and on should have this.

Configuration and Building

For a list of known compatible systems, see in the PDF manual.

Once SST/macro is extracted to a directory, we recommend the following as a baseline configuration, including building outside the source tree:

 sstmacro$ ./bootstrap.sh
 sstmacro$ mkdir build
 sstmacro$ cd build
 sstmacro/build$ ../configure --prefix=/path-to-install 

A complete list of options can be seen by running `../configure –help'. Some common options:

–enable-graphviz : Enables the collection of simulated call graphs, which can be viewed with graphviz.
–with-qt=$QMAKE: Direct SST/macro to the qmake executable. If no value is specified, it just assumes qmake is in your $PATH (see Section Building the GUI).
–enable-custom-new : Memory is allocated in larger chunks in the simulator, which can speed up large simulations.
–enable-fortran : Enable support for running fortran skeletons.
–enable-mpiparallel: Enable parallel discrete event simulation in distributed memory over MPI. See Section Parallel Simulations.

Once configuration has completed, printing a summary of the things it found, simply type `make'. It is recommended to use the `-j' option for a parallel build with as many cores as you have (otherwise it will take quite a while).

Post-Build

If the build did not succeed, check Known Issues for known issues, or contact SST/macro support for help (sstma.nosp@m.cro-.nosp@m.suppo.nosp@m.rt@g.nosp@m.oogle.nosp@m.grou.nosp@m.ps.co.nosp@m.m).

If the build was successful, it is recommended to run the range of tests to make sure nothing went wrong. To do this, and also install SST/macro to the install path specified during installation, run the following commands:

 sstmacro/build$ make -j8 check
 sstmacro/build$ sudo make install
 sstmacro/build$ export PATH=$PATH:/path-to-install
 sstmacro/build$ make -j8 installcheck

Make check runs all the tests we use for development, which checks all functionality of the simulator. Make installcheck compiles some of the skeletons that come with SST/macro, linking against the installation.

Important: After SST/macro is installed, add /path-to-install/bin to your PATH variable (we keep it in our .bashrc, .profile, etc). Applications and other code linking to SST/macro use Makefiles that use the sst++ compiler wrapper that is installed there for convenience to figure out where headers and libraries are. If you are building a skeleton (or running make installcheck) and you get errors along the lines of "can't find sst++", you probably forgot this step.

Known Issues

Any build or runtime problems should be reported to sstma.nosp@m.cro-.nosp@m.devel.nosp@m.@goo.nosp@m.glegr.nosp@m.oups.nosp@m..com. We try to respond as quickly as possible.
make -j: When doing a parallel compile dependency problems can occur. There are a lot of inter-related libraries and files. Sometimes the Makefile dependency tracker gets ahead of itself and you will get errors about missing libraries and header files. If this occurs, restart the compilation. If the error vanishes, it was a parallel dependency problem. If the error persists, then it's a real bug.
Ubuntu: The Ubuntu linker performs too much optimization on dynamically linked executables. Some call it a feature. I call it a bug. In the process it throws away symbols it actually needs later. This occurs when the executable depends on libA which depends on libB. The executable has no direct dependence on any symbols in libB. Even if you add -lB to the LDFLAGS or LDADD variables, the linker ignores them and throws the library out. It takes a dirty hack to force the linkage. If you get weird undefined reference errors, these can usually be removed by including the header file sstmac/force_link.h. If there are issues, contact the developers at sstma.nosp@m.cro-.nosp@m.devel.nosp@m.@goo.nosp@m.glegr.nosp@m.oups.nosp@m..com and report the problem. It can be fixed easily enough.
Compilation with clang should work, although the compiler is very sensitive. In particular, template code which is correct and compiles on several other platforms can mysteriously fail. Tread with caution.

Building DUMPI

By default, DUMPI is configured and built along with SST/macro with support for reading and parsing DUMPI traces, known as libundumpi. DUMPI binaries and libraries are also installed along with everything for SST/macro during make install. DUMPI can be used as it's own library within the SST/macro source tree by changing to sstmacro/dumpi, where you can change its configuration options. It is not recommended to disable libundumpi support, which wouldn't make much sense anyway.

DUMPI can also be used as stand-alone tool/library if you wish (e.g.~for simplicity if you're only tracing). To get DUMPI by itself, either copy the sstmacro/dumpi directory somewhere else or visit bitbucket.org/sst-ca/dumpi and follow similar instructions for obtaining SST/macro.

To see a list of configuration options for DUMPI, run `./configure –help'. If you're trying to configure DUMPI for trace collection, use `–enable-libdumpi'. Your build process might look like this (if you're building in a separate directory from the dumpi source tree) :

 dumpi/build$ ../configure --prefix=/path-to-install --enable-libdumpi
 dumpi/build$ make -j8
 dumpi/build$ sudo make install

Warning: It is possible that the configuration process for DUMPI can take a very long time on network file systems. It basically runs through every MPI function to check its availability/status on your system. If the MPI headers and libraries are not locally available (hard drive) or cached locally, then they will be brought in each time these tests are compiled and run from storage. If storage is a parallel file system, it might be slow.

Known Issues

When compiling on platforms with compiler/linker wrappers, e.g. ftn (Fortran) and CC (C++) compilers at NERSC, the libtool configuration can get corrupted. The linker flags automatically added by the wrapper produce bad values for the predeps/postdeps variable in the libtool script in the top level source folder. When this occurs, the (unfortunately) easiest way to fix this is to manually modify the libtool script. Search for predeps/postdeps and set the values to empty. This will clear all the erroneous linker flags. The compilation/linkage should still work since all necessary flags are set by the wrappers.

Building the GUI

The GUI depends on Qt 5.0 or greater. These can be easily downloaded from the Qt website. To configure SST/macro for compiling the GUI, an additional flag must be added:

1 sstmacro/build$ ../configure --prefix=/path-to-install --with-qt=$QMAKE

The variable $QMAKE must point to the qmake executable. If qmake is in $PATH, only `–with-qt' needs to be added. The GUI is compiled independently from SST/macro. In the build directory, just invoke:

1 sstmacro/build$ make gui

You will see the Qt compilation output followed by output from the source code parser. Keyword input to the GUI is automatically generated from the source code. Once parsing is complete, the GUI is ready to use. The executable is found in the qt-qui folder. On Mac, an application is generated, which can be run:

1 sstmacro/build$ open qt-gui/SSTMacro.app

On linux, a simple executable is generated.

Running an Application

To demonstrate how an application is run in SST/macro, we'll use a very simple send-recv program located in sstmacro/tutorials/sendrecv_c. We will take a closer look at the actual code in Section Basic MPI Program. After SST/macro has been installed and your PATH variable set correctly, run:

 sstmacro$ cd tutorials/sendrecv_c
 sstmacro/tutorials/sendrecv_c$ make
 sstmacro/tutorials/sendrecv_c$ ./runsstmac -f parameters.ini

You should see some output that tells you 1) the estimated total (simulated) runtime of the simulation, and 2) the wall-time that it took for the simulation to run. Both of these numbers should be small since it's a trivial program.

This is how simulations generally work in SST/macro: you build skeleton code and link it with the simulator to produce a binary. Then you run that binary and pass it a parameter file which describes the machine model to use.

Makefiles

We recommend structuring the Makefile for your project like the one seen in tutorials/sendrecv_c/Makefile :

 TARGET := runsstmac
 SRC := $(shell ls *.c) valid_keywords.cc force_link.cc
 
 # the sstmacro-config script must be found in PATH
 CXX :=      $(shell sstmacro-config --cxx )
 CC :=       $(shell sstmacro-config --cc )
 CXXFLAGS := $(shell sstmacro-config --cxxflags )
 CPPFLAGS := $(shell sstmacro-config --cppflags ) -I.
 LIBDIR :=   $(shell sstmacro-config --libdir )
 PREFIX :=   $(shell sstmacro-config --prefix )
 LDFLAGS :=  $(shell sstmacro-config --ldflags )  -Wl,-rpath,$(PREFIX)/lib
 ...

The sstmacro-config script is built by the SST/macro configuration process and installed into the bin folder. More linker and include flags can be added for different source trees. For advanced usage in projects built with automate and autoconf, the sstmacro-config script can be invoked in configure.ac following the usage above.

C vs. C++

The three `sendrecv' skeletons in sstmacro/tutorials show the different usage of C and C++ linking against SST/macro: C, C++ but with a C-style main, and a C++ class that inherits from sstmac::sw::mpiapp. Using C++ inheritance (such as in the sendrecv_cxx2 folder) will give you the most flexibility, including the ability to run more than one named application in a single simulation (see Section SST/macro Parameter files for more info).

Command-line arguments

There are only a few basic command-line arguments you'll ever need to use with SST/macro, listed below

-h - print some typical help info
-f [parameter file] - the parameter file to use for the simulation. This can be relative to the current directory, an absolute path, or the name of a pre-set file that is in sstmacro/configurations (which installs to /path-to-install/include/configurations, and gets searched along with current directory).
-d [debug flags] - flags that control simulator statistics and output. The general format is -d "<[category]> [flag]", where [category] can be debug, trace, or stats. For example, -d "<debug> mpi" prints out everything MPI is doing. Multiple flags can be appended with $|$, such as -d "<debug> mpi $|$ <trace> dumpi". See in the PDF manual1 for a complete list of flags and what they do.
-p [parameter]=[value] - setting a parameter value (overrides what is in the parameter file)
-t [value] - stop the simulation at simulated time [value]
-c - if debug output is enabled, color-code it by its source (very helpful when parsing lots of input). This uses ANSI escape sequences, so don't use this if you're producing any stats files, or redirecting output to file. Also, this is only really meant to be used on the Mac terminal.
-r [run number] - for a parameter file that is enabled for a parameter sweep, run only a specific parameter combination. See Section Parameter Sweeping with fork().

Parallel Simulations

SST/macro supports running a parallel discrete event simulation (PDES) in distributed memory over MPI. First, you must configure the simulator with the `–enable-mpiparallel' flag. Configure will check for MPI and ensure that you're using the standard MPI compilers. Your configure should look something like:

1 sstmacro/build $ ../configure --enable-mpiparallel CXX=mpicxx CC=mpicc ...

SST/macro also requires METIS for partitioning the workload amongst parallel processes, although in future versions this may no longer be a strict dependency. Make sure you have that installed gpmetis somewhere and the binary is in your PATH. METIS can be found at http://glaros.dtc.umn.edu/gkhome/metis/metis/download. SST/macro is run exactly like the serial version, but is spawned like any other MPI parallel program. Use your favorite MPI launcher to run, e.g. for OpenMPI

1 mysim $ mpirun -n 4 sstmac -f parameters.ini -d "<debug> mpicheck"

or for MPICH

1 mysim $ mpiexec -n 4 sstmac -f parameters.ini -d "<debug> mpicheck"

A few special configuration options are needed:

Set the environment variable SSTMAC_PARALLEL=mpi. By default, SST/macro assumes SSTMAC_PARALLEL=serial.
Add runtime = mpi to the input file. Again, by default, SST/macro assumes runtime = serial.
Add event_manager = clock_cycle_parallel. By default, SST/macro uses a serial event map. Currently, only one form of parallelism is supported - conservative, clock-cycle parallelism. More types of parallel event managers may be available in future versions, but this is the only currently supported one. If running on a single processor, clock_cycle_parallel is functionally equivalent to a serial event map.

Even if you compile for MPI parallelism, the code can still be run in parallel with the same configuration options. SST/macro will notice the total number of ranks is 1 and ignore any parallel options.

When launched with multiple MPI ranks, SST/macro will automatically figure out how many partitions (MPI processes) you are using, partition the network topology into contiguous blocks, and start running in parallel. If running an MPI program, you should probably be safe and use the `mpicheck' debug flag to ensure the simulation runs to completion. The mpicheck ensures MPI_Finalize is called and the simulation did not "deadlock.'' While the PDES implementation should be stable, it's best to treat it as Beta++ to ensure program correctness.

Parallel simulation may not speed up SST/macro for certain test cases. Most events are scheduled farther into the future than link (synchronization) latency. Since we use the conservative null-message technique, there will be a lot of overhead in synchronizing the clocks. Parallel is most likely to be useful because of memory constraints, expanding the maximum memory footprint. For simulations with serious congestion or heavy interconnect traffic, you may observe speedups, but they will be far from ideal or linear given the synchronization overheads.