NMI Build Notes
NMI Host: nmi-s003.cs.wisc.edu (production), nmi-s005.cs.wisc.edu (dev)
user: mbjones
Important Note: Every script file that is checked out from SVN must have its executable bit set. You can use 'svn propset svn:executable true <file>' to set it in the repository. If you don't set the bit, NMI will tell you the file is "not found."
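As a quick check before committing, something like the following could list the scripts in a working copy that are missing the executable bit. This is only a sketch and not part of the Kepler build: the function name find_nonexec_scripts is hypothetical, and it assumes the scripts of interest end in .sh.

```shell
#!/bin/sh
# Hypothetical helper: print every *.sh file under a directory (default: .)
# that lacks the owner-executable bit, skipping .svn metadata directories.
find_nonexec_scripts() {
    find "${1:-.}" -name .svn -prune -o -type f -name '*.sh' ! -perm -u+x -print
}
```

Each file it reports would then need the svn:executable property set as described above.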
Bugs: See bug 3238 for a list of what has been done with the refactoring of the NMI build since the modularization of Kepler.
Quick Overview
The NMI machines are divided into two classes: submit hosts and execute hosts. A submit host (like nmi-s003) is the machine you log in to via ssh and whose job status you view on the website. The execute hosts are the platform-specific hosts where builds actually take place. You submit your jobs with the 'nmi_submit' command followed by a submit file.
The submit file consists of a set of parameters telling the submit host where to get your source, which pre- and post-hook scripts to fire, which execute host operating systems you'd like to use, and which task scripts to execute once the files are checked out onto the execute host. You also define any prerequisite software, such as Ant, that you need for your jobs on each execute host. The details of the submit file can be found here.
Here is the workflow for the Kepler suite build:
- Check out all modules of the Kepler suite as well as Ptolemy.
- Before execution, run the pre_all.sh script, which moves the checked-out modules into kepler-run/modules. Note that the build system cannot do checkouts on the execute hosts.
- Once the checkout is complete and the pre_all step has finished, run the build.sh script to do the actual build. The build script changes into the build-area directory, runs the change-to command, runs the compile command and finally runs the test command.
- Run remote_post after each remote_task. This step extracts any deliverables (i.e. the created installers) and tars them up so they can be extracted on the submit host for transfer to kepler-project.org.
- Run the post_all.sh script in the post_all hook. It does three things. First, it concatenates all of the error streams into a file called "notify.nmi". This file can be appended to the notify email sent out after the build completes; unfortunately, this is not yet implemented on the nmi-s003 host, but once it is, our system should work with it. Second, it tars up the entire kepler-run directory so that it can be uploaded to our server for nightly snapshot downloads. Third, it takes the installers tarred up by the remote_post task and readies them for uploading to the dist/nightly directory on kepler-project.org.
- After post_all, the output stage begins. The .out file transfers the installers and nightly snapshot to the kepler-project.org server via SSH.
- Finally, the notify step is executed. The notify email address is set in the submit file. Right now it's going to kepler-nightly@kepler-project.org.
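The build step in the workflow above could look roughly like the following. This is a sketch under stated assumptions, not the actual build.sh: it assumes the change-to, compile and test commands are Ant targets invoked as shown, and the ANT, SUITE and BUILD_AREA variables (with their defaults) are hypothetical.

```shell
#!/bin/sh
# Sketch of the build step: enter build-area, then run change-to, compile, test.
# ANT, SUITE and BUILD_AREA are hypothetical knobs, not names taken from build.sh.
# The body runs in a subshell so the caller's working directory is untouched.
run_build() (
    ant_cmd="${ANT:-ant}"
    suite="${SUITE:-kepler}"
    build_area="${BUILD_AREA:-build-area}"

    cd "$build_area" || exit 1
    # Stop at the first failing step so a later step never runs on a broken tree.
    "$ant_cmd" change-to "-Dsuite=$suite" || exit 1
    "$ant_cmd" compile || exit 1
    "$ant_cmd" test
)
```

Stopping at the first failure keeps the error stream short, which matters because those streams end up concatenated into the notify file.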
NMI Ant Task
The Kepler build system includes a task to create the NMI configuration files for any given suite within Kepler. Use the following steps to build an NMI run for your suite:
- from build-area, run 'ant nmi -Dsuite=yourSuite'. A 'yourSuite' directory will be created in build-area/resources/nmi/runs with all of the NMI configuration files necessary.
- Check the build-area/resources/nmi/runs/yourSuite directory into SVN.
- SSH to the NMI submit host (usually nmi-s003.cs.wisc.edu).
- Check out the yourSuite directory from SVN into the ~mbjones directory.
- Change into the yourSuite directory.
- Run the install.sh script there. You will see symlinks being created to all of the .svn, .out and *-submit files. If there are name conflicts, the script will resolve them.
- Change back to ~mbjones.
- Run 'nmi_submit yourSuite-submit'
- Go here to see your run in progress.
This task greatly reduces the amount of work it takes to get an NMI build running. Any customizations can be done in the yourSuite directory. However, if 'ant nmi -Dsuite=yourSuite' is run again, your customizations will be overwritten.
All of the attributes of the submit file (prereqs, execution hosts, etc) can be changed with parameters of the nmi task in the build. Here's an example of the task definition:
<nmi project="kepler"
     component="kepler"
     componentVersion="${suite}-trunk"
     description="Kepler NMI build for the suite ${suite}"
     runType="build"
     platforms="x86_macos_10.4, x86_fc_5"
     prereqs="prereqs:apache-ant-1.7.0, prereqs_x86_fc_5:java-1.5.0_08, prereqs_x86_macos_10.4:java-1.5.0_06"
     notify="kepler-nightly@kepler-project.org"
     overwrite="true"/>
Daily Job Submission
To edit the crontab for the automatic runs, use the command 'crontab -e'. Right now, the automatic build runs at 10 am and 10 pm every day.
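For reference, a crontab entry for that schedule might look like what the sketch below prints. The actual entry on the submit host is not reproduced here: the function name print_nightly_cron is hypothetical, and it assumes the job simply runs nmi_submit from the home directory, as in the manual steps.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab entry that submits a run at 10:00 and
# 22:00 every day. $1 is the submit file name (default: yourSuite-submit).
print_nightly_cron() {
    submit_file="${1:-yourSuite-submit}"
    cat <<EOF
# minute hour day-of-month month day-of-week command
0 10,22 * * * cd "\$HOME" && nmi_submit $submit_file
EOF
}
```

The hour field "10,22" is what gives the twice-daily schedule. Cron runs with a minimal environment, so a full path to nmi_submit may be needed in practice.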
Job Deletion
The command 'nmi_rm <runid>' will delete a job from the queue.
Additional Notes
A good command to know is:
condor_status -format '%s\n' nmi_platform | sort | uniq
which will give you the current list of platforms that can be used in your "platforms" specification in the submit file.
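To check a single platform name before putting it in a submit file, the output of that command could be piped through a small filter such as the one below. This is a sketch; the function name platform_available is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the platform named in $1 appears as an
# exact, whole line in the platform list read from standard input.
platform_available() {
    grep -Fqx -- "$1"
}

# Intended use on a submit host:
#   condor_status -format '%s\n' nmi_platform | sort | uniq | platform_available x86_fc_5
```

Matching whole lines as fixed strings avoids false positives from partial names and from the dots in names like x86_macos_10.4.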
Another good command is:
nmi_list_prereqs --platform=x86_winnt_5.1
You can change the platform to whichever platform you want to see the installed prereqs for.
If you run into any problems where it seems the build is taking a really long time or may be hung, you can ssh into the execute host to look at the current output. Todd from NMI told me that they don't recommend this, but that it can be essential for debugging. To find the execute host, look at the Run Details page for your run. Where it says that your job is running toward the bottom of the table, there will be a host name that it is running on. You can ssh to that host from nmi-s00X. Change to the /home/condor/execute directory and you'll see a directory called 'dir_XXXX'. Go into that directory and you'll find the logs for your current run.
Note that because you cannot use the network on the execute hosts except through the nmi_ commands, any attempt to access the network will hang the run with no error. Because of the way our build system does checkouts, this is very error-prone. If builds are hanging, this is probably the reason. Any SVN or SSH access needs to be done with the appropriate nmi command files.