Time profiling with nhc98

Time profiling allows a programmer to see which of the functions in their program consume the most time, and how many times each is called. The information is collected as the program is executing, so it is simple to try the analysis with various types of input data, viewing behaviour in "average" and "worst" cases.

The profiles produced can be used to compare two different implementations, or can direct programmers to "hot spots" in a program that use a disproportionate amount of resources. The effort to improve the program can then be focussed on these critical pieces, where it will be of most benefit. The profile may also highlight functions that are being called more or less often than a programmer expects, indicating possible bugs that have been overlooked.

To obtain a time profile of a computation:

Compile all modules of the program with the -t option; also specify -t at link-time. Using hmake -t does all the necessary compiling and linking automatically. eg. to compile MyProg.hs and everything it depends on for time profiling simply use hmake -t MyProg
Run the program with the additional runtime argument -t eg. ./MyProg +RTS -t -RTS will output profiling data to the file MyProg.tp

Time profiling options

`-t`	turn on time profiling
`-t-Module`	show only the total for all functions in `Module`
`-t+Module`	show all functions separately in `Module`
`-t+all`	show all functions separately in all modules
`-t-all`	show only total-per-module for all modules
`-tt`	order results by time consumed per function
`-tc`	order results by number of calls per function
`-tm`	group results by module
`-isize`	set the profiling interval to `size`
	(without unit in number of word allocations on heap , with s or ms in seconds or milliseconds; default 100ms)

The various single-letter flags can be aggregated together, e.g. -tmt which will order the functions by time within each module - this is the default if just -t is supplied. You can give a list of module names separated by spaces using the shell quoting mechanism eg. -t"+Prelude -Main +IO" The default is to show all functions written by the user separately but only show the total for library modules (Prelude, IO, etc.).

Example

The program used in this example is taken from the nofib Benchmark Suite, and uses Peanno arithmetic to raise 3 to the power 8.

data Peanno = Z | S Peanno

add Z     y = y
add (S x) y = S (add x y)

mult x Z     = Z
mult x (S y) = add (mult x y) x

raise x Z     = S Z
raise x (S y) = mult x (raise x y)

intTOpeanno x = if x < 1 then Z else S (intTOpeanno (x-1))

peannoTOint Z     = 0
peannoTOint (S x) = 1 + peannoTOint x

main = print (peannoTOint (raise (intTOpeanno 3) (intTOpeanno 8)))

The resulting profile is given below. The listing, by module, gives the percentage of time consumed, and the number of calls, per function. The profile shows that most of the execution time (>75%) is spent in the add function, and that there are far more calls made to this function than any other.

JOB ./Peanno +RTS -t -RTS ;

----------------------------------------------------------
Main		  76.0% time	   8079493 calls
----------------------------------------------------------
 % time     # calls                                       
   76.0     8069620   add
    0.0        6562   peannoTOint
    0.0        3288   mult
    0.0          13   intTOpeanno
    0.0           9   raise
    0.0           1   main

Prelude 	   0.5% time	     39752 calls

GarbageCollect	  23.5% time	       416 calls

Time figures based on 208 samples

Why are there so many applications of the add function?

From the recursive definition, the evaluation of add x y leads to x more applications of add. The evaluation of mult x y leads to y nested applications of add in the form (add (add (add x x) x) ... ). The evaluation of the innermost of these leads to just x more applications of add, but then the evaluation of the next leads to twice as many, and so on.

By altering the definition of mult to add x (mult x y), the nested applications of add are in the form (add ... (add x (add x x))), and so each gives rise to just x add functions. This change decreased the number of calls for add from 8069620 to 13120, and reduced the program's runtime by a factor of about 100.

The latest updates to these pages are available on the WWW from http://www.cs.york.ac.uk/fp/nhc98/

York Functional Programming Group