Time profiling with nhc98
Time profiling allows a programmer to see which of the functions in
their program consume the most time, and how many times each is
called. The information is collected as the program is executing, so
it is simple to try the analysis with various types of input data,
viewing behaviour in "average" and "worst" cases.
The profiles produced can be used to compare two different
implementations, or can direct programmers to "hot spots" in a program
that use a disproportionate amount of resources. The effort to
improve the program can then be focussed on these critical pieces,
where it will be of most benefit. The profile may also highlight
functions that are being called more or less often than a programmer
expects, indicating possible bugs that have been overlooked.
To obtain a time profile of a computation:
-
Compile all modules of the program with the -t option;
also specify -t at link-time. Using
hmake -t does all the necessary compiling and
linking automatically. eg. to compile MyProg.hs and
everything it depends on for time profiling simply use
hmake -t MyProg
-
Run the program with the additional runtime argument
-t eg. ./MyProg +RTS -t -RTS
will output profiling data to the file MyProg.tp
Time profiling options
-t
|
turn on time profiling
|
-t-Module
|
show only the total for all functions in Module
|
-t+Module
|
show all functions separately in Module
|
-t+all
|
show all functions separately in all modules
|
-t-all
|
show only total-per-module for all modules
|
-tt
|
order results by time consumed per function
|
-tc
|
order results by number of calls per function
|
-tm
|
group results by module
|
-isize
|
set the profiling interval to size
|
|
(without unit in number of word allocations on heap
, with s or ms in seconds or milliseconds; default 100ms)
|
The various single-letter flags can be aggregated together, e.g.
-tmt which will order the functions by time within each
module - this is the default if just -t is supplied. You can
give a list of module names separated by spaces using the shell
quoting mechanism eg. -t"+Prelude -Main +IO" The
default is to show all functions written by the user separately but
only show the total for library modules (Prelude, IO, etc.).
Example
The program used in this example is taken from the nofib
Benchmark Suite, and uses Peanno arithmetic to raise 3 to the
power 8.
data Peanno = Z | S Peanno
add Z y = y
add (S x) y = S (add x y)
mult x Z = Z
mult x (S y) = add (mult x y) x
raise x Z = S Z
raise x (S y) = mult x (raise x y)
intTOpeanno x = if x < 1 then Z else S (intTOpeanno (x-1))
peannoTOint Z = 0
peannoTOint (S x) = 1 + peannoTOint x
main = print (peannoTOint (raise (intTOpeanno 3) (intTOpeanno 8)))
The resulting profile is given below. The listing, by module, gives
the percentage of time consumed, and the number of calls, per function.
The profile shows that most of the execution time (>75%) is spent
in the add function, and that there are far more calls made
to this function than any other.
JOB ./Peanno +RTS -t -RTS ;
----------------------------------------------------------
Main 76.0% time 8079493 calls
----------------------------------------------------------
% time # calls
76.0 8069620 add
0.0 6562 peannoTOint
0.0 3288 mult
0.0 13 intTOpeanno
0.0 9 raise
0.0 1 main
Prelude 0.5% time 39752 calls
GarbageCollect 23.5% time 416 calls
Time figures based on 208 samples
Why are there so many applications of the add function?
From the recursive definition, the evaluation of
add x y leads to x more applications of
add. The evaluation of mult x y leads to
y nested applications of add in the form
(add (add (add x x) x) ... ).
The evaluation of the innermost of these leads to just x more
applications of add, but then the evaluation of the next
leads to twice as many, and so on.
By altering the definition of mult to
add x (mult x y), the nested applications
of add are in the form
(add ... (add x (add x x))),
and so each gives rise to just x add functions. This change
decreased the number of calls for add from 8069620 to 13120,
and reduced the program's runtime by a factor of about 100.
The latest updates to these pages are available on the WWW from
http://www.cs.york.ac.uk/fp/nhc98/
York Functional Programming Group
|