fncdump
This manual describes the profiler called fncdump , and how you
can use it to determine which parts of a program are taking most of the
execution time. We assume that you know how to write, compile, and
execute programs. fncdump was written by Yannick Perret.
Copyright (C) 2001 Yannick Perret.
Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.
Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.
Permission is granted to copy and distribute translations of this manual
into another language, under the same conditions as for modified versions.
Profiling allows you to learn where your program spent its time and which
functions called which other functions while it was executing. This
information can show you which pieces of your program are slower than you
expected, and might be candidates for rewriting to make your program
execute faster. It can also tell you which functions are being called more
or less often than you expected. This may help you spot bugs that had
otherwise been unnoticed.
Since the profiler uses information collected during the actual execution
of your program, it can be used on programs that are too large or too
complex to analyze by reading the source. However, how your program is run
will affect the information that shows up in the profile data. If you
don't use some feature of your program while it is being profiled, no
profile information will be generated for that feature.
fncdump profiler is divided into two part. The first one is
the fnccheck library, which needs to be linked with the program
to be profiled. The second part is the fncdump program itself
which extract and display profile information for your program.
Profiling has several steps:
The next three chapters explain these steps in greater detail.
Several forms of output are available from the analysis.
The flat profile shows how much time your program spent in each function,
and how many times that function was called. If you simply want to know
which functions burn most of the cycles, it is stated concisely here.
See section The Flat Profile.
The call graph shows, for each function, which functions called it
or which other functions it called. See section The Call Graph.
To better understand how profiling works, you may wish to read
a description of its implementation.
See section Implementation of Profiling.
The first step in generating profile information for your program is
to compile and link it with profiling enabled. Remember that fncdump
can only be used with gcc V2.95.2 or higher.
To compile a source file for profiling, specify the
`-finstrument-functions -g' options when
you run the compiler (This is in addition to the options you normally
use).
To link the program for profiling, add the `fnccheck' library
to your link command (`-lfnccheck'). Here are examples:
gcc -finstrument-functions -g -c myprog.c utils.c
gcc -o myprog myprog.o utils.o -lfnccheck
The `-finstrument-function -g' options also works with a command
that both compiles and links:
gcc -o myprog myprog.c utils.c -finstrument-functions -g -lfnccheck
If you compile only some of the modules of the program with
`-finstrument-function -g', you
can still profile the program, but you won't get complete information about
the modules that were compiled without `-finstrument-function -g'.
The only information you get is the time spend in them, seen as local
time for the calling functions.
Once the program is compiled for profiling, you must run it in order to
generate the information that fncdump needs. Simply run the program
as usual, using the normal arguments, file names, etc. The program should
run normally, producing the same output as usual. It will, however, run
somewhat slower than normal because of the time spent collecting and the
writing the profile data.
The way you run the program--the arguments and input that you give
it--may have a dramatic effect on what the profile information shows. The
profile data will describe the parts of the program that were activated for
the particular input you use. For example, if the first command you give
to your program is to quit, the profile data will show the time used in
initialization and in cleanup, but not much else.
Your program will write the profile data into a file called `fnccheck.out'
just before exiting. If there is already a file called `fnccheck.out',
its contents are overwritten. You can specify an other file name
by setting the shell variable `FNCCHK_OUTFILE' with the needed file name.
If your program uses fork() function, two processes will have to
dump their data. In this case, the main process will dump into
`fnccheck.out' or whatever name you set, and the child(s) process(es)
will dump into `fnccheck.out_PID_OF_CHILD'. You can prevent childs
to dump by using the shell variable `FNCCHK_NOCHILD'.
The `fnccheck.out' is written when returning from main or by
calling exit . Some signals are catched to perform file writing
on kills, at the condition that the program don't redirect them.
The `fnccheck.out' file is written in the program's current working
directory at the time it exits. This means that if your program calls
chdir , the `fnccheck.out' file will be left in the last directory
your program chdir 'd to. If you don't have permission to write in
this directory, the file is not written, and you will get an error message.
Several shell variables can be used to change profile behavior:
-
`FNCCHK_STOP': if set, all profiling treatments are deactivated.
It is useful to run a compiled program in a normal way.
-
`FNCCHK_OUTFILE': if set, replaces the profile data file name
(default is `./fnccheck.out').
-
`FNCCHK_NOCHILD': if set, childs of the process are not dumped
to file at all.
-
`FNCCHK_STACKSIZE': set the default stack size used in the profiler.
Default value is 1000000.
-
`FNCCHK_NBFUNC': set the default function table size used in the
profiler. Default value is 1024.
-
`FNCCHK_DYNAMIC': the profiler default behavior is to generate an
error if stack is full or if the maximum number of functions is reached.
If `FNCCHK_DYNAMIC' is set, reallocation of these tables is
performed during execution, but it generates bias in time computation.
-
`FNCCHK_DEBUG': ORed values of debug options. Current values are:
2: display collisions for the hash table at the end.
4: details on each fnccheck functions.
8: details on treatments.
Note: you have to compile the library with debug activated.
After you have a profile data file `fnccheck.out', you can run
fncdump
to interpret the information in it. The fncdump program prints a
flat profile and a call graph on standard output. Typically you would
redirect the output of fncdump into a file with `>'.
You run fncdump like this:
fncdump [options] executable-file [> outfile]
Here square-brackets indicate optional arguments.
These options specify which of several output formats
fncdump should produce.
-avg
-
-average
-
The `-avg' option causes
fncdump to display an estimated
over-cost time of fnccheck library, expressed in time per function.
-sfile <file>
-
If the `-sfile' option is given,
fncdump uses `<file>'
as profile data file instead of default `./fnccheck.out'.
-sort <n> / +sort <n>
-
The `-sort' specify how `fncdump' sorts the flat profile.
Available sorts are: 1: sorted by local time. 2: sorted by total time
(default). 3: sorted by number of calls. 4: sorted by functions name.
5: not sorted (execution order).
The `+sort' option reverse the sort order.
-calls / +calls
-
This option causes
fncdump to display calls list of each
function, instead of displaying list of callers of each functions.
The `+calls' option display both the two lists.
-no-minmax
-
This option causes
fncdump not to display MIN and MAX time
list. This list shows for each function the MIN and MAX total time
spend in it.
-func-details
-
The `-func-details' option causes
fncdump to print
a list of each function, with its corresponding file and line number.
This option can't be used with `-nm' option.
-call-details / -calls-details
-
The `-call-details' option causes
fncdump to
display file and line number of callers site in calls list.
This option can't be used with `-nm' option.
-fullname
-
Used with `-func-details' or `-call-details', causes
fncdump to print full pathnames for files instead of
basenames.
-call-graph
-
This option causes
fncdump to output the call-graph of
the profiled program, using VCG format (a GPL call-graph
displayer). No other information is then displayed.
-cycles
-
This option causes
fncdump to add the detected cycles
during execution. Two kind of cycles can be displayed. Simple
recursions (a function calls itself), and cycles, in which case
the begin and the end of the cycle are displayed.
-sum-profile 'f1.out' 'f2.out' 'fr.out'
-
This option causes
fncdump to add the profile data files
`f1.out' and `f2.out' contents, and to generates a
new profile data file called `fr.out'. If `fr.out' is
`-', `stdout' is used. If you sum files from different
programs or different compilations, the new profile data file
will have very few sence. Note: you can use the same name for
`fr.out' as for `f1.out' or `f2.out' (overwrite
is possible).
These options control various aspects of the `fncdump' behavior.
-nm
-
This option indicates to `fncdump' to use
nm program
to extract function names from symbol addresses, instead of
using library bfd from binutils. In this case, file and
line number of functions are not available.
-addr2line
-
This option causes `fncdump' to use
addr2line program
to extract function names from symbol addresses, instead of
using library bfd from binutils.
Note: if library bfd is not available on your system,
`fncdump' can be compiled without using it. In this case,
default method is addr2line .
--help / -help / --h / -h
-
Causes `fncdump' to display a help message, with options
list .
--version /-version / --v / -v
-
Causes `fncdump' to display its version number.
--misc
-
Causes `fncdump' to display author, contact and bug report
information.
--details
-
Cause `fncdump' to display explanations on available options,
output contents and data profile file.
These options specify how to set or remove functions from
fncdump display.
-only <flist>
-
The `-only' option set which functions are to be displayed
in the
fncdump functions list. `<flist>' is a comma
separated list of function names. Only these functions will be
used to generate profile, call-graph...
-not <flist>
-
The `-not' option set which functions are to be removed
from the
fncdump function list. `<flist>' is a comma
separated list of function names. All these functions will be
removed from functions list to generate profile, call-graph...
-propagate
-
This option indicates that functions set or removed (using
`-only' or `-not' options) are propagated to childs.
It means that if a function
f() calls a function g() ,
using options `-only f -propagate' will make both `f()'
and `g()' to be used in functions list.
In the same way, using options `-not f -propagate' will
make both `f()' and `g()' to be removed from functions list.
-rpropagate
-
This option indicates that functions set or removed (using
`-only' or `-not' options) are propagated to callers.
It means that if a function
g() is called by a function f() ,
using options `-only g -rpropagate' will make both `f()'
and `g()' to be used in functions list.
In the same way, using options `-not g -rpropagate' will
make both `f()' and `g()' to be removed from functions list.
-no-spontaneous
-
The `-no-spontaneous' option causes
fncdump to not display
functions that seem to be called spontaneously. This can happen if you
profile functions that are called by non-profiled functions.
Note: this option not apply to main() function.
-no-unresolved
-
The `-no-unresolved' option causes
fncdump to not display
functions for which names are not find in the executable. This can
happen if you give an invalid executable name to `fncdump', or
if you profile functions which are loaded at execution time (using
dlopen(...) from dlfcn.h . In this case the functions
addresses are not available in executable. The default behavior of
`fncdump' is to replace unresolved function names by their
symbol address (i.e. `<0x8049178>').
-real-maxtime
-
The `-real-time' option causes
fncdump to compute
total execution time using final functions list (after `-not'
or `-only' options applied). The total execution time is used
to compute time percentages in flat profile display.
WARNING: this option is desactivated in version 1.2, due to changes
in memory representation of call-graph. It will be activated again
in future release.
fncdump first displays the cycles detected during execution,
if the option `-cycles' is activated.
fncdump then displays various informations
about the profiled program.
the executable name
-
This is the name of the profiled program.
Total execution time
-
Most of the time, this time is the total execution time
of
main() function. It can be different if main()
is not in a profiled object, or if main() has been removed
from functions list using `-not' or `-only' options.
In this case the execution time is followed by the message
`(not main() time)'.
The clock type used
-
Indicates which kind of clock was used to compute time. Possible
values are `real clock time', which indicates that the
'user' clock is used; the second possible value is `system time',
which indicates that CPU process time is used.
number of realloc
-
Indicates how many reallocations where performed during profile.
This number is always zero if `FNCCHK_DYNAMIC' is not set.
If this number is not zero, bias can be present in times
(but it is not possible to know where the bias is).
Then the flat profile is displayed (see below).
After the flat profile, some other informations are displayed:
number of unresolved functions not shown
-
It is the number of hidden functions due to option `-no-unresolved'.
number of `-not/-only' functions not shown
-
It is the number of hidden functions due to options `-not'
and `-only'.
final stack size
-
It is the effective stack size used during execution.
number of functions
-
It is the total number of functions.
Then the MIN/MAX time table is displayed. For each function, its
max and min total execution time is displayed. This table is not
displayed if option `-no-minmax' is used.
The final data displayed is the call-graph (see below).
The flat profile gives for each function various informations,
such as local time spend in the function, total execution time,
number of calls...
Here is a example of flat profile on a small program:
| local | total | | |
| sec. | % | sec. | % | calls |t. sec/call| name
|-----------|-----|-----------|-----|----------|-----------|--------
| 0.068976| 0.3| 23.331874|100.0| 1| 23.331874| main
| 5.056025| 21.7| 10.078856| 43.2| 4| 2.519714| recurs_a
| 5.022831| 21.5| 8.822895| 37.8| 4| 2.205724| recurs_b
| 4.037846| 17.3| 8.077933| 34.6| 4| 2.019483| recurs_1s
| 5.105893| 21.9| 5.105893| 21.9| 1| 5.105893| test
| 4.040068| 17.3| 4.040068| 17.3| 4| 1.010017| s1
| 0.000097| 0.0| 0.000180| 0.0| 32| 0.000006| recurs
| 0.000020| 0.0| 0.000030| 0.0| 1| 0.000030| f3
| 0.000010| 0.0| 0.000010| 0.0| 3| 0.000003| f1
| 0.000006| 0.0| 0.000006| 0.0| 1| 0.000006| small
The first two columns display the local time spend in each
function. Local time is the total execution time of the function
minus the time spend in (profiled) child functions. The
second column is the percentage of this time regards to the
total execution time.
The next two columns display the total time spend in each
function. The
second column is the percentage of this time regards to the
total execution time.
The fifth column shows the number of calls during execution
for each function.
The next column indicate the average time per call, computed as
the total execution time divided by the number of calls.
The last column displays the corresponding function name.
Call-graph shows for each function the list of its callers, or
its childs if using option `-calls'.
Here is a example of call-graph for a small program:
'main' [0] spontaneously called.
'recurs_a' [1] called by:
[0],
[2],
'recurs_b' [2] called by:
[1],
'recurs_1s' [3] called by:
[0],
[3],
'test' [4] called by:
[0],
's1' [5] called by:
[3],
'recurs' [6] called by:
[0],
[6],
'f3' [7] called by:
[0],
'f1' [8] called by:
[7],
'small' [9] called by:
[0],
If option `-calls' is used, callers are replaced by
list of child functions. For the same example it becomes:
'main' [0] calls:
[1],
[3],
[4],
[6],
[7],
[9],
'recurs_a' [1] calls:
[2],
'recurs_b' [2] calls:
[1],
'recurs_1s' [3] calls:
[3],
[5],
'test' [4] calls: nobody.
's1' [5] calls: nobody.
'recurs' [6] calls:
[6],
'f3' [7] calls:
[8],
'f1' [8] calls: nobody.
'small' [9] calls: nobody.
If option `-call-details' is used, file and line number of
each function and each caller is added to output. Here is an
example for the same program:
'main' [0] (essai.c:128) spontaneously called.
'recurs_a' [1] (essai.c:68) called by:
[0](essai.c:132),
[2](essai.c:94),
'recurs_b' [2] (essai.c:82) called by:
[1](essai.c:78),
'recurs_1s' [3] (essai.c:56) called by:
[0](essai.c:134),
[3](essai.c:61),
'test' [4] (essai.c:121) called by:
[0](essai.c:139),
's1' [5] (essai.c:16) called by:
[3](essai.c:58),
'recurs' [6] (essai.c:48) called by:
[0](essai.c:130),
[6](essai.c:51),
'f3' [7] (essai.c:40) called by:
[0](essai.c:142),
'f1' [8] (essai.c:22) called by:
[7](essai.c:41),
'small' [9] (essai.c:113) called by:
[0](essai.c:145),
More details about how fnccheck library perform its job.
The fnccheck library uses a new feature of gcc V2.95.2
(or higher) called `-finstrument-functions'. This option
makes `gcc' to add to every functions a call at enter and at exit.
The fnccheck library gives these two functions called
void __cyg_profile_func_enter(void *this_fn, void *call_site)
and void __cyg_profile_func_exit(void *this_fn, void *call_site) .
The fnccheck library manages a stack of function calls and a
list of functions. For each of them, the number of calls, the
time spend in them (local and total) is computed. Moreover a list
of callers is associated to functions in order to manage call-graph.
At exit time, all these data are dumped into a file `./fnccheck.out'
in order to be used with fncdump program.
Note: the fnccheck library only works with symbol addresses
of functions, and not with names. Names are extracted from executable
by fncdump .
The profile data file `./fnccheck.out' format is:
HEADER
# of elements
f1_ptr #calls sec usec locsec locusec minsec minusec maxsec maxusec
f2_ptr #calls sec usec locsec locusec minsec minusec maxsec maxusec
...
# of realloc
stack_size func_table_size
nb_of_callers_f1 caller1_ptr rc1_ptr caller2_ptr rc2_ptr ...
nb_of_callers_f2 caller1_ptr rc1_ptr caller2_ptr rc2_ptr ...
...
time_mode
exec_name
HEADER : identifier (including file-format version) of stat-files
# el : number of functions
fi_ptr : symbol of the function
nb_calls : number of calls for this function
nb_sec : total time in the function
nb_usec : corresponding usec
_locsec : local time in the function
_locusec : corresponding usec
min_sec : minimum time spend in the function
_subusec : corresponding usec
min_sec : maximum time spend in the function
_subusec : corresponding usec
#realloc : number of 'realloc' performed during execution
stack_size: final stack size used during execution
func_tbl_s: number of function stored during execution
nbofcaller: number of callers for the corresp. function
calleri_pt: symbol of ith caller for the function
rci_ptr : symbol of caller of the function
time_mode : the FNCCHK_TIME type
exec_name : corresp. exec name (not implemented)
Here are the treatments performed by fnccheck library for
each enter/exit of functions:
At the first call:
Initializations of hash-tables, options, stack. Trapping
exits with atexit() and signal() functions.
When entering a function:
- *
-
check if the function is still registered, and register it if it is not
the case.
- *
-
add the call-site to callers list for this function
- *
-
store the function and its entering time in the stack
- *
-
compute the additional local time for the calling function. It is the
time between the last active time of this function minus the current time.
When exiting a function:
- *
-
compute the total execution time (difference between entering time and
current time)
- *
-
increment number of calls
- *
-
compute the additional local time for the calling function. It is the
time between the last active time of this function minus the current time.
- *
-
check if we are exiting
main() function, and is yes call the
dumping function.
When leaving the program:
Creates the profile data file and dumps data into it. Freed allocated
tables and remove treatments (because in some cases, in particular for
C++ programs, calls to exit treatments can occur AFTER exiting
from main() .
In order to allow debug during execution, you have to compile
the library fnccheck using target `dfnccheck'.
This allows you to use the shell variable `FNCCHK_DEBUG'.
Valid values are:
- 2
-
display collision in hash table;
- 4
-
details on `fnccheck' library functions;
- 8
-
details on `fnccheck' library treatments.
|