This manual describes the profiler called fncdump
, and how you
can use it to determine which parts of a program are taking most of the
execution time. We assume that you know how to write, compile, and
execute programs. fncdump
was written by Yannick Perret.
Copyright (C) 2001 Yannick Perret.
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual into another language, under the same conditions as for modified versions.
Profiling allows you to learn where your program spent its time and which functions called which other functions while it was executing. This information can show you which pieces of your program are slower than you expected, and might be candidates for rewriting to make your program execute faster. It can also tell you which functions are being called more or less often than you expected. This may help you spot bugs that had otherwise been unnoticed.
Since the profiler uses information collected during the actual execution of your program, it can be used on programs that are too large or too complex to analyze by reading the source. However, how your program is run will affect the information that shows up in the profile data. If you don't use some feature of your program while it is being profiled, no profile information will be generated for that feature.
fncdump
profiler is divided into two part. The first one is
the fnccheck
library, which needs to be linked with the program
to be profiled. The second part is the fncdump
program itself
which extract and display profile information for your program.
Profiling has several steps:
fncdump
to analyze the profile data.
See section fncdump
Command Summary.
The next three chapters explain these steps in greater detail.
Several forms of output are available from the analysis.
The flat profile shows how much time your program spent in each function, and how many times that function was called. If you simply want to know which functions burn most of the cycles, it is stated concisely here. See section The Flat Profile.
The call graph shows, for each function, which functions called it or which other functions it called. See section The Call Graph.
To better understand how profiling works, you may wish to read a description of its implementation. See section Implementation of Profiling.
The first step in generating profile information for your program is
to compile and link it with profiling enabled. Remember that fncdump
can only be used with gcc
V2.95.2 or higher.
To compile a source file for profiling, specify the `-finstrument-functions -g' options when you run the compiler (This is in addition to the options you normally use).
To link the program for profiling, add the `fnccheck' library to your link command (`-lfnccheck'). Here are examples:
gcc -finstrument-functions -g -c myprog.c utils.c gcc -o myprog myprog.o utils.o -lfnccheck
The `-finstrument-function -g' options also works with a command that both compiles and links:
gcc -o myprog myprog.c utils.c -finstrument-functions -g -lfnccheck
If you compile only some of the modules of the program with `-finstrument-function -g', you can still profile the program, but you won't get complete information about the modules that were compiled without `-finstrument-function -g'. The only information you get is the time spend in them, seen as local time for the calling functions.
Once the program is compiled for profiling, you must run it in order to
generate the information that fncdump
needs. Simply run the program
as usual, using the normal arguments, file names, etc. The program should
run normally, producing the same output as usual. It will, however, run
somewhat slower than normal because of the time spent collecting and the
writing the profile data.
The way you run the program--the arguments and input that you give it--may have a dramatic effect on what the profile information shows. The profile data will describe the parts of the program that were activated for the particular input you use. For example, if the first command you give to your program is to quit, the profile data will show the time used in initialization and in cleanup, but not much else.
Your program will write the profile data into a file called `fnccheck.out' just before exiting. If there is already a file called `fnccheck.out', its contents are overwritten. You can specify an other file name by setting the shell variable `FNCCHK_OUTFILE' with the needed file name.
If your program uses fork()
function, two processes will have to
dump their data. In this case, the main process will dump into
`fnccheck.out' or whatever name you set, and the child(s) process(es)
will dump into `fnccheck.out_PID_OF_CHILD'. You can prevent childs
to dump by using the shell variable `FNCCHK_NOCHILD'.
The `fnccheck.out' is written when returning from main
or by
calling exit
. Some signals are catched to perform file writing
on kills, at the condition that the program don't redirect them.
The `fnccheck.out' file is written in the program's current working
directory at the time it exits. This means that if your program calls
chdir
, the `fnccheck.out' file will be left in the last directory
your program chdir
'd to. If you don't have permission to write in
this directory, the file is not written, and you will get an error message.
Several shell variables can be used to change profile behavior:
fncdump
Command Summary
After you have a profile data file `fnccheck.out', you can run
fncdump
to interpret the information in it. The fncdump
program prints a
flat profile and a call graph on standard output. Typically you would
redirect the output of fncdump
into a file with `>'.
You run fncdump
like this:
fncdump [options] executable-file [> outfile]
Here square-brackets indicate optional arguments.
These options specify which of several output formats
fncdump
should produce.
-avg
-average
fncdump
to display an estimated
over-cost time of fnccheck
library, expressed in time per function.
-sfile <file>
fncdump
uses `<file>'
as profile data file instead of default `./fnccheck.out'.
-sort <n> / +sort <n>
-calls / +calls
fncdump
to display calls list of each
function, instead of displaying list of callers of each functions.
The `+calls' option display both the two lists.
-no-minmax
fncdump
not to display MIN and MAX time
list. This list shows for each function the MIN and MAX total time
spend in it.
-func-details
fncdump
to print
a list of each function, with its corresponding file and line number.
This option can't be used with `-nm' option.
-call-details / -calls-details
fncdump
to
display file and line number of callers site in calls list.
This option can't be used with `-nm' option.
-fullname
fncdump
to print full pathnames for files instead of
basenames.
-call-graph
fncdump
to output the call-graph of
the profiled program, using VCG
format (a GPL call-graph
displayer). No other information is then displayed.
-cycles
fncdump
to add the detected cycles
during execution. Two kind of cycles can be displayed. Simple
recursions (a function calls itself), and cycles, in which case
the begin and the end of the cycle are displayed.
-sum-profile 'f1.out' 'f2.out' 'fr.out'
fncdump
to add the profile data files
`f1.out' and `f2.out' contents, and to generates a
new profile data file called `fr.out'. If `fr.out' is
`-', `stdout' is used. If you sum files from different
programs or different compilations, the new profile data file
will have very few sence. Note: you can use the same name for
`fr.out' as for `f1.out' or `f2.out' (overwrite
is possible).
These options control various aspects of the `fncdump' behavior.
-nm
nm
program
to extract function names from symbol addresses, instead of
using library bfd
from binutils. In this case, file and
line number of functions are not available.
-addr2line
addr2line
program
to extract function names from symbol addresses, instead of
using library bfd
from binutils.
Note: if library bfd
is not available on your system,
`fncdump' can be compiled without using it. In this case,
default method is addr2line
.
--help / -help / --h / -h
--version /-version / --v / -v
--misc
--details
These options specify how to set or remove functions from
fncdump
display.
-only <flist>
fncdump
functions list. `<flist>' is a comma
separated list of function names. Only these functions will be
used to generate profile, call-graph...
-not <flist>
fncdump
function list. `<flist>' is a comma
separated list of function names. All these functions will be
removed from functions list to generate profile, call-graph...
-propagate
f()
calls a function g()
,
using options `-only f -propagate' will make both `f()'
and `g()' to be used in functions list.
In the same way, using options `-not f -propagate' will
make both `f()' and `g()' to be removed from functions list.
-rpropagate
g()
is called by a function f()
,
using options `-only g -rpropagate' will make both `f()'
and `g()' to be used in functions list.
In the same way, using options `-not g -rpropagate' will
make both `f()' and `g()' to be removed from functions list.
-no-spontaneous
fncdump
to not display
functions that seem to be called spontaneously. This can happen if you
profile functions that are called by non-profiled functions.
Note: this option not apply to main()
function.
-no-unresolved
fncdump
to not display
functions for which names are not find in the executable. This can
happen if you give an invalid executable name to `fncdump', or
if you profile functions which are loaded at execution time (using
dlopen(...)
from dlfcn.h
. In this case the functions
addresses are not available in executable. The default behavior of
`fncdump' is to replace unresolved function names by their
symbol address (i.e. `<0x8049178>').
-real-maxtime
fncdump
to compute
total execution time using final functions list (after `-not'
or `-only' options applied). The total execution time is used
to compute time percentages in flat profile display.
WARNING: this option is desactivated in version 1.2, due to changes
in memory representation of call-graph. It will be activated again
in future release.
fncdump
's Output
fncdump
first displays the cycles detected during execution,
if the option `-cycles' is activated.
fncdump
then displays various informations
about the profiled program.
the executable name
Total execution time
main()
function. It can be different if main()
is not in a profiled object, or if main()
has been removed
from functions list using `-not' or `-only' options.
In this case the execution time is followed by the message
`(not main() time)'.
The clock type used
number of realloc
Then the flat profile is displayed (see below).
After the flat profile, some other informations are displayed:
number of unresolved functions not shown
number of `-not/-only' functions not shown
final stack size
number of functions
Then the MIN/MAX time table is displayed. For each function, its max and min total execution time is displayed. This table is not displayed if option `-no-minmax' is used.
The final data displayed is the call-graph (see below).
The flat profile gives for each function various informations, such as local time spend in the function, total execution time, number of calls...
Here is a example of flat profile on a small program:
| local | total | | | | sec. | % | sec. | % | calls |t. sec/call| name |-----------|-----|-----------|-----|----------|-----------|-------- | 0.068976| 0.3| 23.331874|100.0| 1| 23.331874| main | 5.056025| 21.7| 10.078856| 43.2| 4| 2.519714| recurs_a | 5.022831| 21.5| 8.822895| 37.8| 4| 2.205724| recurs_b | 4.037846| 17.3| 8.077933| 34.6| 4| 2.019483| recurs_1s | 5.105893| 21.9| 5.105893| 21.9| 1| 5.105893| test | 4.040068| 17.3| 4.040068| 17.3| 4| 1.010017| s1 | 0.000097| 0.0| 0.000180| 0.0| 32| 0.000006| recurs | 0.000020| 0.0| 0.000030| 0.0| 1| 0.000030| f3 | 0.000010| 0.0| 0.000010| 0.0| 3| 0.000003| f1 | 0.000006| 0.0| 0.000006| 0.0| 1| 0.000006| small
The first two columns display the local time spend in each function. Local time is the total execution time of the function minus the time spend in (profiled) child functions. The second column is the percentage of this time regards to the total execution time.
The next two columns display the total time spend in each function. The second column is the percentage of this time regards to the total execution time.
The fifth column shows the number of calls during execution for each function.
The next column indicate the average time per call, computed as the total execution time divided by the number of calls.
The last column displays the corresponding function name.
Call-graph shows for each function the list of its callers, or its childs if using option `-calls'.
Here is a example of call-graph for a small program:
'main' [0] spontaneously called. 'recurs_a' [1] called by: [0], [2], 'recurs_b' [2] called by: [1], 'recurs_1s' [3] called by: [0], [3], 'test' [4] called by: [0], 's1' [5] called by: [3], 'recurs' [6] called by: [0], [6], 'f3' [7] called by: [0], 'f1' [8] called by: [7], 'small' [9] called by: [0],
If option `-calls' is used, callers are replaced by list of child functions. For the same example it becomes:
'main' [0] calls: [1], [3], [4], [6], [7], [9], 'recurs_a' [1] calls: [2], 'recurs_b' [2] calls: [1], 'recurs_1s' [3] calls: [3], [5], 'test' [4] calls: nobody. 's1' [5] calls: nobody. 'recurs' [6] calls: [6], 'f3' [7] calls: [8], 'f1' [8] calls: nobody. 'small' [9] calls: nobody.
If option `-call-details' is used, file and line number of each function and each caller is added to output. Here is an example for the same program:
'main' [0] (essai.c:128) spontaneously called. 'recurs_a' [1] (essai.c:68) called by: [0](essai.c:132), [2](essai.c:94), 'recurs_b' [2] (essai.c:82) called by: [1](essai.c:78), 'recurs_1s' [3] (essai.c:56) called by: [0](essai.c:134), [3](essai.c:61), 'test' [4] (essai.c:121) called by: [0](essai.c:139), 's1' [5] (essai.c:16) called by: [3](essai.c:58), 'recurs' [6] (essai.c:48) called by: [0](essai.c:130), [6](essai.c:51), 'f3' [7] (essai.c:40) called by: [0](essai.c:142), 'f1' [8] (essai.c:22) called by: [7](essai.c:41), 'small' [9] (essai.c:113) called by: [0](essai.c:145),
More details about how fnccheck
library perform its job.
The fnccheck
library uses a new feature of gcc
V2.95.2
(or higher) called `-finstrument-functions'. This option
makes `gcc' to add to every functions a call at enter and at exit.
The fnccheck
library gives these two functions called
void __cyg_profile_func_enter(void *this_fn, void *call_site)
and void __cyg_profile_func_exit(void *this_fn, void *call_site)
.
The fnccheck
library manages a stack of function calls and a
list of functions. For each of them, the number of calls, the
time spend in them (local and total) is computed. Moreover a list
of callers is associated to functions in order to manage call-graph.
At exit time, all these data are dumped into a file `./fnccheck.out'
in order to be used with fncdump
program.
Note: the fnccheck
library only works with symbol addresses
of functions, and not with names. Names are extracted from executable
by fncdump
.
The profile data file `./fnccheck.out' format is:
HEADER # of elements f1_ptr #calls sec usec locsec locusec minsec minusec maxsec maxusec f2_ptr #calls sec usec locsec locusec minsec minusec maxsec maxusec ... # of realloc stack_size func_table_size nb_of_callers_f1 caller1_ptr rc1_ptr caller2_ptr rc2_ptr ... nb_of_callers_f2 caller1_ptr rc1_ptr caller2_ptr rc2_ptr ... ... time_mode exec_name
HEADER : identifier (including file-format version) of stat-files # el : number of functions fi_ptr : symbol of the function nb_calls : number of calls for this function nb_sec : total time in the function
nb_usec : corresponding usec
_locsec : local time in the function
_locusec : corresponding usec
min_sec : minimum time spend in the function
_subusec : corresponding usec
min_sec : maximum time spend in the function
_subusec : corresponding usec
#realloc : number of 'realloc' performed during execution
stack_size: final stack size used during execution
func_tbl_s: number of function stored during execution
nbofcaller: number of callers for the corresp. function
calleri_pt: symbol of ith caller for the function
rci_ptr : symbol of caller of the function
time_mode : the FNCCHK_TIME type
exec_name : corresp. exec name (not implemented)
fnccheck
's Internal Operation
Here are the treatments performed by fnccheck
library for
each enter/exit of functions:
At the first call:
Initializations of hash-tables, options, stack. Trapping
exits with atexit()
and signal()
functions.
When entering a function:
When exiting a function:
main()
function, and is yes call the
dumping function.
When leaving the program:
Creates the profile data file and dumps data into it. Freed allocated
tables and remove treatments (because in some cases, in particular for
C++
programs, calls to exit treatments can occur AFTER exiting
from main()
.
In order to allow debug during execution, you have to compile
the library fnccheck
using target `dfnccheck'.
This allows you to use the shell variable `FNCCHK_DEBUG'.
Valid values are: