ScanDoc
- Introduction
- ScanDoc features
- Getting ScanDoc
- Running ScanDoc
- Writing ScanDoc Comments
- Document Tags
- The Template File
- To-Do List
- History and credits
Introduction
ScanDoc is a Perl script which scans C++ source code for specially-formatted
comments and produces attractive, organized, indexed documentation.
ScanDoc is designed to generate the highest-quality documentation with
as little effort as possible on the part of the programmer writing the
code to be documented. To this end, ScanDoc not only uses the documentation
supplied by the programmer, but supplements it by parsing the actual C++
data structure declarations.
Unlike other documentation scanners, ScanDoc is themable, meaning
that the appearance of the output documentation can be controlled via a
"template" file. The example template supplied with ScanDoc produces HTML
output incorporating frames and indices.
ScanDoc was written by Talin
and is Copyright ©1997-2000. ScanDoc may be freely distributed under
the Artistic License (see COPYING).
ScanDoc features
- Portability -- because ScanDoc is written in Perl, it can be
executed on any platform that runs Perl.
- Ease of Use -- Once ScanDoc is set up it is very easy to use. There
are only a few command-line switches, all of which are optional.
- Convenient -- ScanDoc's comments are a superset of javadoc
and are very easy to write.
- Customizable -- ScanDoc uses a user-modifiable template file as
the source of all output text. You can give your documentation a unique
style without modifying the ScanDoc script itself. ScanDoc has been
designed primarily to support HTML output; however, templates can readily
be modified to support other output file types such as PostScript, TeX,
.info, ASCII, etc.
- Comprehensive -- ScanDoc understands a wide range of C++ syntax,
including operator overloads, templates and template arguments, nested
classes, and friend functions.
- Fast -- a typical header file takes 1-2 seconds to process.
- Flexible -- functions can be grouped in any way you like. You decide
which functions go into which .HTML files.
Getting ScanDoc
ScanDoc is available via anonymous CVS:
cvs -d:pserver:anonymous@cvs.scandoc.sourceforge.net:/cvsroot/scandoc login
(Hit return when asked for a password.)
cvs -z3 -d:pserver:anonymous@cvs.scandoc.sourceforge.net:/cvsroot/scandoc co scandoc
There is also a .tar.gz archive
available on the ScanDoc Home Page.
Running ScanDoc
ScanDoc takes several command line switches (all of which are optional)
and a list of input source files (which can include wildcards). Here is
the command line syntax:
perl ScanDoc.pl -i document-template -p output-path -t tabsize -d sym=value input-files ...
The document-template argument specifies which file to use for the
template file. This template file is used to define the format of the output
text. You can edit this file to customize the "look" of your documentation.
The default is "template.html".
The output-path argument specifies the directory where
the resulting documentation should be written. This should include a
directory separator character ("\" for PC, ":" for Mac using MPW, and "/"
for Unix) as the last character. The default is the current directory.
The tabsize argument specifies how many spaces tabs should
be expanded to. The default is 4.
The sym=value argument can be used to define a symbol. This symbol
will be defined within the scope of the expanded template, and can be used
as part of the output text.
It does not matter if there are whitespace characters between
the switches and their arguments.
Example:
perl ScanDoc.pl -itemplate.html -p./test/ -t4 *.h
Writing ScanDoc Comments
In order to use ScanDoc, you must embed special comments within your C++
source files. ScanDoc recognizes two forms of comments: those beginning
with "/**" and those beginning with "//*". The first
form is a C-style multi-line comment; the second is a C++ single-line
comment. There must be a space after the "/**" and "//*"
tokens -- ScanDoc does not recognize comments of the form "/************"
and such. (However, if ScanDoc detects a row of asterisks, equals signs,
or dashes on the first or last line of a C-style comment, it will remove
them from the documentation. So you can say "/** ==============="
if you want a big, bold banner.)
Whenever ScanDoc sees a special comment, it knows that the next C++
declaration (class, function, or variable) should be documented. Any declaration
which is not preceded by a special comment will be omitted from the output
file. The purpose for this is to allow you to have private functions or
classes which are not present in the ScanDoc documentation. You can use
ordinary C and C++ style comments to document these declarations within
the source code, since ScanDoc ignores such comments.
The simplest way to use the special comments is to simply write
a description of the item within the comments:
/** Documentation for class Foo */
class Foo : private Bar {
int Baz( void );
};
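As a slightly fuller illustration, the sketch below uses both comment forms in one class and leaves one member undocumented. The class and its members are invented for this example. Which members appear in the output follows from the rule described above, not from a test against ScanDoc itself.

```cpp
/** A simple two-dimensional point.
    This block uses the C-style form, so it may span several lines. */
class Point {
public:
    //* Construct a point from its coordinates.
    Point( int inX, int inY ) : x( inX ), y( inY ) {}

    //* Return the sum of the two coordinates.
    int Sum() const { return x + y; }

    // Ordinary comment: ScanDoc ignores it, so Reset() should be
    // omitted from the generated documentation.
    void Reset() { x = 0; y = 0; }

private:
    int x;
    int y;
};
```

Note the space after each "/**" and "//*" token, which the special comment forms require.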
Document Tags
Document tags are special codes which can be inserted within a ScanDoc comment.
They allow you to control many aspects of the generated documentation.
All document tags begin with an '@' character.
All tags must come at the beginning of a line of text (before any non-blank
characters). All of the text on that line is considered part of the tag.
Many of the tags described are persistent, in the sense that
they remain in effect until the next tag of the same type. A persistent
tag affects all documented items which come after it; in other words, its
effect can last for longer than a single documentation comment block.
Some of the tags described are continuable, which means that
they can be continued on the next line. There is no need to repeat the
tag. Continuable tags last until the end of the comment block, or until
they are overridden by another tag.
Text which occurs before the first tag is considered "description" text,
in other words it's the actual explanation of the declaration which follows
the comment. This is also true of any lines of text which do not begin
with a tag, and are not a continuation of a previous continuable tag. A
blank line within the description text is converted into an HTML paragraph
tag. Note that you are allowed to insert HTML tags into the description
text; however, if you are outputting the text to a format other than HTML,
the output template file may or may not be smart enough to translate the
tag into an appropriate textual entity.
Example description:
/** This function adds two vectors together.
@param inVector1 The first vector
@param inVector2 This is the second vector
@return The vector sum of the two vectors
@see #VectorSubtract
@keywords Vector addition subtraction math
*/
const Vector &VectorAdd( const Vector &inVector1, const Vector &inVector2 );
The @package tag
C++ functions and classes often fall into natural groups. In Java,
these are called "packages"; in C++ there is no such construct,
so the @package tag can be used to group C++ declarations into a package.
ScanDoc writes the documentation for each package to a separate
.HTML file named "packagename.HTML".
(Of course, that's only the default behavior. You can change it by editing
the document template file, as well as changing the names of the two index
files that ScanDoc creates.)
You can have classes from several different header files be grouped
into the same package, or you can have several classes from a single source
file go into several different packages. There is no one-to-one mapping.
The syntax for @package is:
@package package-name
The @package tag is persistent -- it remains in effect until the end of
the file, unless superseded by another @package tag. If no package tag
is specified, the default package named "General" is used.
The @author tag
The @author tag is used to specify the author of the subsequent declarations.
The entire line after @author is taken as the name of the author(s). Like
the @package tag, it is persistent -- it remains in effect until the end
of the file, unless superseded by another @author tag.
The @version tag
The @version tag is used to specify the version of the subsequent declarations.
The entire line after @version is taken as the version string. Like the
@package tag, it is persistent -- it remains in effect until the end of
the file, unless superseded by another @version tag.
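The following sketch shows how the three persistent tags might be combined in one header. The package names, author name and version string are placeholders, and the functions are invented for the example.

```cpp
/** A two-dimensional vector.
    @package Math
    @author Jane Example
    @version 1.0
*/
struct Vector {
    double x;
    double y;
};

// No tags are repeated below: under the persistence rule, VectorAdd
// should stay in package "Math" with the same author and version.

/** Add two vectors component by component. */
inline Vector VectorAdd( const Vector &a, const Vector &b ) {
    return Vector{ a.x + b.x, a.y + b.y };
}

// A new @package tag supersedes the earlier one from here on; the
// author and version remain in effect.

/** Compute the dot product of two vectors.
    @package Geometry
*/
inline double VectorDot( const Vector &a, const Vector &b ) {
    return a.x * b.x + a.y * b.y;
}
```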
The @keyword tag
The @keyword tag is primarily designed for use with HTML search engines.
The rest of the line after the @keyword tag is taken as a series of search
keywords. The keywords will not actually appear in the documentation, instead
they are placed into an HTML comment just before the documentation appears.
The @see tag
Each documentation entry can have a "See also" section, which is a list
of hypertext references to other relevant documentation. Each occurrence
of an @see tag defines a hyperlink. (You can also embed normal HTML hyperlinks
within the class description and other places.)
Immediately following the @see tag is the name of the class, class member,
or global being referenced. A single name with no special separator characters
is taken to be the name of a class. The package name can be specified by
giving the name of the package, followed by '#', followed by the name of
the class or function. If the package name is omitted, ScanDoc assumes that
the item being referred to is in the current package. Class members can
be referenced by appending '::' and the member name to the class name.
A '::' with no class name indicates a global function or member. Member
functions and global functions should not have argument lists or parentheses.
(Currently there is no way to indicate which one of a set of overloaded
functions is being referred to.)
You can also include hyperlinks to other documentation that was not
created by ScanDoc by using a normal HTML hyperlink, which ScanDoc will
insert verbatim into the output file.
Here are all of the supported forms:
@see classname
@see package#classname
@see classname::member
@see package#classname::member
@see ::function
@see package#::function
@see <a href="ref">Description</a>
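To make the naming rules concrete, here is a minimal C++ sketch of how such a target could be split into its package, class and member parts. It is an illustration only, not ScanDoc's own code (ScanDoc is written in Perl). The SeeTarget type and ParseSeeTarget function are hypothetical names, and splitting at the last '::' is an assumption about how nested class names would be handled.

```cpp
#include <string>

// Hypothetical holder for the three parts of a @see target.
struct SeeTarget {
    std::string package;    // empty means "the current package"
    std::string className;  // empty means a global ("::function")
    std::string member;     // empty means the class itself
};

// Split a target such as "package#classname::member" into its parts.
// HTML hyperlink targets (those starting with '<') are not handled here.
SeeTarget ParseSeeTarget( const std::string &target ) {
    SeeTarget result;
    std::string rest = target;

    // Everything before '#' is the package name.
    std::string::size_type hash = rest.find( '#' );
    if ( hash != std::string::npos ) {
        result.package = rest.substr( 0, hash );
        rest = rest.substr( hash + 1 );
    }

    // Everything after the last "::" is the member; whatever precedes
    // it (possibly nothing) is the class name.
    std::string::size_type scope = rest.rfind( "::" );
    if ( scope != std::string::npos ) {
        result.className = rest.substr( 0, scope );
        result.member = rest.substr( scope + 2 );
    } else {
        result.className = rest;
    }
    return result;
}
```

With this sketch, "::function" yields an empty class name and the member "function", which matches the rule that a '::' with no class name refers to a global.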
The @param tag
The @param tag is used for documenting function parameters. Following the
@param keyword is the name of the parameter, and then the description.
The @param tag is continuable, which means you can continue the description
on the next line. Blank lines within the parameter description are converted
into HTML paragraph tags.
Example:
@param inRect The input rectangle to process. This will be scaled and copied to outRect.
@param outRect Where to place the scaled rectangle
The @return tag
The @return tag is used to describe the function return. If the function
returns nothing, this tag can be omitted. The tag is continuable, which
means you can continue the description on the next line without repeating
the @return tag.
The @exception tag
The @exception tag is used to document any exceptions that may be thrown
by this function. Note that unlike Java, it is difficult in C++ to determine
what exceptions might potentially be thrown by subroutines of the functions
being documented, so it is questionable whether programmers will be able
to easily maintain a list of every exception that could be thrown from
the function. Ultimately, the decision of how to handle this will depend
on local coding standards and practices.
The format of this section is exactly the same as @param.
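The sketch below combines @param, @return and @exception in one comment, including continuation lines. The Rect type and ScaleRect function are invented for the example. The @exception line assumes the exception name takes the place of the parameter name, since the format is the same as @param.

```cpp
#include <stdexcept>

struct Rect {
    int left, top, right, bottom;
};

/** Scale a rectangle by an integer factor.
    @param inRect The input rectangle to process.
        This line continues the inRect description.
    @param inFactor The scale factor, which must be positive.
    @return The scaled rectangle.
    @exception std::invalid_argument Thrown if inFactor is zero
        or negative.
*/
Rect ScaleRect( const Rect &inRect, int inFactor ) {
    if ( inFactor <= 0 ) {
        throw std::invalid_argument( "inFactor must be positive" );
    }
    return Rect{ inRect.left * inFactor, inRect.top * inFactor,
                 inRect.right * inFactor, inRect.bottom * inFactor };
}
```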
The @heading tag
The @heading tag is used to insert a heading into your description text.
The remainder of the line after @heading is taken as the heading text (the
@heading tag is not continuable). When processed using the example template
file, ScanDoc converts the heading text to a level two heading.
The @deffunc tag
Occasionally, a function will have a syntax so strange that ScanDoc cannot
parse it. This is mainly due to the fact that ScanDoc does not have a complete
C++ parser within it. Also, ScanDoc ignores all preprocessor directives,
so it is difficult to add documentation entries for C macro functions.
The @deffunc tag overcomes this limitation by allowing the programmer
to manually insert a "fake" function declaration.
The @deffunc tag is short for "define function". The effect of this
tag is exactly the same as if ScanDoc had actually parsed the function
declaration. Note that this means that the @deffunc must be the last tag
in the comment block, since any text or tags which come after it will be
applied to the next declaration.
The format of @deffunc is:
@deffunc short-name declaration
The "short-name" is the version that will appear in the index, i.e. just
a name with no argument list or return type. The declaration part is optional,
and should be the complete prototype.
Here is an example (Note that unlike normal declarations, you can use
HTML formatting within the actual prototype declaration):
/** Assert that a condition is TRUE, or print a message and exit.
@param expression The condition to test.
@deffunc ASSERT ASSERT( <expression< );
*/
#define ASSERT( expr ) _assert( expr, __FILE__, __LINE__ )
The @defvar tag
The @defvar tag is exactly like @deffunc except it defines a variable instead
of a function.
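By analogy with the @deffunc example, a @defvar entry for a preprocessor constant might look like the sketch below. The "short-name declaration" layout is assumed to carry over unchanged from @deffunc, and MAX_NAME is an invented constant.

```cpp
/** Maximum number of characters in a file name, including the terminator.
    @defvar MAX_NAME const int MAX_NAME;
*/
#define MAX_NAME 256
```

As with @deffunc, the tag is placed last in the comment block so that nothing after it is applied to the next declaration.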
The @caution, @warning, @tip and @bug tags
Each of these tags inserts a small icon into the text at the point where
the tag occurs. For example, in the supplied example template, the "caution"
tag inserts a paragraph break followed by a triangular yellow "caution"
sign. These icons can be used to highlight a particular aspect of the text.
It should be noted that none of these tags is in fact recognized
by ScanDoc itself -- the substitution of icons for tags is done in the
template file, and as such the template file creator is free to define
any new tags that they wish.
The @todo tag
The @todo tag records the name of the current source file and the text
of the tag (which is continuable) into a special "todo" table. This table
is then written out as a separate file, allowing a conveniently summarized
"To-Do" list for the project. Note that a special comment with only an
"@todo" entry in it and no other description text or tags will be included
in the generated To-Do list, but not in any other generated documentation.
The reason for this is that you might not want to document everything that
also has a To-Do entry associated with it, so the scanner does not consider
a documentation entry valid unless it has at least some descriptive
text or tags.
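The two cases can be illustrated as follows. Both functions are invented for the example. According to the rule above, the first should appear only in the To-Do list, while the second should appear in both the To-Do list and the regular documentation.

```cpp
/** @todo Replace the linear search with a hash lookup.
        This line continues the to-do text.
*/
int FindIndex( const int *values, int count, int wanted ) {
    for ( int i = 0; i < count; ++i ) {
        if ( values[i] == wanted ) return i;
    }
    return -1;  // not found
}

/** Clamp a value to the range [inLow, inHigh].
    @todo Decide what to do when inLow is greater than inHigh.
*/
int Clamp( int inValue, int inLow, int inHigh ) {
    if ( inValue < inLow ) return inLow;
    if ( inValue > inHigh ) return inHigh;
    return inValue;
}
```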
The Template File
The template file tells ScanDoc how to format the output files. There is
virtually no knowledge of HTML within ScanDoc itself; everything else is
supplied by the template file. (Actually, there are some functions that
make generating HTML easier, but templates don't have to use them.)
ScanDoc comes with an example template file called "template.html".
If all you want to do is change the name of the project or insert your
company logo, you need read no further. Simply edit the "template.html"
file and insert your project name or logo in the appropriate fields at
the top of the file.
If you want to do more detailed customization, however, you'll need
to understand how a template file is actually interpreted, which requires
some understanding of ScanDoc's overall order of operations. You'll also
need a basic understanding of the Perl language.
When ScanDoc scans source files, it builds a large data structure which
stores all of the packages, classes, member functions, parameters, documentation
and other entities that it finds. These are stored using nested Perl hash
tables.
After parsing is complete, the template file is parsed and executed.
The template file consists mostly of output text, with occasional parameter
substitutions and embedded program code. ScanDoc translates the template
file into a long string which is a Perl program, and then executes that
string. So, you can embed Perl code directly into the template, allowing
you to open output files, iterate using "for" loops, create comma-separated
lists, etc. This embedded code has access to all of the data structures
built during the parsing phase.
Any text which is not embedded code will be written directly to the
current output file. This text can have parameter substitutions in it.
There are two primary types of substitutions. The first type is the normal
Perl interpolation sequence, i.e. $variable. This means that the
value of the variable will be inserted into the output text at that point.
The other type of substitution is the sequence $(object.fieldname). This
retrieves the named field from the given object, and inserts the value
of that field into the output text at that point. (Note: The way this is
implemented is that ScanDoc translates the $(object.fieldname) pattern
into the sequence: "print $object->fieldname()").
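The translation of the second form can be sketched in C++ with a regular expression. This is only an illustration of the rewrite described in the note above: ScanDoc itself is written in Perl, TranslateSubstitution is a hypothetical name, and restricting object and field names to word characters is an assumption.

```cpp
#include <regex>
#include <string>

// Translate one "$(object.fieldname)" token into the Perl statement
// "print $object->fieldname()". Returns an empty string if the token
// does not have that shape.
std::string TranslateSubstitution( const std::string &token ) {
    // Group 1 is the object name, group 2 the field name.
    static const std::regex pattern( R"re(\$\((\w+)\.(\w+)\))re" );
    std::smatch match;
    if ( !std::regex_match( token, match, pattern ) ) {
        return "";
    }
    return "print $" + match[1].str() + "->" + match[2].str() + "()";
}
```

For example, the token "$(a.name)" becomes "print $a->name()", while a plain "$variable" is left to ordinary Perl interpolation and is not matched by this function.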
Embedded code is indicated by using double angle brackets, i.e. <<code>>.
Any code which is within the angle brackets will be executed at that point.
For loops can be written as separate pieces, i.e.
<<foreach $a (@list) {>>
<h2>$(a.name)</h2>
$(a.description)<p>
<<}>>
Access to the parse tables and other data is done through global functions,
which are as follows:
ScanDoc Global Functions
Function | Meaning
file( "filename.txt" ) | Open a new output file and make it the current output file.
packages | Return a list of references to all packages, in order by name.
todolist_files() | Returns a list of all source files which had "to-do" entries.
todolist_entries( file ) | Returns a list of all "to-do" entries for a given file.
Each package is a reference to a Perl object of type "PackageRecord".
Access to the classes and globals within the package is done via the member
functions of the package.
"PackageRecord" Member functions
Function | Meaning
classes | Returns a list of references to all classes in the package, in order by name.
globals | Returns a list of all global functions and variables in the package, in order by name.
globalvars | Returns a list of references for all global variables in the package.
globalfuncs | Returns a list of references for all global functions in the package.
name | Returns a string containing the name of the package.
url | Returns the suggested HTML URL of the package documentation.
anchor | Returns the suggested HTML anchor of the package documentation.
Each class returned by the classes() member function is a reference
to a Perl object of type ClassRecord. This class has the following member
functions.
"ClassRecord" Member functions
Function | Meaning
keywords | Returns a string containing the list of keywords associated with this class.
author | Returns the name of the author of the class.
version | Returns a string containing the version information for the class.
name | Returns a string containing the "short" name of the class, i.e. without "class" or "struct", and without any template params or scoping information.
longname | Similar to "name" but includes the "class" or "struct" tag.
fullname | Includes the "class" or "struct" tag and the template arguments.
scopename | The complete class name including scoping information for embedded classes.
source file | The name of the source file where the class was defined.
description | The description text of the class documentation.
seealso | The list of "see also" tags. This is a list of references to "DocReference" objects.
url | Returns the suggested HTML URL of the class documentation.
anchor | Returns the suggested HTML anchor of the class documentation.
members | Returns a list of references to all class members.
membervars | Returns a list of references to all class member variables.
memberfuncs | Returns a list of references to all class member functions.
baseclasses | Returns a list of references to all base classes.
subclasses | Returns a list of references to all subclasses.
Each member record returned by the members() function (as well
as by the membervars() and memberfuncs() functions) is a reference to a Perl
object of type MemberRecord. MemberRecord is also used for the references
returned by globals(), globalvars(), and globalfuncs() which are returned
at the package level.
"MemberRecord" Member functions
Function | Meaning
keywords | Returns a string containing the list of keywords associated with this member.
author | Returns the name of the author of the member.
version | Returns a string containing the version information for the member.
name | Returns a string containing the "short" name of the member, i.e. without the type or argument list.
longname | Similar to "name" but includes '()' at the end if it's a function.
fullname | Includes the type of the variable and the arguments if any.
scopename | The complete member name including scoping information.
source file | The name of the source file where the member was defined.
description | The description text of the member documentation.
seealso | The list of "see also" tags. This is a list of references to "DocReference" objects.
url | Returns the suggested HTML URL of the member documentation.
anchor | Returns the suggested HTML anchor of the member documentation.
type | 'func' if it's a function, else 'var' if it's a variable.
params | Returns a list of parameters (as defined by the @param tags) for this item.
exceptions | Returns a list of exceptions (as defined by the @exception tags) for this item.
returnval | Returns the text of the @return tag.
Parameters and exceptions are Perl objects of type "ArgRecord", which
has the following members:
"ArgRecord" Member functions
Function | Meaning
name | Returns a string containing the name of the argument.
description | Returns the description text for the argument.
Finally, the list of references returned by the "seealso" function are
references to Perl objects of type "DocReference":
"DocReference" Member functions
Function | Meaning
name | Returns a string containing the name of the reference.
url | If ScanDoc knows about this reference, it will return the URL string that it suggested; if the item is not recognized, it will return 0.
Base classes: In some cases, a base class mentioned in another
class's "baseclasses" list will be a class that ScanDoc does not know about.
Because ScanDoc does not parse #include directives, it's possible for a
class to inherit from a base class that is defined outside the set of files
being parsed by ScanDoc. In this case, ScanDoc will create a "partial"
class record, consisting of only the name, longname, fullname, and scopename
fields. In particular, the "url()" member function will return 0, since
ScanDoc does not know from where this class originates. In such a case,
the output template should detect that there is no URL and not attempt
to create a hyperlink for the class reference.
Description Filtering: The "description()"
function returns the bare text as found within the source code. The only
filtering that ScanDoc does on this text is to expand all tabs to spaces.
The template code is responsible for any other filtering, such as converting
blank lines to paragraphs, converting @heading tags to the appropriate
style, and inserting the caution, warning and bug icons. Note that the
template is free to define new icons or tags which can be filtered at this
time.
To-Do List
This is a list of enhancements that are needed for ScanDoc.
Templates for formats other than HTML: Currently, HTML is the
only output format supported because it's the only one that I am familiar
with. However, it seems that a lot of documentation these days is in TeX
format, which is then used to generate .info, .dvi, etc. It would be nice
to have template files for these formats. Note also that there is no reason
why ScanDoc could not be modified to support multiple templates in a single
execution, which would be relatively fast since parsing the input classes
is what takes 95% of the execution time.
Other HTML templates: It would be nice to have a selection of
HTML templates for different styles. For example, the current template
file generates documentation which takes advantage of the "frames" feature
which is not supported by all browsers, although the documentation can
still be viewed with a browser that does not support frames. However, it
might be nice to provide templates which don't generate any frames information.
Similarly, there is much that can be done in terms of improving the overall
attractiveness and organization of the documents, especially by taking
better advantage of tables.
Include files: Currently, ScanDoc does not attempt to parse "#include"
statements. (As much as I like Perl, it's not a great language for writing
recursive descent parsers in my opinion.) Unfortunately, this means that
ScanDoc has to "guess" which identifiers represent types as opposed to
function and variable names. The current heuristic handles all of the cases
I've found so far, but it would be nice to be able to know for sure. Of
course, doing a complete job would also require that we recognize and expand
C macros. Having a complete set of type information and better parsing
would also allow individual function arguments and return values to be
hyperlinks.
Hyperlinks on arguments and return values: Even without a more
complete parser, it would be possible to modify the current HTML template
file to break up argument lists into a sequence of bare words and see if
any of those bare words match up with any of the current classes that ScanDoc
knows about. Hyperlinks could then be created to those classes. This wouldn't
work every time (for example, implicit references to classes defined within
the current class's scope would have problems), but it would cover most
of the common cases.
Improved description filtering: Currently, the "processDescription"
function in the HTML template does not handle the case of '<' and '>'
symbols embedded in the description text. This means that attempts to mention
template arguments within documentation generally don't look right. Less-than
and greater-than signs should usually be transformed into &lt; and &gt;
sequences, unless they are part of a valid HTML tag that has been deliberately
inserted into the documentation text. I would need to come up with a look-ahead
regex that would match all of the HTML tags that might reasonably occur
inside a documentation entry.
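One possible shape for such a filter is sketched below in C++. It is not part of ScanDoc or its template (which would do this in Perl). The EscapeAngleBrackets name and the short list of allowed tags are assumptions chosen for illustration, and ampersands are deliberately left untouched.

```cpp
#include <regex>
#include <string>

// Escape '<' and '>' in description text, but pass through a small,
// assumed whitelist of HTML tags unchanged.
std::string EscapeAngleBrackets( const std::string &text ) {
    static const std::regex allowedTag(
        R"re(</?(a|b|i|p|br|code|pre|em|strong)(\s[^<>]*)?>)re",
        std::regex::icase );

    std::string out;
    std::string::size_type i = 0;
    while ( i < text.size() ) {
        const char c = text[i];
        if ( c == '<' ) {
            std::smatch match;
            // match_continuous requires the tag to start exactly at i.
            if ( std::regex_search( text.begin() + i, text.end(), match,
                                    allowedTag,
                                    std::regex_constants::match_continuous ) ) {
                out += match.str();
                i += static_cast<std::string::size_type>( match.length() );
                continue;
            }
            out += "&lt;";
        } else if ( c == '>' ) {
            out += "&gt;";
        } else {
            out += c;
        }
        ++i;
    }
    return out;
}
```

A description such as "vector<int> is <b>bold</b>" keeps its bold tags while the template argument brackets become entities.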
Persistent Scan Info: Several people have suggested that they
would like to keep their documentation files always up to date, in other
words running ScanDoc whenever they do a make. Unfortunately, because ScanDoc
scans every file in the project, this is far too slow for realistically-sized
projects. One idea would be to create a persistent database of documentation,
which could be updated incrementally, allowing ScanDoc to only scan those
files that have actually changed. This would also allow the code analyzer
to be written in a different language than the templates. For example,
we could use a real parser generator, and make a C analyser that would
do a much more complete job of parsing, as well as being much faster. This
would also have the benefit of making the parser somewhat readable, which
it certainly isn't now.
Because ScanDoc can pretty much ignore anything that's inside a code
block, a C parser could potentially parse much faster than an actual compiler.
This means that scanning just a few files and updating a database would
add an unnoticeable amount of delay to the build process. The only question
is how to maintain the database in a way that's portable.
This idea of a persistent database of documentation could be taken even
further. For example, rather than generating static pages, the documentation
pages could be served up directly from the database, using something like
PHP to create HTML pages of documentation on the fly as needed. This would
also allow more intelligent queries, for example "give me documentation
for all classes that call member function 'foo' in class 'bar'." Of course,
we would have to parse things a lot more deeply than we do now for this
to work. And one problem with dynamic pages is that they are hard to distribute
in an archive.
History and credits
The current version of ScanDoc is actually the sixth generation. The first
one was written in C, sometime in 1993-1994, and was inspired
by (and functionally similar to) the "autodoc" utility on the Commodore
Amiga. Later versions were inspired by Sun's JavaDoc utility. I've also
read about Don Knuth's "Literate Programming" efforts, but I wanted something
that was much lighter weight and easier to integrate into existing environments.
Robert McNally of Dangerous Games suggested to me the idea of
having embedded icons in the documentation to signify important paragraphs.
© 1997-2000 Talin.
Last Updated: 26 June 1998