
		graph-includes toolkit
		======================

IN SHORT
--------

Graph-includes creates a graph of dependencies between source-files
and/or groups of source-files, with an emphasis on getting readable
and usable graphs even for large projects.

Usability of the dependency graphs is currently improved by:
- customizable grouping of several source files into a single node
- transitive reduction of the graph

It currently supports graphing the C/C++ #include relationship, using
graphviz.


IMPORTANT NOTICE
----------------

This tool has evolved from a 50-line script written for a particular
project (Battle for Wesnoth).  Although it has been generalized a lot,
there are still somewhat ad-hoc heuristics hardcoded here and there,
especially in the default project class (see class descriptions below).

Work is under way to make this tool as generic as possible, but much
still has to be done at all levels.  It is still under development,
and may not suit your needs (at least, not yet).


INSTALLATION INSTRUCTIONS
-------------------------

Installation works as for standard perl packages.  Eg:

$ perl Makefile.PL prefix=/usr/local
$ make
$ su
# make install


Be sure that the directory in which the library modules were installed
is in your perl library path.  Eg, if "graph-includes --version" does
not give the expected result, try setting the PERL5LIB environment
variable to (in the above example) /usr/local/share/perl/5.8.4/.

New versions can be found at http://ydirson.free.fr/soft/graph-includes/.

A darcs repository is available at
http://ydirson.free.fr/soft/graph-includes/darcs/.  Note, however, that
I "push" to it using plain FTP mirroring, which is not an official way
of publishing a darcs repository.  It seems to be at least somewhat
functional, but only time will tell whether it works completely.


HOW TO TAKE ADVANTAGE OF THIS TOOL TO IMPROVE YOUR CODE
-------------------------------------------------------

Graph-includes is only a supporting tool for a refactoring effort.  It
can help a developer see where to put their efforts in order to get
cleaner and saner dependencies in a project.

In this respect, it is quite similar to a microscope: if you don't
look at the right place, you won't see anything interesting.  But if
you start with a small magnifying factor, you can locate regions of
interest, and then zoom in on those to get to the interesting stuff.


1. on the spirit of dependency cleanup

1.1. first look at a dependency graph

When developing a project of medium size (we'll talk mostly about C/C++
here, but this applies to most languages), especially with many
people writing code, it is quite easy to get to a point where each
file (out of several tens or hundreds of files) depends on too many
other files.

The most obvious relation is the #include one.  The more #includes a
file has, the more time it takes to build - especially when those
included files themselves #include a bunch of other files.  For a
project of about 100 files, just producing a graph of all those files,
with arrows representing the #include dependencies, will usually give
an unreadable graph, and will show very little about possible
improvements.  This is why this tool has been written: to make it
possible to get to the useful information hidden in this unusable
dependency graph.


1.2. looking further

A less obvious relation appears more clearly when you consider not
files by themselves, but the set of files made of an interface and the
matching implementation.  Let's consider two such sets, made of the
files a.h, a.c, b.h, b.c.  a.c includes b.h, and b.c includes a.h, and
each implementation, following good practice, includes its own
interface.  A simple dependency graph as described above would look
like this:

        a.c -> b.h              
           \  /|
            \/
            /\
           /  \|
        b.c -> a.h

If OTOH we represent those sets of files instead of the files
themselves, we now have something like:

	a <--> b

This shows much more clearly that those two modules are intrinsically
related.  In many cases, this means that whenever you use the a.o
file resulting from the build of a.c, you'll need to link b.o as
well, and vice versa.  This will be the case when each file uses the
other's header to get function prototypes.  Hunting down abusive
dependencies then makes it possible, for example, to select with finer
granularity which of those modules of code need to go into which
executable, thus producing lighter executables.
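To make the file-set view concrete, here is a small Python sketch that lifts file-level #include edges to group-level edges.  Graph-includes itself is written in Perl, and its grouping lives in project classes.  The sketch assumes, as in the example above, that a group is just the file name with its suffix removed.  `group_of` and `lift_edges` are hypothetical names.

```python
import os

def group_of(path):
    # Assumption: a file belongs to the group named by its path minus the
    # suffix, so "a.c" and "a.h" both map to group "a".
    return os.path.splitext(path)[0]

def lift_edges(file_edges):
    """Turn file->file edges into group->group edges, dropping self-loops."""
    group_edges = set()
    for src, dst in file_edges:
        g_src, g_dst = group_of(src), group_of(dst)
        if g_src != g_dst:  # eg. a.c -> a.h stays inside group "a"
            group_edges.add((g_src, g_dst))
    return group_edges

# The four-file example from the text.
edges = [("a.c", "b.h"), ("b.c", "a.h"), ("a.c", "a.h"), ("b.c", "b.h")]
print(sorted(lift_edges(edges)))  # [('a', 'b'), ('b', 'a')]
```

The two lifted edges in opposite directions are exactly the "a <--> b" cycle drawn above.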

Note that such a reciprocal dependency may not be pathological.  Many
projects tend to split a large module into several files for clarity,
even when those files are interdependent.  It is much more often in
cycles of unidirectional dependencies (eg. a -> b -> c -> a) that we
find dependencies that should not be there.

In other cases, a header such as b.h would only have been included to
access a type definition, and the associated b.o would not be needed.
In such cases, you may want to consider splitting such "low-level"
declarations into their own headers.  Not only would this simplify the
graph, allowing you to get a better grasp on your source code, but it
can also lead to faster compilations, since each file will include
fewer unrelated definitions.


2. possible strategies to help locate abusive dependencies

More to be written.



COMMAND-LINE USAGE
------------------

See "graph-includes --help".

1. output type

The default output is a .dot file on standard output, suitable for
formatting by dot (from the graphviz toolkit), or interactive editing
by dotty (also from graphviz).
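For readers unfamiliar with the format, a dependency graph in .dot form is essentially a digraph with one edge statement per dependency.  This Python sketch only illustrates the format.  The real output of graph-includes carries more attributes, such as labels, colors and clusters.  `to_dot` and `quote` are hypothetical helpers.

```python
def quote(label):
    # DOT string literal: wrap in double quotes, escaping embedded ones.
    return '"' + label.replace('"', '\\"') + '"'

def to_dot(edges, name="dependencies"):
    """Return DOT text for a directed graph with one statement per edge."""
    lines = ["digraph " + quote(name) + " {"]
    for src, dst in sorted(edges):
        lines.append("  " + quote(src) + " -> " + quote(dst) + ";")
    lines.append("}")
    return "\n".join(lines) + "\n"

print(to_dot({("a", "b"), ("b", "a")}), end="")
# digraph "dependencies" {
#   "a" -> "b";
#   "b" -> "a";
# }
```

Saved as deps.dot, such text can be rendered with "dot -Tps deps.dot -o deps.ps" or opened with "dotty deps.dot".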

You can ask graph-includes to do the formatting for you, eg. using
"--output=<file>.<suffix>".  It will run "dot -T<suffix>", so that
"--output=mydeps.ps" or "--output=mydeps.jpg" will have the expected
behaviour.  If your suffix is not known to dot, dot itself will
complain, so asking for --output=foo.bar will cause a message like:

Warning: language bar not recognized, use one of: canon cmap cmapx dia dot fig gd gd2 gif hpgl imap ismap jpeg jpg mif mp pcl pic plain plain-ext png ps ps2 svg svgz vrml vtx wbmp xdot
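The suffix-to-format mapping can be sketched in Python as below.  It assumes the output file name is passed to dot with -o.  Graph-includes may instead redirect dot's standard output.  `dot_command` and `render` are hypothetical names.

```python
import os
import subprocess

def dot_command(output):
    """Build the dot command line for --output=<file>.<suffix>."""
    suffix = os.path.splitext(output)[1].lstrip(".")
    if not suffix:
        raise ValueError("output file needs a suffix, eg. deps.ps")
    return ["dot", "-T" + suffix, "-o", output]

def render(dot_text, output):
    # Feed the .dot text to dot on stdin; an unknown format is reported
    # by dot itself, and check=True turns a failing run into an exception.
    subprocess.run(dot_command(output), input=dot_text, text=True, check=True)

print(dot_command("mydeps.ps"))  # ['dot', '-Tps', '-o', 'mydeps.ps']
```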

If you intend to print the result on paper, the default layout will
likely be too large.  You can use --paper=a4 to select parameters that
will produce a smaller graph and split it into pages.  This flag also
changes the default output format to postscript.  Be warned that dot
may not honor the page-splitting parameter for all output formats.

Since the transitive reduction can take time, you may like the
--verbose switch, which will show a progress bar.


2. what to draw

The files to be analyzed are given as non-option arguments, and can be
explicitly specified, or found by recursing into directories.  Eg, to
analyze foo.c in the current directory, as well as all C/C++ files in
the src/ directory, use:

	$ graph-includes foo.c src/

When a directory argument is specified, it is searched for files
whose name matches a specific regexp pattern, whose default value
depends on the specified language (see --language below).  This
pattern can be overridden using the --fileregexp option.  Eg, to match
files with an additional .tmpl suffix as well as plain .c and .h
files, you could write:

	$ graph-includes -I src -fileregexp '\.[ch](\.tmpl)?$' src/
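Directory expansion can be pictured with this Python sketch.  It keeps explicit file arguments as given and filters the files found under directories by name.  The default pattern below is a placeholder, not the real default for --language=C.  `find_sources` is a hypothetical name.

```python
import os
import re

def find_sources(args, pattern=r"\.[ch]$"):
    """Expand command-line arguments into a list of source files.

    Plain files are kept as given; directories are walked recursively
    and only files whose *name* matches `pattern` are kept.
    """
    regexp = re.compile(pattern)
    files = []
    for arg in args:
        if os.path.isdir(arg):
            for dirpath, _dirnames, filenames in os.walk(arg):
                for name in sorted(filenames):
                    if regexp.search(name):
                        files.append(os.path.join(dirpath, name))
        else:
            files.append(arg)
    return files
```

Eg, find_sources(["foo.c", "src/"], r"\.[ch](\.tmpl)?$") mirrors the --fileregexp example above.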

How dependencies get extracted from the source files depends on the
language used in those files.  You can specify it with the --language
flag.  Default value is C (which should also be used for other
languages based on the C preprocessor, like C++).  There is also some
partial support for perl - see comments in
lib/graphincludes/extractor/perl.pm for more details.
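To give an idea of what a C-style extractor has to recognize, here is a deliberately naive, line-based Python sketch.  It is not the code of the extractors under lib/graphincludes/extractor/.  It ignores comments, conditional compilation (#ifdef) and macro-expanded includes.

```python
import re

# Matches '#include "foo.h"' and '#include <foo.h>', allowing whitespace
# around the '#' as the C preprocessor does.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*([<"])([^>"]+)[>"]')

def extract_includes(text):
    """Return (name, is_angle_bracket) pairs for each #include line."""
    deps = []
    for line in text.splitlines():
        m = INCLUDE_RE.match(line)
        if m:
            deps.append((m.group(2), m.group(1) == "<"))
    return deps

src = '#include "b.h"\n#  include <stdio.h>\nint x;\n'
print(extract_includes(src))  # [('b.h', False), ('stdio.h', True)]
```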

In order to tell the #include resolver where to look for included
files, you can use the cpp-like -I (aka. --Include) flag.  Eg:

	$ graph-includes -I src src/

Dependencies not found in the project (ie. files appearing in #include
but not given on command-line) are listed as "not found" in the
graph-includes.report file for diagnostic purposes, unless they are
found in a system directory.  System directories are declared in a
similar fashion, with the --sysInclude option.  Eg:

	$ graph-includes -I src -sysI /usr/include src/
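Resolution can be thought of roughly as in the Python sketch below.  This is an assumption about the general idea, not the actual resolver.  It treats "..." and <...> includes alike and does not search the including file's own directory.  `classify_include` is a hypothetical name.

```python
import os

def classify_include(name, include_dirs, sys_dirs, project_files):
    """Classify one #include'd name.

    Returns ("project", path) if some -I directory yields a file given on
    the command line, ("system", path) if it exists under a -sysI
    directory, and ("notfound", name) otherwise (the report case).
    """
    project = {os.path.normpath(f) for f in project_files}
    # -I directories are tried in command-line order; first hit wins.
    for d in include_dirs:
        candidate = os.path.normpath(os.path.join(d, name))
        if candidate in project:
            return ("project", candidate)
    # System directories only need to contain the file; it is not graphed.
    for d in sys_dirs:
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate):
            return ("system", candidate)
    return ("notfound", name)
```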

To keep useless information off the graph, --prefixstrip=<prefix>
can be used to avoid repeating a given prefix in all node labels.
Typically:

	$ graph-includes --prefixstrip=src/ src/
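In effect, prefix stripping amounts to the small Python function below.  This is a sketch: the tool may apply it at a different stage, eg. to group labels as well as file names.  `node_label` is a hypothetical name.

```python
def node_label(path, prefix):
    """Drop a leading prefix (as given to --prefixstrip) from a label."""
    return path[len(prefix):] if prefix and path.startswith(prefix) else path

print(node_label("src/display.cpp", "src/"))  # display.cpp
```

With this sketch the trailing slash matters: stripping "src" instead of "src/" would leave "/display.cpp".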


3. how to draw

Files will be grouped in a hierarchy of groups, level 0 groups
typically containing just one file.  Groups are defined by the
project class, which is selected with the --class=<class> option.  See
below for descriptions of the project classes available by default,
and for instructions to write customized project classes.

The range of group levels to be drawn is selected with
--group=<min>-<max>, which defaults to 1-1.  For example, class
"default" defines its group levels as:

0: one file per group
1: what/ever.* go into a "what/ever" group (usually interface + implementation)
2: what/* go into a "what" group, supposing directories denote modules of some sort

Group levels below "min" or above "max" are not displayed as nodes.
Groups of level "min" are drawn as nodes of the graph.  If "max" is
strictly greater than "min", then groups of levels "min+1" through
"max" are drawn as box clusters containing lower-level groups.

Since grouping nodes this way will not improve readability in
projects where the inter-group dependencies have not been cleaned up
yet, higher-level groups can instead be colored, using a class-defined
color scheme, possibly modified by "--color <n>:<label>=<color>[,<label>=<color>...]"
options, where <n> is the group level in which the group name <label> will
receive a background of the specified color, which can be defined
either by a named X11 color (like "blue" or "palegreen"), or by an RGB
color using the standard X11 "#RRGGBB" syntax.
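Here is a Python sketch of how such an option value breaks down.  It only checks that each color looks like an X11 name or "#RRGGBB", and leaves the actual interpretation to graphviz.  Whether graph-includes validates colors at all is not stated here.  `parse_color_option` is a hypothetical name.

```python
import re

# A color is either "#RRGGBB" or something shaped like an X11 color name.
COLOR_RE = re.compile(r"#[0-9A-Fa-f]{6}|[A-Za-z][A-Za-z0-9 ]*")

def parse_color_option(value):
    """Parse "<n>:<label>=<color>[,<label>=<color>...]" into (n, {label: color})."""
    level, sep, assignments = value.partition(":")
    if not sep or not level.isdigit():
        raise ValueError(f"expected <n>:<label>=<color>..., got {value!r}")
    colors = {}
    for item in assignments.split(","):
        label, eq, color = item.partition("=")
        if not eq or not label or not COLOR_RE.fullmatch(color):
            raise ValueError(f"bad color assignment {item!r}")
        colors[label] = color
    return int(level), colors

print(parse_color_option("2:gui=palegreen,net=#FF0000"))
# (2, {'gui': 'palegreen', 'net': '#FF0000'})
```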


For those wanting to see what edges the transitive reduction dropped,
the --showdropped flag will add them to the graph in a different color.
Be prepared for your computer room to get a noticeable temperature
increase for anything other than a small set of files with only few
dependencies.

OTOH, --focus=<node-label> will do the same, but only for the
dependencies of a specified node.  That should prevent the nasty
effects described above, and will be useful for various purposes,
including debugging the transitive reducer.  The node-label refers to
a node in the lowest group-level drawn, ie. the "min" argument to
--group.
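The reduction itself can be sketched in Python with the textbook rule: an edge u -> v is redundant when v is still reachable from u through another successor.  This version assumes an acyclic graph, where the result is unique.  It is not graph-includes' own reducer, which also has to cope with cycles such as a <--> b.  Returning the dropped edges separately corresponds to what --showdropped draws.

```python
def transitive_reduction(edges):
    """Split the edges of a DAG into (kept, dropped).

    An edge u -> v is dropped when v is reachable from u through some
    other direct successor w of u, ie. the edge is implied by a longer
    path.  Assumes the graph is acyclic.
    """
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)

    def reachable(start):
        # Iterative depth-first search: nodes reachable in >= 1 step.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in succ.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    kept, dropped = set(), set()
    for u, v in edges:
        if any(v in reachable(w) for w in succ[u] if w != v):
            dropped.add((u, v))
        else:
            kept.add((u, v))
    return kept, dropped
```

With this sketch, keeping only the dropped edges whose source is a given node, eg. {(u, v) for u, v in dropped if u == "display"}, gives something close to the --focus view.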

People still getting cold may also like to circumvent the
transitive-reduction engine completely, using --alldeps.  The author
assumes no responsibility for losses of mental health induced by
trying to make any serious use of the resulting graph.


EXISTING PROJECT CLASSES
------------------------

1. class "default"

As implied by its name, it is the one which will be used unless you
use the --class option.  Although it is the default one, it may still
be quite rough at the moment, still using some ad-hoc heuristics, and
will be improved in the near future.  Here are its main
characteristics:

 - looks at C-style #include lines
 - creates level-1 groups for all files sharing the same path and
   (disregarding the suffix) filename.  Eg, files "foo/bar.c" and
   "foo/bar.h" would be grouped in a "foo/bar" level-1 group.
   In other words, it won't group header files with their
   implementation files if the headers are all located in a separate
   include/ directory.
 - creates by-directory level-2 groups.  Eg. in the above example, a
   group "foo" would exist at level-2.
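The three grouping rules can be mirrored by this Python sketch.  It assumes "/"-separated paths.  It labels the directory group of top-level files as ".", which is an arbitrary choice.  The real class also handles #include processing, which is not shown.  `default_groups` is a hypothetical name.

```python
import posixpath

def default_groups(path):
    """Return the level-0, level-1 and level-2 group labels of a file."""
    level0 = path                                # one file per group
    level1 = posixpath.splitext(path)[0]         # "foo/bar.c" -> "foo/bar"
    level2 = posixpath.dirname(path) or "."      # "foo/bar.c" -> "foo"
    return level0, level1, level2

print(default_groups("foo/bar.c"))  # ('foo/bar.c', 'foo/bar', 'foo')
```

Note how "include/foo/bar.h" lands in level-1 group "include/foo/bar", not "foo/bar".  That is the limitation mentioned above.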


2. class "uniqueincludes"

Built on top of the default class, it is meant for projects where file
names are kept unique across all directories.  If the ad-hoc #include
processing of the default class does not suit your project, it is the
only out-of-the-box alternative available today.  Here are its main
characteristics:

 - provides a single grouping level based on filenames, disregarding
   all the directory hierarchy.

Note that it is not meant for general use, as:

 - it will group any files with the same name in the same level-0
   group, possibly causing confusion.
 - no directory name appears in the node names
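The first caveat is easy to see with a Python sketch of name-based grouping.  It assumes groups are keyed by the full file name (suffix included), with directories disregarded.  The real class may compute its labels differently, and the function names are hypothetical.

```python
import posixpath
from collections import defaultdict

def name_group(path):
    """Group label that ignores directories: "src/net/util.c" -> "util.c"."""
    return posixpath.basename(path)

def group_by_name(paths):
    result = defaultdict(list)
    for p in paths:
        result[name_group(p)].append(p)
    return dict(result)

print(group_by_name(["src/net/util.c", "src/gui/util.c", "src/gui/menu.c"]))
# {'util.c': ['src/net/util.c', 'src/gui/util.c'], 'menu.c': ['src/gui/menu.c']}
```

The two unrelated util.c files end up in one node, with no directory name left to tell them apart.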


DEFINING YOUR OWN PROJECT CLASS
-------------------------------

See graphincludes::project::wesnoth in the examples/ dir as an example.

Keep in mind that the API is not frozen yet, and will probably be
overhauled more than once before an official API gets blessed.


EXAMPLES
--------

Rather clean ones:

Maelstrom-3.0.6$ graph-includes -v -sysI /usr/include -sysI /usr/include/SDL -I . -I ./netlogic -I ./maclib -I ./screenlib --prefixstrip ./ -o deps.ps .

	[ a rather clean dependency graph ]

wesnoth-0.9.1$ graph-includes -v --class wesnoth --group 1-1 -sysI /usr/include/c++/3.3 -sysI /usr/include -sysI /usr/include/SDL --prefixstrip src/ -I src -o deps.ps src/

	[ more work has to be put in the wesnoth example class,
	  especially since the graph-includes-0.7 layout change ]


Examples only here as a reminder to write proper project classes for them:

qemu-0.7.0$ graph-includes -v -sysI /usr/include/ -sysI /usr/include/SDL $(find -name CVS -prune -o -type d -printf '-I %p\n') -o deps.ps .

	[ needs supporting features for multi-arch source trees ]

mesag-6.2.1$ graph-includes -v -sysI /usr/include -I ./include -I ./include/GL -I ./src/mesa -I ./src/mesa/main -I ./src/glu/sgi/include -I ./src/glu/sgi/libnurbs/internals -I ./src/mesa/glapi -o deps.ps .

	[ needs proper file-grouping ]


CAVEATS
-------

- this script only handles explicitly-declared dependencies: it
  won't detect one if eg. a prototype was cut'n'pasted instead of
  #including the correct header, but you shouldn't do that anyway :)


RELATED TOOLS
-------------

I finally found a couple of tools out there, from which I may borrow
ideas some day.  I'd be happy to hear about more of them.

- cinclude2dot, originally from Darxus
(http://www.chaosreigns.com/code/cinclude2dot/), then taken over by
F. Flourish (http://www.flourish.org/cinclude2dot/) is a GPL
C/C++-only tool, which apparently has support for grouping, but not
for transitive reduction.  Had I searched better and found it a
couple of months ago, maybe graph-includes would never have been
developed :)

- http://www.tarind.com/depgraph.html has a dependency grapher for
python, also without transitive reduction.  It does however allow
customisation of project classes, somewhat similar to graph-includes.

- OptimalAdvisor (http://javacentral.compuware.com/pasta/) is a
refactoring tool, which goes far beyond simple dependency analysis,
but is non-free/libre/open-source (although they have a
functionally-limited free/gratis edition) and seems to support only
java.

- codeproject.com has some VisualStudio(tm) plugins targeting C++,
which I cannot test, but appear to scale badly for large projects
(http://www.codeproject.com/csharp/DependencyGraph.asp).


TODO
----

- core engine
  - continue merging the verbose/debug behaviour into the global report file.
  - change case of class names when the API gets stabilized
  - allow to associate attributes to files (eg. an ARCH attribute for
    multi-architecture trees, like kernels, development tools and emulators)
  - modularization (finish the restructuring into a cleaner and more modular design)
    + rework the recording of edges to make them apply to files, not to graph nodes,
      since more advanced features will need more flexibility
    - allow passing options to modules (-O param=value ?)
    - graph output syntax (allow to generate tulip graphs)
    - separate styling from project classes
    - allow to define several views in a project-class, several of which
      can be generated by default.
    - find out whether we can declare protocols/pure-virtual-classes in
      some way, to cleanup the class graph
    - generalize --prefixstrip
  - give consistent access to all commonly-needed features through
    command-line and class customization
- graph-includes tool
  + find the accessory classes as easily as possible (like bugzilla ?)
  - better robustness to incorrect arguments (eg. --group 1:2)
  - automate --help production (see Pod::Usage ?)
  + multi-sheet paper support may be broken
  - use an existing source of paper formats (libpaper, LC_PAPER, whatever)
  - maybe use graphviz' tred(1) to check our transitive reductions.
  - some autodetection of the language to use based on filenames ?
  - provide an initial list of system directories to avoid repeating them (ask compiler)
- other tools
  - provide an interactive tool to help understanding a project's
    structure.  Maybe with graphviz' lefty, or as a specialized tulip
    gui ?
- extractors
  + allow -I syntax for programs using eg. -I. from source subdirectory
  + behave as expected wrt leading "./", use File::Spec for more portability
  - consider using Cwd::realpath or so, for correct "../" handling
  - write other extractors (java, python, ...)
  - C-like extractor
    - some support for CPP symbol conditionals (mostly #ifdef), perhaps coupling
      this with attributes
    - write an openc++-based dependency extractor
      - extract more fine-grained dependency (depending on a header does
	not necessarily imply depending on code)
      - handle (warn about) the case where the declarations for a given
	implementation file are scattered in more than one header
  - detect undeclared dependencies (eg. manually inserted prototypes)
  - check necessity of declared includes
  - perl extractor
   - improve the perl extractor
- project classes
  - proper way to define include paths in project class
  - make default project-class consider multiple levels of directories
    as group levels, but only if they (consistently ?) have multiple
    subgroups ?
  - write a linux-kernel class and others as examples :)
  - provide a simple hash-based filelabel implementation
  - provide tools for automatic grouping (eg. using cycles, or
    selected external deps, or from leaves)
- presentation
  + allow coloring other things than just level 2
  - generalize the special_edge() mechanism (use a hash of edge attributes ?)
  - allow different node shapes when mixing high-level nodes with
    lower-level ones through the default singleton groups
    (special_node mechanism similar to the special_edge one ?)
  + optionally show labels (using attributes ?) or count for files
    (subnodes) in a node and color arcs according to them
  - optionally show external deps (deps on files not on command-line)
  - limit graph to one or more given group(s) of files (specified by <level>:<label>)
  - draw cycles in a given color
  - draw a specific path
  - allow setting fg color for a specific group level
  - provide automatic coloring schemes
  - color intra-group edges with the same color as nodes (post-processing ?)
  - allow to request drawing of who in a high-level node points to
    another node (ie. violates some constraint)
  - propagate excuses in some way when they are dropped by the transitive reducer
  - investigate candidate tools for hyperbolic layout ?
- documentation
 - write more documentation
 - lift the doc to docbook
- testsuite
 - write a testsuite.
 - ensure that all provided non-abstract classes are self-contained


KNOWN BUGS
----------

  - --showdropped mode draws too many edges as dropped (ie. does not
    consider marked edges as dropped when deciding whether to consider
    subsequent edges as dropped)
  - when showing only 3-3, colors from level 2 get propagated to level-3 groups
  - transitive reduction may not be complete, some more edges could
    possibly be dropped - wesnoth tree at 2005-03-25 exhibits the problem
    with the "display -> builder -> animated -> image" path


LICENSE
-------

    Copyright (c) 2005 Yann Dirson <ydirson@altern.org>

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License, version 2,
    as published by the Free Software Foundation.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.
