
Document instrumentation / metrics (long) was Re: Re: The nature of executive "pushback" to agile technologies?

by James Fuller

Al Chou wrote....
> Jim, the instrumentation of documentation usage is an interesting approach. Can you give more
> detail about how you do that?

Ok, I will try....

Documentation of any software system is a hard problem, for a number of
reasons:

   * documents serve multiple stakeholders, each with a different view

   * documents are in a continuous state of flux, between freshness and staleness

   * documents can have inconsistent locations, revision systems,
naming conventions, etc.

When I use the word 'documents', I mainly mean project-level
artifacts, i.e. the artifacts generated during the lifetime of a
project. Some examples of these are:

   * specifications: TDD, architecture diagrams

   * project docs: status reports, project schedule, tasks, Gantt charts... whatever

   * actual software documentation: usage information, tutorials,
developer guides, etc.

   * business logic definition

   * code reports: everything from LOCC to JUnit results and coverage reports

   * data models: entity definitions, relationship diagrams

   * other stuff: presentations, meeting minutes, etc.

and so on. I am not talking about code-level documentation (Javadoc,
etc.), though it may flow into other, higher-level documentation.

All of the above can contain mixed content (text, pictures, graphs,
etc.) and can either be generated regularly or be kept up to date by
hand. Some contain things like UML or metadata (which may drive an
autogenerated code process). All of the above can live in various
formats: HTML, MS Word, impenetrable PDF, Emacs text files and so on.

The goal when instrumenting documents is to gather the following
statistics, on which to base decisions:

Document References: how much a particular document/section is
referenced by other documents.

Document Usage: who is using which document, when and why; who is
contributing, etc.

Document Volatility: how much the document changes over time
(activity, freshness, staleness).

Now onto the 'how'. Don't expect a software package in the form of
some super wiki that does all of the above; this is hard work both to
put in place and to maintain (and it gets harder at scale).

Note: it would be mad to implement such a system for smaller projects.

--------------------
Rules for my System
--------------------

Here are the basic rules of the documentation systems I create:

* Each document is given a consistent, concrete URI in the form of a
URL, e.g. http://localhost/doc/datamodel/entity1/ . This points to an
RDDL document (http://www.rddl.org/; I am now investigating GRDDL)
that links to the most current version of the document. You can get
RESTful with this URL if you like, with respect to accessing
revisions, date-time, etc. I am personally not a fan of RDF, but I am
an XSLT person (co-author of EXSLT). The initial reason for using RDDL
was to relate a lot of things 'bag' style and to have a summary page;
now that I am doing everything RESTful, it may no longer be an
appropriate approach. In any event, the summary page is important!

* Provide usage data in the form of source control usage and basic
webserver-style data (hits on URLs, views; think Google Analytics at
worst!). You can even put these figures at the end of the same URL,
e.g. http://localhost/doc/datamodel/entity1/usage . If you do this
right you can fold things up as well, e.g.
http://localhost/doc/datamodel/usage and so forth. Look at Cenqua
FishEye; IMHO it was a great source of ideas on how to present this
type of data. It is important to get a simple table of document usage
out of this.

* Decide how you want to measure volatility. I recommend starting off
by baselining things like
words/pages/lines/paragraphs/sections/chars/images/graphs and the
number of committers. When you have a homogeneous documentation
system (e.g. everyone is generating HTML), you can even start applying
HTML/XML similarity and difference analysis.

* Enable feedback forms. Look at Microsoft and IBM: at the bottom of
every article they have a useful 'rate this article' / 'provide
feedback' section. Once again you can get RESTful if you like:
http://localhost/doc/datamodel/entity1/rateit . Think of Digg as an
analogy. Such anecdotal feedback provides a talking point for regular
document reviews.

* Search through documents for references (it is now possible to
search PDF, MS Word and HTML all in one go, but I get better results
saving in the XML Office formats). This is your document reference
metric.

* Don't forget to normalise document usage by your entire population
of users, i.e. the number of users who can access the URLs; it's
useful to know about the silent majority.
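For the webserver half of the usage rule, a minimal sketch in Python,
assuming Apache Common Log Format and the http://localhost/doc/ layout
used above. It counts successful GETs per document path and folds each
hit up into every parent path, so datamodel/entity1 also counts toward
datamodel. Skipping the usage and rateit sub-pages is an assumption of
this sketch, not part of the original setup.

```python
import re
from collections import Counter

# Assumption: Apache Common Log Format, e.g.
# 10.0.0.1 - alice [01/Mar/2007:10:00:00 +0000] "GET /doc/x/ HTTP/1.1" 200 512
LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "GET (\S+) [^"]*" (\d{3}) ')

# Hypothetical facet pages that should not count as document views.
FACETS = {"usage", "rateit"}


def usage_counts(lines, base="/doc/"):
    """Count successful GETs per document path, folded up to every ancestor."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group(2) != "200":
            continue
        path = match.group(1).split("?", 1)[0]  # drop any query string
        if not path.startswith(base):
            continue
        segments = [s for s in path[len(base):].split("/") if s]
        if not segments or segments[-1] in FACETS:
            continue
        # datamodel/entity1 also counts as a hit on datamodel
        for depth in range(1, len(segments) + 1):
            hits["/".join(segments[:depth])] += 1
    return hits
```

The folded counts are what make a URL like .../datamodel/usage cheap to
serve: it is just a lookup on the parent path.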
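The feedback form behind a URL like .../rateit needs somewhere to keep
scores. A hypothetical in-memory version in Python; the 1 to 5 scale
and the Ratings class are assumptions, and a real setup would persist
the data and sit behind the webserver.

```python
from collections import defaultdict
from statistics import mean


class Ratings:
    """Hypothetical in-memory store of 1-5 ratings and comments per document."""

    def __init__(self):
        self._scores = defaultdict(list)
        self._comments = defaultdict(list)

    def rate(self, doc, score, comment=""):
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self._scores[doc].append(score)
        if comment:
            self._comments[doc].append(comment)

    def summary(self, doc):
        # .get() avoids creating empty entries for unrated documents
        scores = self._scores.get(doc, [])
        return {
            "count": len(scores),
            "average": round(mean(scores), 2) if scores else None,
            "comments": list(self._comments.get(doc, [])),
        }
```

The comments list is the part that feeds the document review
conversation; the average is just a quick sort key.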
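Once documents have been extracted to plain text (extraction is not
shown), the reference metric can be sketched as counting, for each
document, how many *other* documents mention its URL. Assumptions:
references are written as absolute URLs under http://localhost/doc/,
and the documents dict (path to text) is a hypothetical input.

```python
import re
from collections import Counter

DOC_PREFIX = "http://localhost/doc/"
DOC_URL = re.compile(re.escape(DOC_PREFIX) + r"""[^\s"'<>()]+""")


def reference_counts(documents):
    """Map each referenced document path to the number of other documents citing it."""
    counts = Counter()
    for source, text in documents.items():
        targets = set()  # count each citing document once per target
        for url in DOC_URL.findall(text):
            # strip trailing sentence punctuation, then slashes
            target = url[len(DOC_PREFIX):].rstrip(".,;:").strip("/")
            if target and target != source:  # ignore self-references
                targets.add(target)
        counts.update(targets)
    return counts
```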
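Normalising by population can be as simple as the share of people with
access who actually opened the document; whoever is left over is the
silent majority. A sketch in Python, assuming you can get the viewers
(e.g. from authenticated log entries) and the set of users allowed to
access the URL.

```python
def reach(viewers, population):
    """Return (share of population that viewed, sorted list of silent users)."""
    population = set(population)
    if not population:
        raise ValueError("population must not be empty")
    seen = set(viewers) & population  # ignore viewers outside the population
    silent = sorted(population - seen)
    return len(seen) / len(population), silent
```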

I get a nice, flexible document system using Apache combined with
mod_rewrite, mod_dav with Subversion, and some sort of RESTful approach
(with a summary page).
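The URL layout can be illustrated with a tiny resolver that splits a
request into the document path and an optional facet such as usage or
rateit. This is Python rather than the mod_rewrite rules the setup
actually uses, and the facet names are just the ones mentioned above;
treat it as a sketch of the layout, not the actual configuration.

```python
from urllib.parse import urlparse

DOC_BASE = "/doc/"
# Hypothetical facet names; anything else is treated as part of the document path.
FACETS = {"usage", "rateit"}


def resolve(url):
    """Split a document URL into (document path, facet or None)."""
    path = urlparse(url).path
    if not path.startswith(DOC_BASE):
        raise ValueError(f"not a document URL: {url}")
    segments = [s for s in path[len(DOC_BASE):].split("/") if s]
    facet = segments.pop() if segments and segments[-1] in FACETS else None
    return "/".join(segments), facet
```

With this shape, resolve("http://localhost/doc/datamodel/usage") names
the folded-up usage page for the whole datamodel section.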

I use Perl to generate the summary pages, do the searching and whatever else.
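The summary page can be sketched (in Python rather than Perl) as a
function that renders one HTML table row per document. The rows input,
a dict of document path to metric values, and the column names are
assumptions.

```python
from html import escape
from urllib.parse import quote

COLUMNS = ["usage", "volatility", "references"]


def summary_page(rows):
    """Render an HTML table; rows maps a document path to a dict of metrics."""
    header = "".join(f"<th>{c}</th>" for c in COLUMNS)
    out = ["<table>", f"<tr><th>document</th>{header}</tr>"]
    for doc in sorted(rows):
        metrics = rows[doc]
        # missing metrics render as empty cells
        cells = "".join(f"<td>{escape(str(metrics.get(c, '')))}</td>" for c in COLUMNS)
        link = f'<a href="/doc/{quote(doc)}/">{escape(doc)}</a>'
        out.append(f"<tr><td>{link}</td>{cells}</tr>")
    out.append("</table>")
    return "\n".join(out)
```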

I try to come up with a single number for each metric. Here are some
suggested starting values, each of which should be related to a time
period (daily average, monthly average, total):

   usage: # of page views

   volatility: (# of svn commits / time) and (% change in
words/pages/lines/paragraphs/sections/chars/images/graphs)

   references: # of references from all other documents
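The '% change' part of volatility can be sketched by taking a crude
baseline of a plain-text rendering of each revision and comparing two
baselines. Only chars, words, lines and paragraphs are counted here;
pages, sections, images and graphs would need format-specific parsing
and are left out of this sketch.

```python
import re


def baseline(text):
    """Crude size measures for a plain-text rendering of one revision."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        "chars": len(text),
        "words": len(text.split()),
        "lines": len(text.splitlines()),
        "paragraphs": len(paragraphs),
    }


def percent_change(old, new):
    """Per-measure % change between two baselines (None where old was 0)."""
    return {
        key: None if before == 0 else 100.0 * (new[key] - before) / before
        for key, before in old.items()
    }
```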
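The commit-rate half of volatility can be read straight from
Subversion: `svn log --xml` emits logentry elements with author and
date children. A sketch in Python that turns that output into commits
per day and a committer count; flooring the time span at one day is an
arbitrary choice to avoid dividing by zero.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

SVN_DATE = "%Y-%m-%dT%H:%M:%S.%fZ"  # e.g. 2007-03-01T10:00:00.000000Z


def commit_stats(svn_log_xml):
    """Summarise `svn log --xml` output for one document."""
    root = ET.fromstring(svn_log_xml)
    dates, authors = [], set()
    for entry in root.iter("logentry"):
        author = entry.findtext("author")
        if author:
            authors.add(author)
        stamp = entry.findtext("date")
        if stamp:
            dates.append(datetime.strptime(stamp, SVN_DATE))
    if not dates:
        return {"commits": 0, "per_day": 0.0, "committers": len(authors)}
    # Floor the span at one day so a single commit does not divide by zero.
    span_days = max((max(dates) - min(dates)).total_seconds() / 86400, 1.0)
    return {
        "commits": len(dates),
        "per_day": len(dates) / span_days,
        "committers": len(authors),
    }
```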

As a hint, it was easy to create a Firefox XUL toolbar to harmonise
all this functionality into something simple to use in the browser,
instead of having to build some enterprise wiki/portal.

--------------------
Rules for my Users
--------------------

Yes, all the metrics can be gamed, but it's easy enough to see when
people are doing this. In addition, some ground rules for users have
to be set out:

* Authors, contributors, committers and editors worked via source
control; no document was 'real' or available until it was checked
into source control.

* Reviewers and users consumed documents from URLs. No more emailing
documents; emailing URLs is fine.

* Authors, editors, contributors, committers and users need to use the
URL when identifying a document, for references to work. This can be
hard when working in lots of different formats.

Be realistic about the granularity at which you apply this approach.
You might start wanting to know the statistics of every little image,
graph and text file; it's really not worth it.

---------------------
Summary
---------------------

Introducing the idea of 'capped' document generation was a bit of a
revelation for me. I have been in innumerable meetings where the
outcome was the generation of some 'new' form of artifact. Having
statistics to guide you means that your efforts become more directed.

Another point is that useful artifacts typically continued to provide
more data over time. Enforcing the same naming convention and access
mechanisms had a noticeable effect on production: things were easy to
find and consistently named.

In a situation with finite resources, the document
usage/volatility/reference metrics start to show where to apply those
resources. You still have the problem of documents with multiple
stakeholders, but at least you can start getting reliable stats on
which group is more important, or on whether there is evidence that a
separate set of documents should be written.

Some examples of interpretation:

* A document with low volatility and high usage and references is
probably done, and a keeper.

* A document with high volatility, few references and low usage needs
to be removed, rethought or refocused.

* A document with low everything needs to go. I have found that these
documents become 'red herrings', simply wasting people's time as they
look for information.

* A document with high volatility and high usage might indicate the
need to break it out into separate documents.
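The rules of thumb above can be written down as a small function. The
sketch assumes each metric has already been normalised to the range 0
to 1 (for example, relative to the busiest document) and uses a
hypothetical cut-off of 0.5 between 'low' and 'high'; picking real
thresholds is a judgment call.

```python
def interpret(usage, volatility, references, threshold=0.5):
    """Apply the rules of thumb to metrics normalised to the range 0..1."""
    u, v, r = (x >= threshold for x in (usage, volatility, references))
    if u and r and not v:
        return "probably done: keep"
    if u and v:
        return "high churn and usage: consider breaking it out"
    if v and not u and not r:
        return "remove, rethink or refocus"
    if not (u or v or r):
        return "remove: likely a red herring"
    return "no rule applies"
```

Anything that falls through to "no rule applies" is a candidate for
discussion at a document review rather than automatic action.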

In situations where an author or committer neglected to use
references, they will see their own performance go down, so there is
an incentive there (once again, everything can be abused as well).

I have had to stop managers from trying to use such a system as a
management tool. I once heard 'is there a problem with this person?
His documents have the lowest average usage metric.' All the usual
caveats apply.

OK, there is no magic here, and perhaps I do a lot of work where
cleverer folk would say 'just use this software!', but I think any
effort towards instrumenting documentation pays off.

hth, Jim Fuller
