Document instrumentation / metrics (long) was Re: Re: The nature of executive "pushback" to agile technologies?

Al Chou wrote:
> Jim, the instrumentation of documentation usage is an interesting approach. Can you give more
> detail about how you do that?

Ok, I will try....

Documentation of any software system is a hard problem. This is due to a number of factors:

* documents serve multiple stakeholders, all with different views
* documents are in a continuous state of flux, freshness and staleness
* documents can have inconsistent locations, revision systems, naming conventions, etc

When I use the word 'documents', I am mainly talking about project level artifacts, i.e. the artifacts generated during the lifetime of a project. Some examples of these are:

* specifications: TDD, architecture diagrams
* project docs: status, project schedule, tasks, Gantt charts...whatever
* actual software documentation: usage information, tutorials, developer guides, etc
* business logic definition
* code reports: from LOCC to JUnit, coverage tests
* data models: entity definitions, relationship diagrams
* other stuff: presentations / meeting minutes, etc...

and so on...I am not talking about code level documentation (javadoc, etc)...though it may flow into other higher level documentation.

All of the above can contain mixed content (text, pictures, graphs, etc) and be generated regularly or kept up to date. Some can contain things like UML or metadata (which may drive some autogenerated code process). All of the above can live in various different formats: html, MS Word, impenetrable pdf, emacs text file and so on.

The goal when instrumenting up documents is to ascertain the following statistics with which to make decisions.

Document References: how much a particular document/section is being referenced by other documents.
Document Usage: who is using what document, when and why, who is contributing, etc
Document Volatility: how much the document is changing with time, activity, freshness, staleness

Now onto the 'how'.....don't expect some software package in the form of a super wiki that does all of the above...this is hard work to both put in place and maintain (and it gets harder at scale).

note: It would be mad to implement such a system for smaller scale projects....

--------------------
Rules for my System
--------------------

here are the basic rules of the doc systems I create;

* Each document is given a consistent, concrete URI in the form of a URL, e.g. http://localhost/doc/datamodel/entity1/
  This points to a RDDL document (http://www.rddl.org/, now I am investigating GRDDL) and has links to the most current version of the document. You can get RESTful if you like from this url with respect to accessing revisions, date-time etc. I personally am not a fan of RDF, but I am an XSLT person (co-author of EXSLT)...the initial reason for using RDDL was to relate a lot of things 'bag' style and have a summary page...now I am doing everything RESTful, so RDDL may not be an appropriate approach these days. In any event, the summary page is important!

* Provide usage data in the form of 'source control usage' and 'basic webserver style' figures (hits to urls, views; think google analytics at worst!). You can even put it at the end of that url, e.g. http://localhost/doc/datamodel/entity1/usage and if you do this right you can fold things up as well, e.g. http://localhost/doc/datamodel/usage and so forth. Look at Cenqua FishEye, it was a great source of ideas on how to present this type of data IMHO...it's important to get a simple table of document usage from this.

* Decide on how you want to measure volatility. I recommend starting off by baselining things like words/pages/lines/paras/sections/chars/images/graphs and number of committers. When you have a homogeneous documentation system, e.g. everyone is generating html, you can even start applying HTML/XML similarity and difference analysis.

* Enable feedback forms. Look at microsoft and ibm: at the bottom of every article they have a useful 'rate this article, provide feedback' form....once again you can get RESTful if you like, e.g. http://localhost/doc/datamodel/entity1/rateit (think Digg as an analogy). Such anecdotal feedback provides a talking point for regular document reviews.

* Search through documents for references (it is possible now to go through pdf, ms word and html all in one, but I get better results saving in the XML Office formats). This is your document reference metric.

* Don't forget about normalising document usage by your entire population of users, which should be the number of users who can access the urls; it's useful to know the silent majority.

I get a nice flexible document system using apache combined with mod_rewrite, mod_dav with subversion and some sort of RESTful approach (with summary page). I use perl to achieve the summary pages, searching and whatever else.

I try to come up with a single number to represent each metric. Here are some suggested starting values, which should have some sort of relation to time (avg daily, avg monthly, total):

usage: # page views
volatility: (# svn commits / time) (% change in words/pages/lines/para/sections/chars/images/graphs)
references: # of references from all other documents

as a hint, it was easy to create a firefox xul toolbar to sort of harmonise all this functionality into a simple to use thing in the browser....instead of having to build some enterprise wiki/portal

--------------------
Rules for my Users
--------------------

Yes, all the metrics can be gamed, but it's easy enough to see when people are doing this...in addition some ground rules for users have to be set out.

* authors, contributors, committers and editors worked via source control...no document was 'real' or available until it was checked into source control.
* reviewers, users consumed documents from urls.....no more emailing documents, emailing urls is fine
* authors, editors, contributors, committers, users need to use the URL when identifying a document for references to work. This can be hard when working in lots of different formats.

be realistic as to the granularity of applying this approach...you might feel the need to start wanting to know the statistics of every little image, graph, text file....it's really not worth it.

---------------------
Summary
---------------------

Introducing the idea of 'capped' document generation was a bit of a revelation for me. I have been in innumerable meetings where the outcome was the generation of some 'new' form of artifact. Having statistics to guide you means that your efforts become more directed. Another point to make is that, typically, the useful artifacts themselves continued to provide more data over time.

Enforcing the same naming convention and access mechanisms had a noticeable effect on production...things were easy to find and consistently named.

In a situation where there is a finite amount of resources, the document usage/volatility/reference metrics start to help you decide where to apply those resources. You still have the problem of documents with multiple stakeholders, but at least you can start getting reliable stats on which group is more important, or whether there is evidence that a separate set of documents should be written.
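
To make the usage/volatility/reference numbers a bit more concrete, here is a minimal sketch in Python (not the perl mentioned above). Everything in it is an assumption for illustration only: the `DocStats` record, its field names and the sample figures are hypothetical, and in a real setup the counts would have to come from the webserver logs, the svn history and a reference search.

```python
from dataclasses import dataclass


@dataclass
class DocStats:
    """Hypothetical per-document counts gathered over one period."""
    url: str
    page_views: int    # hits on the document URL (webserver log)
    commits: int       # svn commits touching the document
    words_before: int  # word count at the start of the period
    words_after: int   # word count at the end of the period
    inbound_refs: int  # other documents that reference this URL
    days: int          # length of the period in days


def usage_per_user_per_day(doc: DocStats, population: int) -> float:
    """Average daily page views, normalised by everyone who can access the URL."""
    return doc.page_views / doc.days / population


def commit_rate(doc: DocStats) -> float:
    """Volatility expressed as svn commits per day."""
    return doc.commits / doc.days


def percent_word_change(doc: DocStats) -> float:
    """Volatility expressed as % change in word count over the period."""
    if doc.words_before == 0:
        return 0.0  # nothing to compare against yet
    return 100.0 * abs(doc.words_after - doc.words_before) / doc.words_before


# Made-up sample figures, purely for illustration.
doc = DocStats(
    url="http://localhost/doc/datamodel/entity1/",
    page_views=300, commits=6, words_before=2000, words_after=2100,
    inbound_refs=4, days=30,
)
print(usage_per_user_per_day(doc, population=50))  # 0.2
print(commit_rate(doc))                            # 0.2
print(percent_word_change(doc))                    # 5.0
```

Dividing by the whole population rather than by active readers is what makes the silent majority visible: a document with a handful of heavy users still scores low if most people who could read it never do.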

Some examples of interpretation are:

* A document with low volatility and high usage and references is probably done and a keeper
* A document with high volatility, low references and low usage either needs to be removed, rethought or refocused
* A document with low everything needs to go...I have found that these documents become 'red herrings', simply wasting people's time looking for information
* A document with high volatility and high usage might indicate the need to break it out

In those situations where an author or committer neglected to use references, they will see their own performance go down...so there is an incentive there (once again, everything can be abused as well). I have had to stop managers from trying to use such a system as a management tool....I did once hear 'is there a problem with this person, his documents have the lowest avg usage metric'.....all the caveats apply.

ok, there is no magic here and I perhaps do a lot of work where more clever folk would say 'just use this software!'...but I think any effort towards instrumenting up documentation pays off.

hth, Jim Fuller
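
The interpretation examples above can be written down as a small lookup. This is only a sketch in Python: the function name `interpret` and the 'low'/'high' labels are hypothetical, and deciding where 'low' ends and 'high' begins is left entirely to whoever runs the reviews.

```python
def interpret(usage: str, volatility: str, references: str) -> str:
    """Map 'low'/'high' levels to the rule-of-thumb readings listed above."""
    levels = (usage, volatility, references)
    if any(level not in ("low", "high") for level in levels):
        raise ValueError("levels must be 'low' or 'high'")
    if levels == ("low", "low", "low"):
        return "needs to go: likely a red herring"
    if usage == "high" and volatility == "low" and references == "high":
        return "probably done and a keeper"
    if usage == "low" and volatility == "high" and references == "low":
        return "remove, rethink or refocus"
    if usage == "high" and volatility == "high":
        return "might need breaking out"
    # Combinations the examples do not cover are left to a human.
    return "no rule of thumb: review by hand"


print(interpret(usage="high", volatility="low", references="high"))
print(interpret(usage="low", volatility="low", references="low"))
```

The fallback branch matters: the examples are talking points for a document review, not a complete decision table, and none of this should be read as a measure of the people writing the documents.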