open issues for ns-3 testing


by Tom Henderson-2

We last left off the testing discussion [1] prior to the ns-3.5 release,
when we discussed the relative merits of running the tests from within
or outside of waf.  However, there are still several issues that need to
be addressed before we could move towards a merge of a new testing
framework.

I tried to collect and summarize below what I understand to be many of
the remaining issues, and maybe Craig and others can respond or add to
the below list, and propose their requirements or goals.

Here is some prototype code that Craig has been putting together (it is
in separate repos):
http://code.nsnam.org/craigdo/ns-3-test-patches/file/e6a504a6568e/test.patch
http://code.nsnam.org/craigdo/ns-3-valver/

To recap, we already have a unit test framework, and a regression
testing framework for trace-based comparison of our examples.  We don't
have a good place yet for validation and verification tests, we don't
have good examples to point to for how to write more complicated tests
(including stochastic tests), and further, we would like to not lean too
heavily on simply comparing pcap trace files for regression or
validation purposes, so we need a better way for generating, saving, and
comparing input and output test vectors that are more specific than
packet trace dumps.

Here are some lingering issues.

1) what is the overall plan for where this test code lives, and how does
it relate to existing test.h and test.cc?  Should existing src/test be
moved to src/core?

2) Why not just borrow one of the existing GPLed testing frameworks and
reuse it here?  Which ones have we looked at for comparison?

3) How are non-trivial output test vectors (they could be trace files,
long sequences of numbers or timestamps, etc.) going to be stored?

4) What is the story for waf integration, if test.py is retained?  In
particular, what kind of front end could waf provide for the test.py script?
In a previous post, I suggested that basic modes of test.py could
still be invoked by ./waf  --check, ./waf --test, ./waf --regression
(etc.). Are the issues of accessing the configuration cache and of
running programs through waf (or not) settled?

5) Related to this, what is the expected API and granularity for running
these tests?  Do we have different levels of testing that have simple
APIs (such as the above sequence of waf commands)?

6) Can we review some really simple examples (e.g. in samples/
directory) showing different examples of what different types of tests
would look like?  What about a TCP model verification example?

7) Where do more complicated test and validation programs (i.e. the
actual tests) reside?  src/test?

8) What about Mathieu's comment recently about having a mode that dumps
a lot of trace locally (without downloading saved regression traces) for
more extensive regression testing before a checkin?

9) What output do people want to see, typically (Gustavo had made a
comment about minimal output for successful tests and verbose outputs
for failed tests)?

10) Can we avoid baking in a GSL dependency in case we want to rewrite
histogram or chi-squared routines ourselves in the future?

11) Mathieu remarked early on that it would be nice to factor out the
chi-squared testing tools from within the random number tests, so they
could be used elsewhere.

12) How would lcov, nightly builds (buildbots, others) etc. be plumbed in?

- Tom

[1] http://mailman.isi.edu/pipermail/ns-developers/2009-May/005884.html

Re: open issues for ns-3 testing

by Pavel Boyko

  Hi,

On Thursday 09 July 2009 09:15:06 am Tom Henderson wrote:
> I tried to collect and summarize below what I understand to be many of
> the remaining issues, and maybe Craig and others can respond or add to
> the below list, and propose their requirements or goals.

  Let me add a few words.

  First of all, I believe that model validation is a research activity
which can't be automated or restricted to some built-in "Validation
toolkit". IMHO the best thing we can do is to encourage a companion "The NS-3
Foo Model Validation" paper for every foo module. The recent 802.11b PHY
model is an excellent example of this; see
http://www.nsnam.org/~pei/80211b.pdf
Maybe we should create and maintain a wiki
page listing all available models together with references to their validation
reports (see e.g. http://www.scalable-networks.com/pdf/QualNet_Library.pdf for
a nice table of all available models). Once a model is validated, the usual
regression tests (both unit and functional) can be used to assert that it is
not broken. To summarize: I propose that we not create any kind of validation
toolkit, but rather encourage and collect references to validation
efforts.

> we need a better way for generating, saving, and
> comparing input and output test vectors that are more specific than
> packet trace dumps.

  Agree.

> 2) Why not just borrow one of the existing GPLed testing frameworks and
> reuse it here?  Which ones have we looked at for comparison?

  The existing test.h + waf --check unit test facility fits my needs and is
no worse than boost::test + cmake/ctest.

> 7) Where do more complicated test and validation programs (i.e. the
> actual tests) reside?  src/test?

  Why not?

> 9) What output do people want to see, typically (Gustavo had made a
> comment about minimal output for successful tests and verbose outputs
> for failed tests)?

  Personally I'd like to see (finally) the usual green-red HTML table like
http://www.cdash.org/CDash/index.php?project=Slicer3#Coverage or
http://buildbot.net/trac/wiki/ScreenShots with optional CPU time / memory
usage / stdout / stderr available with a click.

> 10) Can we avoid baking in a GSL dependency in case we want to rewrite
> histogram or chi-squared routines ourselves in the future?

  Why not interface with dedicated statistical systems like R
(http://www.rproject.org/) for data analysis? I don't believe that we really
need to create one more homegrown histogram implementation, to say nothing
about hypothesis testing.

  Regards,
  Pavel Boyko, IITP

Re: open issues for ns-3 testing

by Tom Henderson-2

Pavel, thanks for the feedback-- some responses below...

Pavel Boyko wrote:

>   Hi,
>
> On Thursday 09 July 2009 09:15:06 am Tom Henderson wrote:
>> I tried to collect and summarize below what I understand to be many of
>> the remaining issues, and maybe Craig and others can respond or add to
>> the below list, and propose their requirements or goals.
>
>   Let me add a few words.
>
>   First of all, I believe that model validation is a research activity
> which can't be automated or restricted to some built-in "Validation
> toolkit". IMHO the best thing we can do is to encourage a companion "The NS-3
> Foo Model Validation" paper for every foo module. The recent 802.11b PHY
> model is an excellent example of this; see
> http://www.nsnam.org/~pei/80211b.pdf

I agree-- I think we need all of these things:
1) a place to list all models
2) some guidelines and examples on how to write validation tests and
documentation about them
3) a place to store validation scripts and documents that show how they
have been validated, if at all
4) some way to put them through regression testing, if they are suitable

> Maybe we should create and maintain a wiki
> page listing all available models together with references to their validation
> reports (see e.g. http://www.scalable-networks.com/pdf/QualNet_Library.pdf for
> a nice table of all available models).

That is a nice page; presently, we have been trying to build this list
through Doxygen so that it is maintained inline with the code:
http://www.nsnam.org/doxygen/modules.html
but admittedly you need to click through some of those items to really
see what is available, and it doesn't link to validation.

Maybe we could maintain a nicely formatted wiki page, as you suggest,
that lists models and links to the relevant validation material for each.

> Once a model is validated, the usual
> regression tests (both unit and functional) can be used to assert that it is
> not broken. To summarize: I propose that we not create any kind of validation
> toolkit, but rather encourage and collect references to validation
> efforts.

It may be that some validation programs are not easily suited to be run
as regression tests, but ideally it would be nice to reuse them that way.

What I would like to produce, though, are some good examples of how to
do various types of validation and regression testing so others can
contribute similar things.

>> 10) Can we avoid baking in a GSL dependency in case we want to rewrite
>> histogram or chi-squared routines ourselves in the future?
>
>   Why not interface with dedicated statistical systems like R
> (http://www.rproject.org/) for data analysis? I don't believe that we really
> need to create one more homegrown histogram implementation, to say nothing
> about hypothesis testing.
>

I personally don't want to rewrite those things, but we have to date
avoided making most aspects of ns-3 core dependent on external
libraries, so the only reason I see for writing a histogram routine
would be to avoid having to install and link something like GSL.  But,
your point about r-project.org seems to reinforce my point that maybe it
will not be GSL that we use in the future for this statistical work, and
that we could maybe decouple the tests from explicit GSL calls.

Tom

Re: open issues for ns-3 testing

by Tom Henderson-2

Since I only got one comment (from Pavel) on the below, I'd like to try
to move forward on it.

I'd like to try to get this feature included in ns-3.6 so it would be
nice to try to define a first mergeable chunk.

Some suggested next steps are inline below:

Tom Henderson wrote:

> We last left off the testing discussion [1] prior to ns-3.5 release,
> when we discussed the relative merits of running the tests from within
> or outside of waf.  However, there are still several issues that need to
> be addressed before we could move towards a merge of a new testing
> framework.
>
> I tried to collect and summarize below what I understand to be many of
> the remaining issues, and maybe Craig and others can respond or add to
> the below list, and propose their requirements or goals.
>
> Here is some prototype code that Craig has been putting together (it is
> in separate repos):
> http://code.nsnam.org/craigdo/ns-3-test-patches/file/e6a504a6568e/test.patch 
>
> http://code.nsnam.org/craigdo/ns-3-valver/
>
> To recap, we already have a unit test framework, and a regression
> testing framework for trace-based comparison of our examples.  We don't
> have a good place yet for validation and verification tests, we don't
> have good examples to point to for how to write more complicated tests
> (including stochastic tests), and further, we would like to not lean too
> heavily on simply comparing pcap trace files for regression or
> validation purposes, so we need a better way for generating, saving, and
> comparing input and output test vectors that are more specific than
> packet trace dumps.
>
> Here are some lingering issues.
>
> 1) what is the overall plan for where this test code lives, and how does
> it relate to existing test.h and test.cc?  Should existing src/test be
> moved to src/core?

I think a first step might be for Craig to produce a new test.h and
test.cc that merges what we already have with the new test code.

This test.cc/h could be moved to an underlying src/test that src/core
depends on, or could be in src/core/test.cc/h.  I don't care strongly;
Craig suggested that it might be nice to be in a separate module to
enforce that src/test does not depend on src/core.  Part of the reason I
feel lukewarm about the path src/test/test.{cc,h} is that src/test would
be a nice directory name for holding the test programs.

>
> 2) Why not just borrow one of the existing GPLed testing frameworks and
> reuse it here?  Which ones have we looked at for comparison?

I think Craig can comment on what he looked at-- I don't know.  In
talking to Craig, he said that the other frameworks all cover unit test
macros and such, but not the more advanced features we will likely need,
plus there is the issue of coding style.  Maybe Craig can say more here if
there is more to say.

>
> 3) How are non-trivial output test vectors (they could be trace files,
> long sequences of numbers or timestamps, etc.) going to be stored?

I think Craig needs to produce/describe an example of this.
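
To make the question concrete, here is one minimal sketch of what storing and comparing a numeric test vector could look like. It is not Craig's prototype: the one-value-per-line text format and the names save_vector, load_vector and first_mismatch are assumptions made up for illustration only.

```python
import math


def save_vector(path, values):
    """Write one floating-point value per line (hypothetical text format)."""
    with open(path, "w") as f:
        for v in values:
            # repr() of a float round-trips exactly through float().
            f.write(repr(float(v)) + "\n")


def load_vector(path):
    """Read back a vector written by save_vector, skipping blank lines."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def first_mismatch(expected, actual, rel_tol=1e-9, abs_tol=0.0):
    """Return the index of the first differing element, or None on a match.

    Elements are compared with a tolerance instead of byte-for-byte, so
    harmless floating-point noise does not fail the test.
    """
    for i, (e, a) in enumerate(zip(expected, actual)):
        if not math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol):
            return i
    if len(expected) != len(actual):
        # One vector is a strict prefix of the other.
        return min(len(expected), len(actual))
    return None
```

Reporting the index of the first mismatch, rather than only pass/fail, is what would make such a vector more specific than a whole-file pcap comparison.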

>
> 4) What is the story for waf integration, if test.py is retained?  In
> particular, what kind of front end could waf provide for test.py script?
>  In a previous post, I suggested that basic modes of test.py could still
> be invoked by ./waf  --check, ./waf --test, ./waf --regression (etc.).
> Are the issues of accessing the configuration cache and of running
> programs through waf (or not) settled?

I would support a waf frontend to test.py for default cases, such as
"make check" and "make test" in the make world.  I suggest not making
any changes to the ./waf API as a first step.

>
> 5) Related to this, what is the expected API and granularity for running
> these tests?  Do we have different levels of testing that have simple
> APIs (such as the above sequence of waf commands)?

I think we need to have different levels of testing because some tests
are going to be quick and some slow to execute.  I don't have a specific
proposal for what belongs where, other than keeping existing tests where
they are now.
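
As a rough illustration of what "levels" could mean in practice, the sketch below tags each test with a level and runs only those at or below a requested level. The level names, the register decorator and the run function are all hypothetical; nothing here reflects an existing ns-3 or test.py interface.

```python
from enum import IntEnum


class Level(IntEnum):
    QUICK = 1      # fast checks, suitable for every build
    EXTENSIVE = 2  # slower validation runs, e.g. before a checkin


REGISTRY = []  # list of (name, level, function) tuples


def register(name, level):
    """Decorator that records a test function together with its level."""
    def decorator(func):
        REGISTRY.append((name, level, func))
        return func
    return decorator


def run(max_level):
    """Run every registered test with level <= max_level.

    Returns a dict mapping test name to True (passed) or False (failed).
    """
    results = {}
    for name, level, func in REGISTRY:
        if level <= max_level:
            try:
                func()
                results[name] = True
            except AssertionError:
                results[name] = False
    return results


@register("arithmetic", Level.QUICK)
def test_arithmetic():
    assert 1 + 1 == 2


@register("long-sum", Level.EXTENSIVE)
def test_long_sum():
    assert sum(range(100000)) == 100000 * 99999 // 2
```

With this arrangement, run(Level.QUICK) would execute only the quick test, while run(Level.EXTENSIVE) would execute both.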

>
> 6) Can we review some really simple examples (e.g. in samples/
> directory) showing different examples of what different types of tests
> would look like?  What about a TCP model verification example?
>
> 7) Where do more complicated test and validation programs (i.e. the
> actual tests) reside?  src/test?

This is related to the 1) above.  I think that src/test could be a home
for test programs that are not intended to be used as examples (e.g. the
tcp-nsc-zoo, wifi-cmu-clear-channel, David Evensky's test programs), but
test programs could also live in the modules where the code lives, such
as our unit tests.

>
> 8) What about Mathieu's comment recently about having a mode that dumps
> a lot of trace locally (without downloading saved regression traces) for
> more extensive regression testing before a checkin?

I would suggest considering this as a second step, but I agree that this
would be a useful mode of testing.

>
> 9) What output do people want to see, typically (Gustavo had made a
> comment about minimal output for successful tests and verbose outputs
> for failed tests)?

I would suggest treating nicely formatted output as a second step.
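
For reference, the behavior Gustavo described (one terse line per passing test, full details only for failures) could be as small as the following sketch. The format_report name and the (name, passed, details) tuple layout are assumptions for illustration, not part of any existing script.

```python
def format_report(results):
    """Format test results as plain text.

    results is a list of (name, passed, details) tuples. Passing tests
    produce a single line; failing tests also show their details, indented.
    """
    lines = []
    passed_count = 0
    for name, passed, details in results:
        if passed:
            passed_count += 1
            lines.append("PASS " + name)
        else:
            lines.append("FAIL " + name)
            for detail_line in details.splitlines():
                lines.append("    " + detail_line)
    lines.append("%d of %d tests passed" % (passed_count, len(results)))
    return "\n".join(lines)
```

Because the details of passing tests are dropped at formatting time rather than at collection time, the same results list could later feed a richer report such as an HTML table.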

>
> 10) Can we avoid baking in a GSL dependency in case we want to rewrite
> histogram or chi-squared routines ourselves in the future?

It seems to me that we are going to have to add a GSL dependency for
some tests to execute.  I don't know whether there is any appropriate
statistical abstraction API that would allow R-project, GSL, ROOT, (pick
your favorite library) to be easily swapped in.  Does the R API resemble
the GSL API?

So, I would suggest that for now we just include GSL explicitly for ease
of use until/unless someone comes up with an abstraction or alternate
implementation that they would like to support, and we can consider
refactoring at that time.
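
If someone did want to try such an abstraction later, one possible shape is sketched below: the test code calls a small interface, and a backend supplies the statistics. Everything here is hypothetical (the StatsBackend interface, the PurePythonBackend class and counts_look_uniform are invented names), and the backend shown is a plain Pearson chi-squared statistic rather than a wrapper around GSL or R.

```python
from typing import Protocol, Sequence


class StatsBackend(Protocol):
    """Hypothetical interface that tests would call instead of a library."""

    def chi2_statistic(self, observed: Sequence[float],
                       expected: Sequence[float]) -> float:
        ...


class PurePythonBackend:
    """Stand-in backend; a GSL- or R-based one could expose the same method."""

    def chi2_statistic(self, observed, expected):
        if len(observed) != len(expected):
            raise ValueError("observed and expected must have equal length")
        # Pearson statistic: sum over bins of (O - E)^2 / E.
        return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def counts_look_uniform(counts, backend, critical_value):
    """Return True if the bin counts are consistent with a uniform histogram.

    critical_value is supplied by the caller for the chosen significance
    level and degrees of freedom (number of bins minus one).
    """
    total = sum(counts)
    expected = [total / len(counts)] * len(counts)
    return backend.chi2_statistic(counts, expected) < critical_value
```

For example, with four bins (three degrees of freedom) the 5% critical value is about 7.815, so counts_look_uniform([25, 25, 25, 25], PurePythonBackend(), 7.815) accepts the histogram while [70, 10, 10, 10] is rejected. Keeping the critical value outside the backend keeps the interface small; a fuller design would probably let the backend compute it.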

>
> 11) Mathieu remarked early on that it would be nice to factor out the
> chi-squared testing tools from within the random number tests, so they
> could be used elsewhere.

Agree-- I think that this should be done as a second step.

>
> 12) How would lcov, nightly builds (buildbots, others) etc. be plumbed in?

This can be a second step.

Anyway, if there are no other comments, I suggest that Craig try to
follow the above and come back to the list when he has another proposal
for consideration.

Tom