open issues for ns-3 testing

We last left off the testing discussion [1] prior to ns-3.5 release, when we discussed the relative merits of running the tests from within or outside of waf. However, there are still several issues that need to be addressed before we could move towards a merge of a new testing framework.

I tried to collect and summarize below what I understand to be many of the remaining issues, and maybe Craig and others can respond or add to the below list, and propose their requirements or goals.

Here is some prototype code that Craig has been putting together (it is in separate repos):
http://code.nsnam.org/craigdo/ns-3-test-patches/file/e6a504a6568e/test.patch
http://code.nsnam.org/craigdo/ns-3-valver/

To recap, we already have a unit test framework, and a regression testing framework for trace-based comparison of our examples. We don't have a good place yet for validation and verification tests, we don't have good examples to point to for how to write more complicated tests (including stochastic tests), and further, we would like to not lean too heavily on simply comparing pcap trace files for regression or validation purposes, so we need a better way for generating, saving, and comparing input and output test vectors that are more specific than packet trace dumps.

Here are some lingering issues.

1) what is the overall plan for where this test code lives, and how does it relate to existing test.h and test.cc? Should existing src/test be moved to src/core?

2) Why not just borrow one of the existing GPLed testing frameworks and reuse it here? Which ones have we looked at for comparison?

3) How are non-trivial output test vectors (they could be trace files, long sequences of numbers or timestamps, etc.) going to be stored?

4) What is the story for waf integration, if test.py is retained? In particular, what kind of front end could waf provide for test.py script? In a previous post, I suggested that basic modes of test.py could still be invoked by ./waf --check, ./waf --test, ./waf --regression (etc.). Are the issues of accessing the configuration cache and of running programs through waf (or not) settled?

5) Related to this, what is the expected API and granularity for running these tests? Do we have different levels of testing that have simple APIs (such as the above sequence of waf commands)?

6) Can we review some really simple examples (e.g. in samples/ directory) showing different examples of what different types of tests would look like? What about a TCP model verification example?

7) Where do more complicated test and validation programs (i.e. the actual tests) reside? src/test?

8) What about Mathieu's comment recently about having a mode that dumps a lot of trace locally (without downloading saved regression traces) for more extensive regression testing before a checkin?

9) What output do people want to see, typically (Gustavo had made a comment about minimal output for successful tests and verbose outputs for failed tests)?

10) Can we avoid baking in a GSL dependency in case we want to rewrite histogram or chi-squared routines ourselves in the future?

11) Mathieu remarked early on that it would be nice to factor out the chi-squared testing tools from within the random number tests, so they could be used elsewhere.

12) How would lcov, nightly builds (buildbots, others) etc. be plumbed in?

- Tom

[1] http://mailman.isi.edu/pipermail/ns-developers/2009-May/005884.html
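To make issue 3 (storing and comparing test vectors that are more specific than packet traces) a little more concrete, here is a minimal sketch in Python, the language of the test.py prototype. It assumes a test vector is simply a sequence of numbers saved one per line in a text file. The function names and the file layout are hypothetical and are not taken from the existing ns-3 code or from Craig's prototype.

```python
import math
from pathlib import Path


def save_vector(path, values):
    """Write one value per line so the reference file stays diff-friendly."""
    Path(path).write_text("\n".join(repr(float(v)) for v in values) + "\n")


def load_vector(path):
    """Read a reference vector previously written by save_vector."""
    return [float(token) for token in Path(path).read_text().split()]


def compare_vectors(actual, expected, rel_tol=1e-9, abs_tol=0.0):
    """Return a list of (index, actual, expected) mismatches.

    An empty list means the vectors agree within the given tolerances.
    """
    if len(actual) != len(expected):
        raise ValueError(f"length mismatch: {len(actual)} vs {len(expected)}")
    return [
        (i, a, e)
        for i, (a, e) in enumerate(zip(actual, expected))
        if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
    ]
```

A test would call save_vector once to record a reference, and afterwards compare fresh output against load_vector of that file. Returning the mismatching indices, rather than a bare pass/fail, gives a failed test something specific to report.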
Re: open issues for ns-3 testing

Hi,

On Thursday 09 July 2009 09:15:06 am Tom Henderson wrote:
> I tried to collect and summarize below what I understand to be many of
> the remaining issues, and maybe Craig and others can respond or add to
> the below list, and propose their requirements or goals.

Let me add a few words.

First of all, I believe that model validation is a research activity which can't be automated or restricted to some built-in "Validation toolkit". IMHO the best thing we can do is to encourage a companion "The NS-3 Foo Model Validation" paper for every foo module. The recent 802.11b PHY model is an excellent example of this, see http://www.nsnam.org/~pei/80211b.pdf

Maybe we should create and maintain a wiki page listing all available models together with references to their validation reports (see e.g. http://www.scalable-networks.com/pdf/QualNet_Library.pdf for a nice table of all available models).

Once a model is validated, the usual regression tests (both unit and functional) can be used to assert that it is not broken.

To summarize -- I propose not to create any kind of validation toolkit, but rather to encourage and collect references to validation efforts.

> we need a better way for generating, saving, and
> comparing input and output test vectors that are more specific than
> packet trace dumps.

Agree.

> 2) Why not just borrow one of the existing GPLed testing frameworks and
> reuse it here? Which ones have we looked at for comparison?

The existing test.h + waf --check unit test facility fits my needs and is no worse than boost::test + cmake/ctest.

> 7) Where do more complicated test and validation programs (i.e. the
> actual tests) reside? src/test?

Why not?

> 9) What output do people want to see, typically (Gustavo had made a
> comment about minimal output for successful tests and verbose outputs
> for failed tests)?

Personally I'd like to see (finally) the usual green-red HTML table like http://www.cdash.org/CDash/index.php?project=Slicer3#Coverage or http://buildbot.net/trac/wiki/ScreenShots with optional CPU time / memory usage / stdout / stderr available by click.

> 10) Can we avoid baking in a GSL dependency in case we want to rewrite
> histogram or chi-squared routines ourselves in the future?

Why not interface with dedicated statistical systems like R (http://www.r-project.org/) for data analysis? I don't believe that we really need to create one more homegrown histogram implementation, to say nothing of hypothesis testing.

Regards,
Pavel Boyko, IITP
Re: open issues for ns-3 testing

Pavel, thanks for the feedback-- some responses below...

Pavel Boyko wrote:
> Hi,
>
> On Thursday 09 July 2009 09:15:06 am Tom Henderson wrote:
>> I tried to collect and summarize below what I understand to be many of
>> the remaining issues, and maybe Craig and others can respond or add to
>> the below list, and propose their requirements or goals.
>
> Let me add a few words.
>
> First of all, I believe that model validation is a research activity
> which can't be automated or restricted to some built-in "Validation
> toolkit". IMHO the best thing we can do is to encourage a companion "The NS-3
> Foo Model Validation" paper for every foo module. The recent 802.11b PHY
> model is an excellent example of this, see
> http://www.nsnam.org/~pei/80211b.pdf

I agree-- I think we need all of these things:
1) a place to list all models
2) some guidelines and examples on how to write validation tests and documentation about them
3) a place to store validation scripts and documents that show how they have been validated, if at all
4) some way to put them through regression testing, if they are suitable

> Maybe we should create and maintain a wiki
> page listing all available models together with references to their validation
> reports (see e.g. http://www.scalable-networks.com/pdf/QualNet_Library.pdf for
> a nice table of all available models).

That is a nice page; presently, we have been trying to build this list through the doxygen so that it is maintained inline with the code:
http://www.nsnam.org/doxygen/modules.html
but admittedly you need to click through some of those items to really see what is available, and it doesn't link to validation. Maybe we could maintain a nicely formatted wiki page, as you suggest, that lists models and links to the relevant validation material for each.

> Once a model is validated, the usual
> regression tests (both unit and functional) can be used to assert that it is
> not broken. To summarize -- I propose not to create any kind of validation
> toolkit, but rather to encourage and collect references to validation
> efforts.

It may be that some validation programs are not easily suited to be run as regression tests, but ideally it would be nice to reuse them that way. What I would like to produce, though, are some good examples of how to do various types of validation and regression testing so others can contribute similar things.

>> 10) Can we avoid baking in a GSL dependency in case we want to rewrite
>> histogram or chi-squared routines ourselves in the future?
>
> Why not interface with dedicated statistical systems like R
> (http://www.r-project.org/) for data analysis? I don't believe that we really
> need to create one more homegrown histogram implementation, to say nothing
> of hypothesis testing.
>

I personally don't want to rewrite those things, but we have to date avoided making most aspects of ns-3 core dependent on external libraries, so the only reason I see for writing a histogram routine would be to avoid having to install and link something like GSL. But your mention of r-project.org seems to reinforce my point that it may not be GSL that we use in the future for this statistical work, and that we could maybe decouple the tests from explicit GSL calls.

Tom
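One way to picture "decoupling the tests from explicit GSL calls" is a thin interface that the tests talk to, with the statistical library hidden behind it. The sketch below is only an illustration under that assumption: the names ChiSquaredBackend, PurePythonBackend and passes_chi_squared are hypothetical, and the critical value is supplied by the caller so that no particular library (GSL, R or otherwise) is required to compute it.

```python
from typing import Protocol, Sequence


class ChiSquaredBackend(Protocol):
    """What a test needs from a statistics provider (hypothetical interface)."""

    def statistic(self, observed: Sequence[float],
                  expected: Sequence[float]) -> float: ...


class PurePythonBackend:
    """Dependency-free backend; a GSL- or R-based one could replace it."""

    def statistic(self, observed, expected):
        if len(observed) != len(expected):
            raise ValueError("observed and expected must have equal length")
        if any(e <= 0 for e in expected):
            raise ValueError("expected counts must be positive")
        # Pearson's chi-squared statistic: sum of (O - E)^2 / E over all bins.
        return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def passes_chi_squared(backend, observed, expected, critical_value):
    """True if the statistic does not exceed the caller-supplied critical value."""
    return backend.statistic(observed, expected) <= critical_value
```

For example, with three bins (two degrees of freedom) the 5% critical value is 5.991, so passes_chi_squared(PurePythonBackend(), [10, 20, 30], [20, 20, 20], 5.991) rejects the sample, since its statistic is 10.0. The tests only ever see the interface, so changing the library behind it would not touch them.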
Re: open issues for ns-3 testing

Since I only got one comment, from Pavel, on the below, I'd like to try to move forward on it. I'd like to try to get this feature included in ns-3.6, so it would be nice to try to define a first mergeable chunk. Some suggested next steps are inline below:

Tom Henderson wrote:
> We last left off the testing discussion [1] prior to ns-3.5 release,
> when we discussed the relative merits of running the tests from within
> or outside of waf. However, there are still several issues that need to
> be addressed before we could move towards a merge of a new testing
> framework.
>
> I tried to collect and summarize below what I understand to be many of
> the remaining issues, and maybe Craig and others can respond or add to
> the below list, and propose their requirements or goals.
>
> Here is some prototype code that Craig has been putting together (it is
> in separate repos):
> http://code.nsnam.org/craigdo/ns-3-test-patches/file/e6a504a6568e/test.patch
>
> http://code.nsnam.org/craigdo/ns-3-valver/
>
> To recap, we already have a unit test framework, and a regression
> testing framework for trace-based comparison of our examples. We don't
> have a good place yet for validation and verification tests, we don't
> have good examples to point to for how to write more complicated tests
> (including stochastic tests), and further, we would like to not lean too
> heavily on simply comparing pcap trace files for regression or
> validation purposes, so we need a better way for generating, saving, and
> comparing input and output test vectors that are more specific than
> packet trace dumps.
>
> Here are some lingering issues.
>
> 1) what is the overall plan for where this test code lives, and how does
> it relate to existing test.h and test.cc? Should existing src/test be
> moved to src/core?

I think a first step might be for Craig to produce a new test.h and test.cc that merges what we already have with the new test code. This test.cc/h could be moved to an underlying src/test that src/core depends on, or could be in src/core/test.cc/h. I don't care strongly; Craig suggested that it might be nice to be in a separate module to enforce that src/test does not depend on src/core. Part of my feeling lukewarm about the path src/test/test.{cc,h} is that I think that src/test would be a nice directory name for holding the test programs.

>
> 2) Why not just borrow one of the existing GPLed testing frameworks and
> reuse it here? Which ones have we looked at for comparison?

I think Craig can comment on what he looked at-- I don't know. In talking to Craig, he said that the other frameworks all cover unit test macros and such, but not the more advanced features we are likely to need, plus there is the issue of coding style. Maybe Craig can say more here if there is more to say.

>
> 3) How are non-trivial output test vectors (they could be trace files,
> long sequences of numbers or timestamps, etc.) going to be stored?

I think Craig needs to produce/describe an example of this.

>
> 4) What is the story for waf integration, if test.py is retained? In
> particular, what kind of front end could waf provide for test.py script?
> In a previous post, I suggested that basic modes of test.py could still
> be invoked by ./waf --check, ./waf --test, ./waf --regression (etc.).
> Are the issues of accessing the configuration cache and of running
> programs through waf (or not) settled?

I would support a waf frontend to test.py for default cases, analogous to "make check" and "make test" in the make world. I suggest not making any changes to the ./waf API as a first step.

>
> 5) Related to this, what is the expected API and granularity for running
> these tests? Do we have different levels of testing that have simple
> APIs (such as the above sequence of waf commands)?

I think we need to have different levels of testing because some tests are going to be quick and some slow to execute. I don't have a specific proposal for what belongs where, other than keeping existing tests where they are now.

>
> 6) Can we review some really simple examples (e.g. in samples/
> directory) showing different examples of what different types of tests
> would look like? What about a TCP model verification example?
>
> 7) Where do more complicated test and validation programs (i.e. the
> actual tests) reside? src/test?

This is related to 1) above. I think that src/test could be a home for test programs that are not intended to be used as examples (e.g. the tcp-nsc-zoo, wifi-cmu-clear-channel, David Evensky's test programs), but test programs could also live in the modules where the code lives, as our unit tests do.

>
> 8) What about Mathieu's comment recently about having a mode that dumps
> a lot of trace locally (without downloading saved regression traces) for
> more extensive regression testing before a checkin?

I would suggest considering this as a second step, but I agree that this would be a useful mode of testing.

>
> 9) What output do people want to see, typically (Gustavo had made a
> comment about minimal output for successful tests and verbose outputs
> for failed tests)?

I would suggest considering nicely formatted output as a second step.

>
> 10) Can we avoid baking in a GSL dependency in case we want to rewrite
> histogram or chi-squared routines ourselves in the future?

It seems to me that we are going to have to add a GSL dependency for some tests to execute. I don't know whether there is any appropriate statistical abstraction API that would allow R-project, GSL, ROOT, (pick your favorite library) to be easily swapped in. Does the R API resemble the GSL API? So, I would suggest that for now we just include GSL explicitly for ease of use until/unless someone comes up with an abstraction or alternate implementation that they would like to support, and we can consider refactoring at that time.

>
> 11) Mathieu remarked early on that it would be nice to factor out the
> chi-squared testing tools from within the random number tests, so they
> could be used elsewhere.

Agree-- I think that this should be done as a second step.

>
> 12) How would lcov, nightly builds (buildbots, others) etc. be plumbed in?

This can be a second step.

Anyway, if there are no other comments, I suggest that Craig try to follow the above and come back to the list when he has another proposal for consideration.

Tom
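Two of the points above (different levels for quick and slow tests in item 5, and Gustavo's preference in item 9 for minimal output on success and verbose output on failure) can be sketched together. This is a rough illustration only: the level names QUICK and EXTENSIVE and the run_tests function are hypothetical, and nothing here reflects how test.py or waf actually behave.

```python
import traceback

# Hypothetical level names: QUICK tests run always, EXTENSIVE only on request.
QUICK, EXTENSIVE = 1, 2


def run_tests(tests, max_level=QUICK, report=print):
    """Run the (name, level, func) entries whose level is <= max_level.

    Passing tests produce no output; each failing test reports its name and
    traceback. A one-line summary is reported at the end.
    Returns the number of failures.
    """
    ran = 0
    failures = 0
    for name, level, func in tests:
        if level > max_level:
            continue  # too slow for the requested level
        ran += 1
        try:
            func()
        except Exception:
            failures += 1
            report(f"FAIL {name}\n{traceback.format_exc()}")
    report(f"{ran - failures}/{ran} passed")
    return failures
```

A front end could then map a default invocation to max_level=QUICK and a more thorough pre-checkin run to max_level=EXTENSIVE, without the individual tests needing to know how they were launched.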