match marker patch

by Andrew Friedley

I've attached a patch that implements support for the marker attribute of
<match> in <named_location>, for example:

<named_location>
    <name>output_cc</name>
    <match marker="begin">cc -o</match>
    <match marker="end">ld</match>
</named_location>
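For reference, here is a minimal Python 3 sketch (not code from the patch; `classify_matches` is a hypothetical name) of how the `<match>` children of such a `<named_location>` get sorted by their `marker` attribute, mirroring the logic of the `pb_input.py` hunk: a missing attribute means an ordinary match, and any value other than `begin` or `end` is rejected.

```python
import xml.etree.ElementTree as ET


def classify_matches(xml_text):
    """Split the <match> children of a <named_location> into
    (plain, begin, end) lists according to their marker attribute."""
    loc = ET.fromstring(xml_text)
    plain, begin, end = [], [], []
    for m in loc.findall("match"):
        marker = m.get("marker")  # None if the attribute is absent
        if not marker:
            plain.append(m.text)  # ordinary match, no marker semantics
        elif marker == "begin":
            begin.append(m.text)
        elif marker == "end":
            end.append(m.text)
        else:
            raise ValueError("invalid marker %r for <match> %r" % (marker, m.text))
    return plain, begin, end


example = """<named_location>
    <name>output_cc</name>
    <match marker="begin">cc -o</match>
    <match marker="end">ld</match>
</named_location>"""

print(classify_matches(example))  # ([], ['cc -o'], ['ld'])
```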

A few notes:

As implemented, the marker lines themselves are not included in the value
data.  It's not a big deal to me whether they are or not; perhaps we want
some sort of option?
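A minimal Python 3 sketch of the capture rule as the patch implements it: the begin marker may appear anywhere in a line, and capturing stops at the first line whose stripped text equals the end marker (or at the end of input). It assumes the input lines keep their trailing newlines. The `include_markers` flag is hypothetical; it only illustrates what the suggested option might do and is not part of the patch.

```python
def capture(lines, begin, end, include_markers=False):
    """Return the text between the first line containing `begin` and the
    next line whose stripped text equals `end`; None if `begin` is absent.
    `include_markers` is a hypothetical option, not part of the patch."""
    for idx, line in enumerate(lines):
        if begin in line:  # begin marker: substring match, like str.find()
            break
    else:
        return None  # no begin marker anywhere in the input
    out = [lines[idx]] if include_markers else []
    for line in lines[idx + 1:]:
        if line.strip() == end:  # end marker: whole stripped line must match
            if include_markers:
                out.append(line)
            break
        out.append(line)
    return "".join(out)  # runs to end of input if no end marker is found


# same layout as test/marker/input/input_3.dat
data = ["RESULT_BEGIN\n", "line 1\n", "line 2\n", "line 3\n",
        "RESULT_END\n", "line 4\n", "line 5\n"]
print(repr(capture(data, "RESULT_BEGIN", "RESULT_END")))
# 'line 1\nline 2\nline 3\n'
```

With the patch's behaviour (markers excluded) the result matches the lines expected in `verify/test_8.vfy`; passing `include_markers=True` would additionally keep the `RESULT_BEGIN` and `RESULT_END` lines.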

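If multiple matches with marker="begin" are specified for one named
location, any of them will start the capture.  Likewise, any marker="end"
will terminate it.  This means that the begin and end markers are not
paired at all; it's even possible to have a different number of begins
and ends.  Note that a begin marker matches anywhere within a line, while
an end marker must match the whole line (ignoring leading and trailing
whitespace).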
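Extending the same idea to lists of markers shows why begin and end markers need not be paired: any begin marker opens the capture and any end marker closes it. This is an illustrative Python 3 sketch only (`capture_any` is a hypothetical name), and it handles just the first marked block in the input.

```python
def capture_any(lines, begins, ends):
    """Capture after the first line containing ANY begin marker, stopping at
    the first line whose stripped text equals ANY end marker. The two lists
    are independent, so their lengths may differ."""
    end_set = set(ends)
    for idx, line in enumerate(lines):
        if any(b in line for b in begins):
            break
    else:
        return None  # no begin marker found
    out = []
    for line in lines[idx + 1:]:
        if line.strip() in end_set:
            break
        out.append(line)
    return "".join(out)


log = ["noise\n", "START B\n", "y = 2\n", "DONE\n", "tail\n"]
# two begin markers, three end markers: no pairing between them
print(repr(capture_any(log, ["START A", "START B"], ["STOP", "DONE", "QUIT"])))
# 'y = 2\n'
```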

Tests are included as well.

Andrew

Index: test/marker/input/input_1.dat
===================================================================
--- test/marker/input/input_1.dat (revision 0)
+++ test/marker/input/input_1.dat (revision 0)
@@ -0,0 +1,8 @@
+# a fixed number of lines after the keyword
+
+RESULT_BEGIN
+line 1
+line 2
+line 3
+line 4
+RESULT_END
Index: test/marker/input/input_2.dat
===================================================================
--- test/marker/input/input_2.dat (revision 0)
+++ test/marker/input/input_2.dat (revision 0)
@@ -0,0 +1,9 @@
+# a fixed number of lines after the keyword, with a blank line in the middle
+
+RESULT_BEGIN
+line 1
+line 2
+
+line 3
+line 4
+RESULT_END
Index: test/marker/input/input_3.dat
===================================================================
--- test/marker/input/input_3.dat (revision 0)
+++ test/marker/input/input_3.dat (revision 0)
@@ -0,0 +1,10 @@
+# automatically determine number of lines after the keyword
+# be sure to stop at the end marker
+
+RESULT_BEGIN
+line 1
+line 2
+line 3
+RESULT_END
+line 4
+line 5
Index: test/marker/input/input_4.dat
===================================================================
--- test/marker/input/input_4.dat (revision 0)
+++ test/marker/input/input_4.dat (revision 0)
@@ -0,0 +1,9 @@
+# too much data for a string
+
+RESULT_BEGIN
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+RESULT_END
Index: test/marker/input/input_5.dat
===================================================================
--- test/marker/input/input_5.dat (revision 0)
+++ test/marker/input/input_5.dat (revision 0)
@@ -0,0 +1,9 @@
+# a text datatype can store variable-length data
+
+RESULT_BEGIN
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+RESULT_END
Index: test/marker/verify/test_10.vfy
===================================================================
--- test/marker/verify/test_10.vfy (revision 0)
+++ test/marker/verify/test_10.vfy (revision 0)
@@ -0,0 +1,7 @@
+# text[]
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+
Index: test/marker/verify/test_1.vfy
===================================================================
Index: test/marker/verify/test_2.vfy
===================================================================
Index: test/marker/verify/test_3.vfy
===================================================================
Index: test/marker/verify/test_4.vfy
===================================================================
--- test/marker/verify/test_4.vfy (revision 0)
+++ test/marker/verify/test_4.vfy (revision 0)
@@ -0,0 +1,8 @@
+ERROR:  value too long for type character varying(256)
+
+INSERT INTO rundata_4 (string) VALUES ('data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+data data data data data data data data data data data data data data data data data data data data data
+')
Index: test/marker/verify/test_5.vfy
===================================================================
Index: test/marker/verify/test_6.vfy
===================================================================
--- test/marker/verify/test_6.vfy (revision 0)
+++ test/marker/verify/test_6.vfy (revision 0)
@@ -0,0 +1,6 @@
+# string[]
+line 1
+line 2
+line 3
+line 4
+
Index: test/marker/verify/test_7.vfy
===================================================================
--- test/marker/verify/test_7.vfy (revision 0)
+++ test/marker/verify/test_7.vfy (revision 0)
@@ -0,0 +1,7 @@
+# string[]
+line 1
+line 2
+
+line 3
+line 4
+
Index: test/marker/verify/test_8.vfy
===================================================================
--- test/marker/verify/test_8.vfy (revision 0)
+++ test/marker/verify/test_8.vfy (revision 0)
@@ -0,0 +1,5 @@
+# string[]
+line 1
+line 2
+line 3
+
Index: test/marker/verify/test_9.vfy
===================================================================
--- test/marker/verify/test_9.vfy (revision 0)
+++ test/marker/verify/test_9.vfy (revision 0)
@@ -0,0 +1 @@
+#* WARNING: <run> 'r.runid' lists non-existing run <index> '4' (ignored)
Index: test/marker/runtest
===================================================================
--- test/marker/runtest (revision 0)
+++ test/marker/runtest (revision 0)
@@ -0,0 +1,112 @@
+#!/bin/bash
+
+# debug runmode gives unwanted output
+unset PB_RUNMODE
+
+if [ -z "$PB" ] ; then
+    PB=../../bin/perfbase
+fi
+TEST_NAME="marker"
+
+expected_rc=( 0 0 0 1 0 0 0 0 0 0)
+run_ids=( 1 2 3 4 5 )
+N_RUN=${#run_ids[@]}
+N_TEST=${#expected_rc[@]}
+
+if [ "$1" = "-c" ] ; then
+    VERIFY=""    
+    echo "*** re-creating verification data for '${TEST_NAME}'"
+else
+    VERIFY="yes"
+    echo "*** running $N_TEST tests in '${TEST_NAME}'"
+fi
+if [ "$1" = "-d" ] ; then
+    export PB_RUNMODE="debug"
+fi
+
+rm -f *.out *.log
+
+# setup
+if ! $PB setup -d ${TEST_NAME}_exp.xml >>test.log 2>&1 ; then
+    echo "#* ERROR in ${TEST_NAME}: could not setup experiment. See test.log"
+    exit 1
+fi
+
+# import data - already a test
+rc=0
+i=0
+while [ $i -lt $N_RUN ] ; do
+    t=`expr $i + 1`
+    echo "  test "$t
+    INP_ARGS="-u -d ${TEST_NAME}_inp_${t}.xml input/input_${t}.dat"
+    CMD="$PB input $INP_ARGS"
+    $CMD > test_$t.out    
+    if [ $? != ${expected_rc[$i]} ] ; then
+ echo "#* ERROR in ${TEST_NAME}: unexpected return code for input operation:"
+ echo "   $CMD"
+ echo "   (see file test_$t.out)"
+ rc=`expr $rc + 1`
+    else
+ if [ -n "$VERIFY" ] ; then
+    if ! diff test_$t.out verify/test_$t.vfy ; then
+ echo "#* ERROR in ${TEST_NAME}: input $t had wrong output"
+ echo "   $CMD"
+ echo "   (file test_$t.out vs. verify/test_$t.vfy)"
+ rc=`expr $rc + 1`
+    else
+ rm -f test_$t.out
+    fi
+ else
+    mv test_$t.out verify/test_$t.vfy
+ fi
+    fi
+    i=`expr $i + 1`
+done
+
+# do the queries to verify that the correct data was imported
+i=0
+while [ $i -lt $N_RUN ] ; do
+    t=`expr $i + 1 + $N_RUN`
+    echo "  test "$t
+
+    ii=`expr $i + $N_RUN`
+    QUERY_ARGS="-d ${TEST_NAME}_qry_1.xml -f f.runid=${run_ids[$i]}"
+    $PB query  $QUERY_ARGS > test_$t.out
+    if [ $? != ${expected_rc[$ii]} ] ; then
+ echo "#* ERROR in ${TEST_NAME}: could not perform query $t:"
+ echo "   $PB query $QUERY_ARGS"
+ echo "   (see file test_$t.out)"
+ rc=`expr $rc + 1`
+    else
+ if [ -n "$VERIFY" ] ; then
+    if ! diff test_$t.out verify/test_$t.vfy ; then
+ echo "#* ERROR in ${TEST_NAME}: query $t returned wrong data "
+ echo "   $PB query $QUERY_ARGS"
+ echo "   (file test_$t.out vs. verify/test_$t.vfy)"
+ rc=`expr $rc + 1`
+    else
+ rm -f test_$t.out
+    fi
+ else
+    mv test_$t.out verify/test_$t.vfy
+ fi
+    fi
+    i=`expr $i + 1`
+done
+
+# make sure the database server will be able to delete the database
+sleep 2
+
+# clean up
+if ! $PB delete -e ${TEST_NAME}_TEST --dontask  >>test.log 2>&1; then
+    echo "#* ERROR in ${TEST_NAME}: could not delete experiment. See test.log"
+    exit 1
+fi
+
+if [ $rc = 0 ] ; then
+    rm -f test.log *.err
+fi
+
+echo "*** done with tests in '${TEST_NAME}' ($rc failed)"
+
+exit $rc

Property changes on: test/marker/runtest
___________________________________________________________________
Name: svn:executable
   + *

Index: test/marker/marker_inp_1.xml
===================================================================
--- test/marker/marker_inp_1.xml (revision 0)
+++ test/marker/marker_inp_1.xml (revision 0)
@@ -0,0 +1,12 @@
+<?xml version="1.0" standalone="no"?>
+<!DOCTYPE input SYSTEM "../../dtd/pb_input.dtd">
+
+<input id="marker_test_1">
+  <experiment>marker_TEST</experiment>
+
+  <named_location mode="set">
+    <name>string</name>
+    <match marker="begin">RESULT_BEGIN</match>
+    <match marker="end">RESULT_END</match>
+  </named_location>
+</input>
Index: test/marker/marker_inp_2.xml
===================================================================
--- test/marker/marker_inp_2.xml (revision 0)
+++ test/marker/marker_inp_2.xml (revision 0)
@@ -0,0 +1,12 @@
+<?xml version="1.0" standalone="no"?>
+<!DOCTYPE input SYSTEM "../../dtd/pb_input.dtd">
+
+<input id="marker_test_2">
+  <experiment>marker_TEST</experiment>
+
+  <named_location mode="set">
+    <name>string</name>
+    <match marker="begin">RESULT_BEGIN</match>
+    <match marker="end">RESULT_END</match>
+  </named_location>
+</input>
Index: test/marker/marker_inp_3.xml
===================================================================
--- test/marker/marker_inp_3.xml (revision 0)
+++ test/marker/marker_inp_3.xml (revision 0)
@@ -0,0 +1,12 @@
+<?xml version="1.0" standalone="no"?>
+<!DOCTYPE input SYSTEM "../../dtd/pb_input.dtd">
+
+<input id="marker_test_3">
+  <experiment>marker_TEST</experiment>
+
+  <named_location mode="set">
+    <name>string</name>
+    <match marker="begin">RESULT_BEGIN</match>
+    <match marker="end">RESULT_END</match>
+  </named_location>
+</input>
Index: test/marker/marker_inp_4.xml
===================================================================
--- test/marker/marker_inp_4.xml (revision 0)
+++ test/marker/marker_inp_4.xml (revision 0)
@@ -0,0 +1,12 @@
+<?xml version="1.0" standalone="no"?>
+<!DOCTYPE input SYSTEM "../../dtd/pb_input.dtd">
+
+<input id="marker_test_4">
+  <experiment>marker_TEST</experiment>
+
+  <named_location mode="set">
+    <name>string</name>
+    <match marker="begin">RESULT_BEGIN</match>
+    <match marker="end">RESULT_END</match>
+  </named_location>
+</input>
Index: test/marker/marker_inp_5.xml
===================================================================
--- test/marker/marker_inp_5.xml (revision 0)
+++ test/marker/marker_inp_5.xml (revision 0)
@@ -0,0 +1,12 @@
+<?xml version="1.0" standalone="no"?>
+<!DOCTYPE input SYSTEM "../../dtd/pb_input.dtd">
+
+<input id="marker_test_5">
+  <experiment>marker_TEST</experiment>
+
+  <named_location mode="set">
+    <name>text</name>
+    <match marker="begin">RESULT_BEGIN</match>
+    <match marker="end">RESULT_END</match>
+  </named_location>
+</input>
Index: test/marker/marker_qry_1.xml
===================================================================
--- test/marker/marker_qry_1.xml (revision 0)
+++ test/marker/marker_qry_1.xml (revision 0)
@@ -0,0 +1,37 @@
+<?xml version="1.0" standalone="no"?>
+<!DOCTYPE query SYSTEM "../../dtd/pb_query.dtd">
+
+<query id="show_string">
+  <experiment>marker_TEST</experiment>
+  <description>
+    Print the imported data to verify it is correct
+  </description>
+
+  <fixed id="f.runid">
+    <content>0</content>
+  </fixed>
+
+  <run id="r.runid">
+    <index>f.runid</index>
+  </run>
+
+  <source id="src.string">
+    <result>string</result>
+    
+    <input>r.runid</input>
+  </source>
+
+  <source id="src.text">
+    <result>text</result>
+    
+    <input>r.runid</input>
+  </source>
+
+  <output>
+    <input>src.string</input>
+    <input>src.text</input>
+  </output>
+  
+</query>
+
+
Index: test/marker/marker_exp.xml
===================================================================
--- test/marker/marker_exp.xml (revision 0)
+++ test/marker/marker_exp.xml (revision 0)
@@ -0,0 +1,46 @@
+<?xml version="1.0" standalone="no"?>
+<!DOCTYPE experiment SYSTEM "../../dtd/pb_experiment.dtd">
+
+<experiment>
+  <name>marker_TEST</name>
+  <info>
+    <performed_by>
+      <name>Joachim Worringen</name>
+      <organization>Computer and Communications Research Lab, NEC Europe Ltd.</organization>
+    </performed_by>
+    <project>perfbase test suite</project>
+    <synopsis>Test marker attribute of match in named_location.</synopsis>
+    <description>
+      Using the marker attribute, it is possible to import one or more lines
+      after a matching string, and before another matching string.  Raw text
+      is imported into a value of type "string" or "text".
+    </description>
+  </info>
+
+  <result>
+    <name>string</name>
+    <datatype>string</datatype>
+
+    <synopsis>result string</synopsis>
+    <default></default>
+  </result>
+  
+  <result>
+    <name>int</name>
+    <datatype>integer</datatype>
+
+    <synopsis>result integer</synopsis>
+    <default></default>
+  </result>
+  
+  <result>
+    <name>text</name>
+    <datatype>text</datatype>
+
+    <synopsis>result text</synopsis>
+    <default></default>
+  </result>
+  
+</experiment>
+
+
Index: test/marker/Makefile
===================================================================
--- test/marker/Makefile (revision 0)
+++ test/marker/Makefile (revision 0)
@@ -0,0 +1,16 @@
+# perform the tests
+
+test: testing
+
+testing:
+ @./runtest
+
+create:
+ @./runtest -c
+
+clean:
+ @rm -f *~ *.log *.out
+
+dbclean: clean
+ perfbase delete --exp=marker_TEST --dontask --force
+
Index: test/Makefile
===================================================================
--- test/Makefile (revision 58)
+++ test/Makefile (working copy)
@@ -1,7 +1,7 @@
 # run the perfbase test suite
 
 TESTDIRS=fixed runs derived eval plain_table plottype sweep update \
- errors count filename map missing boolean regexp \
+ errors count filename map marker missing boolean regexp \
  input oponesrc optwosrc distrib quote exist default \
  order pset split attach filter slice runindex combiner \
  lines limit combiner2
Index: bin/pb_input.py
===================================================================
--- bin/pb_input.py (revision 58)
+++ bin/pb_input.py (working copy)
@@ -686,6 +686,8 @@
         self.content = None
         self.ws = whitespace
         self.match = []
+        self.match_begin = []
+        self.match_end = []
         self.regexp = []
         self.trigger = []
         self.valid_vals = None
@@ -862,13 +864,25 @@
 
         self.terminator = { None : None }
         for m in r.findall('match'):
-            self.match.append(m.text)
-            self.terminator[m.text] = m.get("terminator")
+            m_att = m.get('marker')
+            if m_att:
+                if m_att == "begin":
+                    self.match_begin.append(m.text)
+                elif m_att == "end":
+                    self.match_end.append(m.text)
+                else:
+                    raise SpecificationError, "Invalid content '%s' for marker attribute of <match> '%s' in <named_location> '%s'" \
+                          % (m_att, m.text, self.name)
+            else:
+                self.match.append(m.text)
+                self.terminator[m.text] = m.get("terminator")
+
         for m in r.findall('regexp'):
             # any exception to catch?
             regexp = re.compile(m.text)
             self.regexp.append(regexp)
             self.terminator[regexp] = None
+
         ws = r.findtext('ws')
         if ws:
             self.ws = ws + self.ws
@@ -876,7 +890,7 @@
         if att and not cmp(lower(att), 'yes'):
             self.is_separator = True
 
-        if len(self.match) + len(self.regexp) + len(self.trigger) == 0:
+        if len(self.match) + len(self.match_begin) + len(self.regexp) + len(self.trigger) == 0:
             if self.valid_vals is None:
                 raise SpecificationError, "No <match>, <regexp> or <trigger> provided for <named_location> '%s'." \
                       % self.name
@@ -904,6 +918,7 @@
             return "remove"
 
         self.data_match = None
+        self.data_match_begin = None
         self.data_re = None
         rval = "nothing"
 
@@ -933,6 +948,14 @@
                     rval = "parse"
                     break
 
+            for m in self.match_begin:
+                if lines[idx].find(m) != -1:
+                    if pb_debug:
+                        print "#* checking <named_location>: found '%s'" % m
+                    self.data_match_begin = m
+                    rval = "parse"
+                    break
+
             for r in self.regexp:
                 if r.search(lines[idx]):
                     self.data_re = r
@@ -1000,6 +1023,28 @@
             else:
                 return "store_value"
 
+        if self.data_match_begin:
+            # read until a line equal (after stripping) to one of match_end, or end of input
+            #self.content = lines[idx].split(self.data_match_begin, 2)[1]
+            self.content = ""
+            i = idx + 1
+            try:
+                while strip(lines[i]) not in self.match_end:
+                    self.content += lines[i]
+                    i += 1
+            except IndexError:
+                pass
+
+            if pb_profiling:
+                t0 = time.clock() - t0
+                self.prof_data['parse_data'].append(t0)
+
+            self.parse_cnt += 1
+            if self.is_set:
+                return "store_set"
+            else:
+                return "store_value"
+
         if self.data_match:
             if pb_debug:
                 print "#* parsing <named_location> '%s'" % (self.data_match, )
Index: dtd/pb_input.dtd
===================================================================
--- dtd/pb_input.dtd (revision 58)
+++ dtd/pb_input.dtd (working copy)
@@ -147,6 +147,7 @@
 <!ELEMENT description  (#PCDATA)>
 <!ELEMENT match        (#PCDATA)>
 <!ATTLIST match        match        (exact|fuzzy)   "fuzzy"
+                       marker       (begin|end)     #IMPLIED
                        content      (front|behind)  "behind"
                        terminator   CDATA           #IMPLIED>
 <!ELEMENT attachment   (#PCDATA)>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...

Re: match marker patch

by Joachim Worringen

Andrew Friedley:
> I've attached a patch that implements the marker support:

Hi Andrew,

thanks for the patch. We should move this discussion to the developer list
<dev@...>. For your convenience, I have already subscribed you (the
list has had no traffic so far).

  Joachim

--
Joachim Worringen - NEC C&C research lab St.Augustin
fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de
