Bug#549227: UDD: please collect and expose the load time for update scripts

by Sandro Tosi

Package: qa.debian.org
Severity: wishlist
User: qa.debian.org@...
Usertags: udd

Hi!
It's common in a data warehouse system (which UDD can be considered to be) to keep
track of update job timings: start, end, duration, records processed and so on.

This would allow querying that information to generate reports on job
executions: durations (mean, stddev, etc.), growth, performance, possible
tuning due to interaction with other scripts, and so on.
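A minimal Python 3 sketch of the duration part of such a report, assuming rows of (source, start_time, end_time) with times stored as '%Y-%m-%d %H:%M:%S' strings. The function name duration_report and the row layout are hypothetical, not an existing UDD interface.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev

# Assumed timestamp layout for the hypothetical input rows.
FMT = '%Y-%m-%d %H:%M:%S'


def duration_report(rows):
    """Return {source: (runs, mean_seconds, stddev_seconds)}.

    rows: iterable of (source, start_time, end_time) string triples.
    """
    durations = defaultdict(list)
    for source, start, end in rows:
        delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        durations[source].append(delta.total_seconds())
    report = {}
    for source, secs in durations.items():
        # statistics.stdev needs at least two samples; use 0.0 for one run.
        spread = stdev(secs) if len(secs) > 1 else 0.0
        report[source] = (len(secs), mean(secs), spread)
    return report
```

For example, two made-up 'bugs' runs of 10 and 20 minutes give a mean of 900 seconds and a sample standard deviation of about 424 seconds.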

Such information is usually stored in a separate (internal) schema rather than
the main one, but I think we can just add a table to 'udd' (maybe prefixed with
'udd_' to clarify that it's an internal UDD table) for it.

Thanks,
Sandro

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.30-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash



--
To UNSUBSCRIBE, email to debian-bugs-dist-REQUEST@...
with a subject of "unsubscribe". Trouble? Contact listmaster@...


Bug#549227: UDD: please collect and expose the load time for update scripts

by Lucas Nussbaum

On 01/10/09 at 20:08 +0200, Sandro Tosi wrote:

> Package: qa.debian.org
> Severity: wishlist
> User: qa.debian.org@...
> Usertags: udd
>
> Hi!
> It's common in a data warehouse system (which UDD can be considered to be) to
> keep track of update job timings: start, end, duration, records processed and
> so on.
>
> This would allow querying that information to generate reports on job
> executions: durations (mean, stddev, etc.), growth, performance, possible
> tuning due to interaction with other scripts, and so on.
>
> Such information is usually stored in a separate (internal) schema rather than
> the main one, but I think we can just add a table to 'udd' (maybe prefixed with
> 'udd_' to clarify that it's an internal UDD table) for it.

Hi Sandro,

Timestamps are now exported to http://udd.debian.org/timing.txt , but
this doesn't keep historical information.

A patch adding the table you describe would be appreciated (the code
would have to be python)
--
| Lucas Nussbaum
| lucas@...   http://www.lucas-nussbaum.net/ |
| jabber: lucas@...             GPG: 1024D/023B3F4F |





Bug#549227: UDD: please collect and expose the load time for update scripts

by Serafeim Zanikolas

Hi guys,

On Wed, Oct 07, 2009 at 09:47:11PM +0200, Lucas Nussbaum wrote [edited]:
> On 01/10/09 at 20:08 +0200, Sandro Tosi wrote:
> > It's common in a data warehouse system (which UDD can be considered to be) to
> > keep track of update job timings: start, end, duration, records processed...
[..]
> A patch adding the table you describe would be appreciated (the code
> would have to be python)

Patch attached. I didn't add a duration column as it's trivially calculated on
the fly. I'm open to suggestions about getting record counts before and after
updates in a generic way.

Cheers,
Serafeim

ps. hacking UDD would be more fun without mixed indentation ;)

--
debtags-organised WNPP bugs: http://members.hellug.gr/serzan/wnpp


Index: udd.py
===================================================================
--- udd.py (revision 1612)
+++ udd.py (working copy)
@@ -8,7 +8,7 @@
 import string
 import sys
 from os import system
-from time import asctime
+import time
 import udd.aux
 import os.path
 
@@ -20,6 +20,23 @@
   for cmd in available_commands:
     print '  %s' % cmd
 
+def insert_timestamps(config, source, command, start_time, end_time):
+    connection = udd.aux.open_connection(config)
+    cur = connection.cursor()
+    values = { 'source' : source,
+               'command' : command,
+               'start_time' : start_time,
+               'end_time' : end_time }
+    cur.execute("""INSERT INTO udd_timestamps
+                            (source, command, start_time, end_time)
+                     VALUES (%(source)s, %(command)s, %(start_time)s,
+                             %(end_time)s)
+                """, values)
+    connection.commit()
+
+def get_timestamp():
+    return time.strftime('%Y-%m-%d %H:%M:%S')
+
 if __name__ == '__main__':
   if len(sys.argv) < 4:
     print_help()
@@ -46,25 +63,13 @@
       # can just use the gatherer's methods
       if command == 'update':
  if "update-command" in src_config:
-  if 'timestamp-dir' in config['general']:
-    f = open(os.path.join(config['general']['timestamp-dir'],
-                                  src+".update-start"), "w")
-    f.write(asctime())
-    f.close()
+          start_time = get_timestamp()
   result = system(src_config['update-command'])
   if result != 0:
     sys.exit(result)
-  if 'timestamp-dir' in config['general']:
-    f = open(os.path.join(config['general']['timestamp-dir'],
-                                  src+".update-end"), "w")
-    f.write(asctime())
-    f.close()
+        end_time = get_timestamp()
       else:
- if 'timestamp-dir' in config['general']:
-  f = open(os.path.join(config['general']['timestamp-dir'],
-                                src+".insert-start"), "w")
-  f.write(asctime())
-  f.close()
+ start_time = get_timestamp()
  (src_command,rest) = types[type].split(None, 1)
  if src_command == "exec":
   system(rest + " " + sys.argv[1] + " " + sys.argv[2] + " " + src)
@@ -83,11 +88,8 @@
   else:
     exec "gatherer.%s()" % command
   connection.commit()
- if 'timestamp-dir' in config['general']:
-  f = open(os.path.join(config['general']['timestamp-dir'],
-                                src+".insert-end"), "w")
-  f.write(asctime())
-  f.close()
+ end_time = get_timestamp()
+      insert_timestamps(config, src, command, start_time, end_time)
     except:
       udd.aux.unlock(config, src)
       raise
Index: sql/setup.sql
===================================================================
--- sql/setup.sql (revision 1612)
+++ sql/setup.sql (working copy)
@@ -535,6 +535,16 @@
 );
 GRANT SELECT ON wannabuild TO public;
 
+-- timings of data operations
+CREATE TABLE udd_timestamps (
+  id serial,
+  source text,
+  command text,
+  start_time timestamp,
+  end_time timestamp,
+  PRIMARY KEY (id)
+);
+GRANT SELECT ON udd_timestamps TO public;
 
 -- views
 -- bugs_count
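The patch brackets each job with get_timestamp() calls and then passes both values to insert_timestamps(). The same pattern can be expressed as a context manager; this is a hypothetical Python 3 sketch (timed is not part of UDD), where record stands in for insert_timestamps with the config argument already bound, e.g. functools.partial(insert_timestamps, config).

```python
import time
from contextlib import contextmanager


def get_timestamp():
    # Same format as the patch's get_timestamp(): local time, second precision.
    return time.strftime('%Y-%m-%d %H:%M:%S')


@contextmanager
def timed(source, command, record):
    """Call record(source, command, start, end) when the block succeeds.

    If the block raises, the exception propagates and nothing is recorded,
    mirroring the patch, where a failing job never reaches insert_timestamps.
    """
    start = get_timestamp()
    yield
    record(source, command, start, get_timestamp())
```

Usage would look like `with timed(src, command, record): ...` around the update or insert step.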


Bug#549227: marked as done (UDD: please collect and expose the load time for update scripts)

by Debian Bug Tracking System

Your message dated Tue, 3 Nov 2009 15:36:44 +0100
with message-id <20091103143644.GC3852@...>
and subject line Re: Bug#549227: UDD: please collect and expose the load time for update scripts
has caused the Debian Bug report #549227,
regarding UDD: please collect and expose the load time for update scripts
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@...
immediately.)


--
549227: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=549227
Debian Bug Tracking System
Contact owner@... with problems


On 02/11/09 at 22:59 +0100, Serafeim Zanikolas wrote:

> Hi guys,
>
> On Wed, Oct 07, 2009 at 09:47:11PM +0200, Lucas Nussbaum wrote [edited]:
> > On 01/10/09 at 20:08 +0200, Sandro Tosi wrote:
> > > It's common in a data warehouse system (which UDD can be considered to be) to
> > > keep track of update job timings: start, end, duration, records processed...
> [..]
> > A patch adding the table you describe would be appreciated (the code
> > would have to be python)
>
> Patch attached. I didn't add a duration column as it's trivially calculated on
> the fly. I'm open to suggestions about getting record counts before and after
> updates in a generic way.
Thanks a lot, I've applied it (the table is named timestamps, not
udd_timestamps) and adapted the check_timestamp script that tells me when
data sources have not been updated for a long time.
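The check_timestamp script itself is not shown in this thread. The following is only a hypothetical Python 3 sketch of that kind of staleness check, assuming rows of (source, end_time) read from the timestamps table with end_time formatted as '%Y-%m-%d %H:%M:%S'; in the database the same "latest run per source" could come from `SELECT source, max(end_time) FROM timestamps GROUP BY source`.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout, matching the patch's get_timestamp().
FMT = '%Y-%m-%d %H:%M:%S'


def stale_sources(rows, now, max_age):
    """Return sorted sources whose latest end_time is older than max_age.

    rows: iterable of (source, end_time_string); now: datetime;
    max_age: timedelta.
    """
    latest = {}
    for source, end in rows:
        t = datetime.strptime(end, FMT)
        if source not in latest or t > latest[source]:
            latest[source] = t
    return sorted(s for s, t in latest.items() if now - t > max_age)
```

With a two-day threshold, a source last updated three hours ago passes, while one last updated two weeks ago is reported.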

> ps. hacking UDD would be more fun without mixed indentation ;)

I thought I had fixed all the files, but I missed udd.py. Fixed now.
--
| Lucas Nussbaum
| lucas@...   http://www.lucas-nussbaum.net/ |
| jabber: lucas@...             GPG: 1024D/023B3F4F |