Logfile Manipulation


Logfile Manipulation

by Stephen Nelson-Smith

I've got a large amount of data in the form of 3 apache and 3 varnish
logfiles from 3 different machines.  They are rotated at 0400.  The
logfiles are pretty big - maybe 6G per server, uncompressed.

I've got to produce a combined logfile for 0000-2359 for a given day,
with a bit of filtering (removing lines based on text match, bit of
substitution).

I've inherited a nasty shell script that does this but it is very slow
and not clean to read or understand.

I'd like to reimplement this in python.

Initial questions:

* How does Python compare in performance to shell, awk etc in a big
pipeline?  The shell script kills the CPU
* What's the best way to extract the data for a given time, eg 0000 -
2359 yesterday?

Any advice or experiences?

S.
--
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com
_______________________________________________
Tutor maillist  -  Tutor@...
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: Logfile Manipulation

by Alan Gauld

"Stephen Nelson-Smith" <sanelson@...> wrote

> * How does Python compare in performance to shell, awk etc in a big
> pipeline?  The shell script kills the CPU

Python should be significantly faster than the typical shell script
and it should consume fewer resources, although it will probably
still use a fair bit of CPU unless you nice it.

> * What's the best way to extract the data for a given time, eg 0000 -
> 2359 yesterday?

I'm not familiar with Apache log files so I'll let somebody else answer,
but I suspect you can either use string.split() or a re.findall(). You
might even be able to use csv. Or if they are in XML you could use
ElementTree. It all depends on the data!

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/ 


Re: Logfile Manipulation

by Stephen Nelson-Smith

On Mon, Nov 9, 2009 at 8:47 AM, Alan Gauld <alan.gauld@...> wrote:

> I'm not familiar with Apache log files so I'll let somebody else answer,
> but I suspect you can either use string.split() or a re.findall(). You might
> even be able to use csv. Or if they are in XML you could use ElementTree.
> It all depends on the data!

An apache logfile entry looks like this:

89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET
/service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
HTTP/1.1" 200 50 "-" "-"

I want to extract 24 hrs of data based on timestamps like this:

[04/Nov/2009:04:02:10 +0000]

I also need to do some filtering (e.g. I actually don't want anything
with service.php), and I also have to do some substitutions. That's
trivial, other than not knowing the optimum place to do it. I.e. should
I do multiple passes?  Or should I try to do all the work at once,
only viewing each line once?  Also, what about reading from compressed
files?  The data comes in as 6 gzipped logfiles which expand to 6G in
total.

S.

Re: Logfile Manipulation

by Alan Gauld

> An apache logfile entry looks like this:
>
>89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET
> /service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
> HTTP/1.1" 200 50 "-" "-"
>
>I want to extract 24 hrs of data based timestamps like this:
>
> [04/Nov/2009:04:02:10 +0000]

OK It looks like you could use a regex to extract the first 
thing you find between square brackets. Then convert that to a time.
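
A minimal sketch of that idea in Python 3. The names parse_stamp and STAMP_RE are made up for illustration, and the format string assumes Apache's default timestamp layout and an English locale for the month abbreviation:

```python
import re
from datetime import datetime

# Matches the first non-empty [...] group in a log line. The empty
# "arg[]" pairs in the query string are skipped because at least one
# character is required between the brackets.
STAMP_RE = re.compile(r"\[([^\]]+)\]")

def parse_stamp(line):
    """Return the entry's timestamp as an aware datetime, or None."""
    match = STAMP_RE.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")

line = ('89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] '
        '"GET /service.php?s=nav&arg[]=&arg[]=home HTTP/1.1" 200 50 "-" "-"')
stamp = parse_stamp(line)
```

Once the stamp is a datetime, "is this entry inside the day I want?" becomes an ordinary comparison against two boundary datetimes.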

> I also need to do some filtering (eg I actually don't want anything
> with service.php), 

That's easy enough to detect.

> and I also have to do some substitutions - that's
> trivial other than not knowing the optimum place to do it?  

Assuming they are trivial then...

> I do multiple passes?  Or should I try to do all the work at once,

I'd opt for doing it all in one pass. With such large files you really 
want to minimise the amount of time spent reading the file. 
Plus with such large files you will need/want to process them 
line by line anyway rather than reading the whole thing into memory.
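
As a rough Python 3 sketch of the one-pass approach: a generator can filter and substitute while it streams, so each line is read once and never held in bulk. The function name clean_lines and the sample substitution are hypothetical; only the '/service.php' filter comes from the thread:

```python
import re

def clean_lines(lines, skip_text="/service.php", substitutions=()):
    """Yield filtered, rewritten lines, touching each input line once."""
    compiled = [(re.compile(pattern), repl) for pattern, repl in substitutions]
    for line in lines:
        if skip_text in line:
            continue                        # filter: drop unwanted entries
        for pattern, repl in compiled:
            line = pattern.sub(repl, line)  # substitute during the same pass
        yield line

sample = [
    '1.2.3.4 - - [04/Nov/2009:04:02:10 +0000] "GET /service.php?s=nav HTTP/1.1" 200 50\n',
    '1.2.3.4 - - [04/Nov/2009:04:02:11 +0000] "GET /index.html HTTP/1.1" 200 512\n',
]
# Hypothetical substitution, purely for illustration.
result = list(clean_lines(sample, substitutions=[(r"/index\.html", "/")]))
```

Because clean_lines accepts any iterable of lines, the same function works on an open file object as well as on the in-memory list used here.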

> Also what about reading from compressed files?  
> The data comes in as 6 gzipped logfiles 

Python has a module for that but I've never used it.
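
The module in question is gzip, from the standard library. A small Python 3 sketch of reading a compressed log line by line follows; the helper name read_gzipped_lines and the demo file are invented so the example can run on its own:

```python
import gzip
import os
import tempfile

def read_gzipped_lines(path):
    """Yield decoded lines from a gzip file without loading it all."""
    # "rt" asks gzip.open for text mode, so lines come back as str.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line

# Build a tiny gzipped file to stand in for a real access log.
tmpdir = tempfile.mkdtemp()
demo_path = os.path.join(tmpdir, "access_log.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as out:
    out.write("first entry\nsecond entry\n")

lines = list(read_gzipped_lines(demo_path))
```

Decompression happens as the file is iterated, so memory use stays small regardless of how large the uncompressed log is.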

BTW A quick google reveals that there are several packages  
for handling Apache log files. That is probably worth investigating 
before you start writing lots of code...

Examples:

Scratchy - The Apache Log Parser and HTML Report Generator for Python

Scratchy is an Apache Web Server log parser and HTML report generator written in Python. Scratchy was created by Phil Schwartz ...
scratchy.sourceforge.net/

Loghetti: an apache log file filter in Python - O'Reilly ONLamp Blog

18 Mar 2008 ... Loghetti: an apache log file filter in Python ... This causes loghetti to parse the query string, and return lines where the query parameter ...
www.oreillynet.com/.../blog/.../loghetti_an_apache_log_file_fi.html

HTH

Alan G.

Re: Logfile Manipulation

by Stephen Nelson-Smith

Sorry - forgot to include the list.

On Mon, Nov 9, 2009 at 9:33 AM, Stephen Nelson-Smith <sanelson@...> wrote:

> On Mon, Nov 9, 2009 at 9:10 AM, ALAN GAULD <alan.gauld@...> wrote:
>>
>>> An apache logfile entry looks like this:
>>>
>>>89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET
>>> /service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
>>> HTTP/1.1" 200 50 "-" "-"
>>>
>>>I want to extract 24 hrs of data based timestamps like this:
>>>
>>> [04/Nov/2009:04:02:10 +0000]
>>
>> OK It looks like you could use a regex to extract the first
>> thing you find between square brackets. Then convert that to a time.
>
> I'm currently thinking I can just use a string comparison after the
> first entry for the day - that saves date arithmetic.
>
>> I'd opt for doing it all in one pass. With such large files you really
>> want to minimise the amount of time spent reading the file.
>> Plus with such large files you will need/want to process them
>> line by line anyway rather than reading the whole thing into memory.
>
> How do I handle concurrency?  I have 6 log files which I need to turn
> into one time-sequenced log.
>
> I guess I need to switch between each log depending on whether the
> next entry is the next chronological entry between all six.  Then on a
> per line basis I can also reject it if it matches the stuff I want to
> throw out, and substitute it if I need to, then write out to the new
> file.
>
> S.
>



--
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com

Re: Logfile Manipulation

by Martin A. Brown-4

Hello,

 : An apache logfile entry looks like this:
 :
 : 89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET
 : /service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
 : HTTP/1.1" 200 50 "-" "-"
 :
 : I want to extract 24 hrs of data based timestamps like this:
 :
 : [04/Nov/2009:04:02:10 +0000]
 :
 : I also need to do some filtering (eg I actually don't want
 : anything with service.php), and I also have to do some
 : substitutions - that's trivial other than not knowing the optimum
 : place to do it?  IE should I do multiple passes?

I wouldn't.  Then, you spend decompression CPU, line matching CPU
and I/O several times.  I'd do it all at once.


 : Or should I try to do all the work at once, only viewing each
 : line once?  Also what about reading from compressed files?  The
 : data comes in as 6 gzipped logfiles which expand to 6G in total.

There are standard modules for handling compressed data (gzip and
bz2).  I'd imagine that the other pythonistas on this list will give
you more detailed (and probably better) advice, but here's a sample
of how to use the gzip module and how to skip the lines containing
the '/service.php' string, and to extract an epoch timestamp from
the datestamp field(s).  You would pass the filenames to operate on
as arguments to this script.

  See optparse if you want fancier capabilities for option handling.
 
  See re if you want to match multiple patterns to ignore.
 
  See time (and datetime) for mangling time and date strings.  Be
    forewarned, time zone issues will probably be a massive headache.
    Many others have been here before [0].
 
  Look up itertools (and be prepared for some study) if you want the
    output from the log files from your different servers sorted in
    the output.
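
On the time zone point, one way to sidestep the headache in Python 3 is to let strptime read the offset field with %z and compare epoch seconds. This is only a sketch; to_epoch is a made-up name, and it assumes the two timestamp fields have been joined and stripped of their brackets:

```python
from datetime import datetime

def to_epoch(stamp):
    """Convert '04/Nov/2009:04:02:10 +0000' to seconds since the epoch."""
    aware = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    # timestamp() on an aware datetime honours the parsed UTC offset.
    return int(aware.timestamp())

# The same instant written with two different offsets.
same_instant = (to_epoch("04/Nov/2009:04:02:10 +0000"),
                to_epoch("04/Nov/2009:05:02:10 +0100"))
```

Entries from servers that log in different zones then sort correctly, since both values above reduce to the same integer.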

Note that the below snippet is a toy and makes no attempt to trap
(try/except) any error conditions.  

If you are looking for a weblog analytics package once you have
reassembled the files into a whole, perhaps you could just start
there (e.g. webalizer and analog are two old-school packages that come
to mind for processing logs produced in Common Log Format).

I will echo Alan Gauld's sentiments of a few minutes ago and note
that there are probably many different Apache log parsers out
there which can accomplish what you hope to accomplish.  On the
other hand, you may be using this as an excuse to learn a bit of
python.

Good luck,

-Martin

 [0] http://seehuhn.de/blog/52

Sample:

  import sys, time, gzip

  files = sys.argv[1:]

  for filename in files:
    print >>sys.stderr, "About to open %s" % ( filename )
    f = gzip.open( filename )
    for line in f:
      # -- skip unwanted entries ("in" also catches a match at position 0)
      if '/service.php' in line:
        continue
      fields = line.split()
      # -- ignoring time zone; you are logging in UTC, right?
      #    tz = fields[4]
      # -- note: time.mktime() treats the parsed time as local time
      d = int( time.mktime( time.strptime(fields[3], "[%d/%b/%Y:%H:%M:%S") ) )
      print d, line,
    f.close()


--
Martin A. Brown
http://linux-ip.net/

Re: Logfile Manipulation

by Kent Johnson

On Mon, Nov 9, 2009 at 4:36 AM, Stephen Nelson-Smith <sanelson@...> wrote:
>>>>I want to extract 24 hrs of data based timestamps like this:
>>>>
>>>> [04/Nov/2009:04:02:10 +0000]
>>>
>>> OK It looks like you could use a regex to extract the first
>>> thing you find between square brackets. Then convert that to a time.
>>
>> I'm currently thinking I can just use a string comparison after the
>> first entry for the day - that saves date arithmetic.

As long as the times are all in the same time zone.
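
One more caveat, shown as a small Python 3 illustration with made-up stamps: plain string comparison of this timestamp format is only chronological within a single calendar day, because the day and the month name come before the time.

```python
# Within one day, text order and time order agree.
same_day = ["[05/Nov/2009:01:41:37",
            "[05/Nov/2009:01:41:36",
            "[05/Nov/2009:00:59:59"]
ordered = sorted(same_day)

# Across days or months they do not: 30 Oct is earlier than 05 Nov,
# yet "[30..." sorts after "[05..." as text, so this is False.
crosses_month = "[30/Oct/2009:23:59:59" < "[05/Nov/2009:00:00:00"
```

Since the files are rotated at 0400, each one spans two dates, so the comparison is only safe once the lines for the wanted date have been selected.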

>> How do I handle concurrency?  I have 6 log files which I need to turn
>> into one time-sequenced log.
>>
>> I guess I need to switch between each log depending on whether the
>> next entry is the next chronological entry between all six.  Then on a
>> per line basis I can also reject it if it matches the stuff I want to
>> throw out, and substitute it if I need to, then write out to the new
>> file.

If you create iterators from the files that yield (timestamp, entry)
pairs, you can merge the iterators using one of these recipes:
http://code.activestate.com/recipes/491285/
http://code.activestate.com/recipes/535160/
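
The standard library's heapq.merge does the same job as those recipes. Here is a Python 3 sketch under the assumption that each input is already in time order; the stamped helper and the three in-memory "logs" are invented for the example:

```python
import heapq
from datetime import datetime

def stamped(lines):
    """Yield (timestamp, entry) pairs from an iterable of log lines."""
    for line in lines:
        raw = line.split()[3].lstrip("[")      # e.g. 05/Nov/2009:01:41:37
        yield datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S"), line

# Stand-ins for three per-server logs, each already sorted by time.
log_a = ['a - - [05/Nov/2009:01:41:36 +0000] "GET /a HTTP/1.1" 200 1\n',
         'a - - [05/Nov/2009:01:41:39 +0000] "GET /a HTTP/1.1" 200 1\n']
log_b = ['b - - [05/Nov/2009:01:41:37 +0000] "GET /b HTTP/1.1" 200 1\n']
log_c = ['c - - [05/Nov/2009:01:41:38 +0000] "GET /c HTTP/1.1" 200 1\n',
         'c - - [05/Nov/2009:01:41:40 +0000] "GET /c HTTP/1.1" 200 1\n']

# heapq.merge pulls lazily from each iterator, one pending item per input.
merged = [entry for _, entry in
          heapq.merge(stamped(log_a), stamped(log_b), stamped(log_c))]
```

Only one pending entry per file is held in memory at a time, so the approach scales to large inputs as long as each file is itself sorted.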

Kent

Re: Logfile Manipulation

by Wayne Werner



On Sun, Nov 8, 2009 at 11:41 PM, Stephen Nelson-Smith <sanelson@...> wrote:
> I've got a large amount of data in the form of 3 apache and 3 varnish
> logfiles from 3 different machines.  They are rotated at 0400.  The
> logfiles are pretty big - maybe 6G per server, uncompressed.
>
> I've got to produce a combined logfile for 0000-2359 for a given day,
> with a bit of filtering (removing lines based on text match, bit of
> substitution).
>
> I've inherited a nasty shell script that does this but it is very slow
> and not clean to read or understand.
>
> I'd like to reimplement this in python.
>
> Initial questions:
>
> * How does Python compare in performance to shell, awk etc in a big
> pipeline?  The shell script kills the CPU
> * What's the best way to extract the data for a given time, eg 0000 -
> 2359 yesterday?
>
> Any advice or experiences?


Go here and download the pdf: http://www.dabeaz.com/generators-uk/

Someone posted this the other day, and I went and read through it and played around a bit and it's exactly what you're looking for - plus it has a slide comparing python vs. awk.

I think you'll find the pdf highly useful and right on.

HTH,
Wayne

Re: Logfile Manipulation

by Stephen Nelson-Smith

Hi,

>> Any advice or experiences?
>>
>
> go here and download the pdf!
> http://www.dabeaz.com/generators-uk/
> Someone posted this the other day, and I went and read through it and played
> around a bit and it's exactly what you're looking for - plus it has a
> slide comparing python vs. awk.
> I think you'll find the pdf highly useful and right on.

Looks like generators are a really good fit.  My biggest question
really is how to multiplex.

I have 6 logs per day, so I don't know which one will have the
next consecutive entry.

I love the idea of making a big dictionary, but with 6G of data,
that's going to run me out of memory, isn't it?

S.

Re: Logfile Manipulation

by Gerard Flanagan

Stephen Nelson-Smith wrote:

> Hi,
>
>>> Any advice or experiences?
>>
>> go here and download the pdf!
>> http://www.dabeaz.com/generators-uk/
>> Someone posted this the other day, and I went and read through it and played
>> around a bit and it's exactly what you're looking for - plus it has a
>> slide comparing python vs. awk.
>> I think you'll find the pdf highly useful and right on.
>
> Looks like generators are a really good fit.  My biggest question
> really is how to multiplex.
>
> I have 6 logs per day, so I don't know which one will have the
> next consecutive entry.
>
> I love the idea of making a big dictionary, but with 6G of data,
> that's going to run me out of memory, isn't it?

Perhaps "lookahead" generators could help? Though that would be getting
into advanced territory:

   
http://stackoverflow.com/questions/1517862/using-lookahead-with-generators
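
For what it's worth, a lookahead wrapper does not need much code. The Python 3 class below is an illustrative sketch (the Peekable name and its methods are not from the linked question), with integers standing in for timestamped log entries:

```python
class Peekable:
    """Wrap an iterator so its next item can be inspected before taking it."""
    _EMPTY = object()   # sentinel meaning "nothing left"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._head = next(self._it, self._EMPTY)

    def peek(self, default=None):
        return default if self._head is self._EMPTY else self._head

    def __bool__(self):
        return self._head is not self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        if self._head is self._EMPTY:
            raise StopIteration
        item, self._head = self._head, next(self._it, self._EMPTY)
        return item

# Multiplex three sorted sources by always taking the smallest head.
sources = [Peekable([1, 4, 7]), Peekable([2, 5]), Peekable([3, 6, 8])]
merged = []
live = [s for s in sources if s]
while live:
    source = min(live, key=lambda s: s.peek())
    merged.append(next(source))
    live = [s for s in live if s]
```

Only one item per source is buffered, which answers the memory worry: nothing like a 6G dictionary is ever built.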


Re: Logfile Manipulation

by Stephen Nelson-Smith

Hi,

> If you create iterators from the files that yield (timestamp, entry)
> pairs, you can merge the iterators using one of these recipes:
> http://code.activestate.com/recipes/491285/
> http://code.activestate.com/recipes/535160/

Could you show me how I might do that?

So far I'm at the stage of being able to produce loglines:

#! /usr/bin/env python
import gzip

class LogFile:
  def __init__(self, filename, date):
    self.f = gzip.open(filename, "r")
    # Skip ahead to the first entry whose timestamp starts with date.
    for logline in self.f:
      self.line = logline
      self.stamp = " ".join(self.line.split()[3:5])
      if self.stamp.startswith(date):
        break

  def getline(self):
    # Return the current line and advance to the next one.
    ret = self.line
    self.line = self.f.readline()
    self.stamp = " ".join(self.line.split()[3:5])
    return ret

date = "[05/Nov/2009"
logs = [LogFile("a/access_log-20091105.gz", date),
        LogFile("b/access_log-20091105.gz", date),
        LogFile("c/access_log-20091105.gz", date)]
while logs:
  print [x.stamp for x in logs]
  # Pick the log whose current entry has the smallest timestamp.
  nextlog = min(logs, key=lambda x: x.stamp)
  print nextlog.getline(),
  # Drop a log once it is exhausted (readline() returned "").
  if not nextlog.line:
    logs.remove(nextlog)


--
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com

Re: Logfile Manipulation

by Stephen Nelson-Smith

And the problem I have with the below is that I've discovered that the
input logfiles aren't strictly ordered - ie there is variance by a
second or so in some of the entries.

I can sort the biggest logfile (800M) using unix sort in about 1.5
mins on my workstation.  That's not really fast enough, with
potentially 12 other files....

Hrm...

S.

On Mon, Nov 9, 2009 at 1:35 PM, Stephen Nelson-Smith <sanelson@...> wrote:

> Hi,
>
>> If you create iterators from the files that yield (timestamp, entry)
>> pairs, you can merge the iterators using one of these recipes:
>> http://code.activestate.com/recipes/491285/
>> http://code.activestate.com/recipes/535160/
>
> Could you show me how I might do that?
>
> So far I'm at the stage of being able to produce loglines:
>
> #! /usr/bin/env python
> import gzip
> class LogFile:
>  def __init__(self, filename, date):
>   self.f=gzip.open(filename,"r")
>   for logline in self.f:
>     self.line=logline
>     self.stamp=" ".join(self.line.split()[3:5])
>     if self.stamp.startswith(date):
>       break
>
>  def getline(self):
>    ret=self.line
>    self.line=self.f.readline()
>    self.stamp=" ".join(self.line.split()[3:5])
>    return ret
>
> logs=[LogFile("a/access_log-20091105.gz","[05/Nov/2009"),LogFile("b/access_log-20091105.gz","[05/Nov/2009"),LogFile("c/access_log-20091105.gz","[05/Nov/2009")]
> while True:
>  print [x.stamp for x in logs]
>  nextline=min((x.stamp,x) for x in logs)
>  print nextline[1].getline()
>
>
> --
> Stephen Nelson-Smith
> Technical Director
> Atalanta Systems Ltd
> www.atalanta-systems.com
>



--
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com

Re: Logfile Manipulation

by Wayne Werner

On Mon, Nov 9, 2009 at 7:46 AM, Stephen Nelson-Smith <sanelson@...> wrote:
> And the problem I have with the below is that I've discovered that the
> input logfiles aren't strictly ordered - ie there is variance by a
> second or so in some of the entries.

Within a given set of 10 lines, are the first line and last line "in order"? I.e.:

1
2
4
3
5
8
7
6
9
10
 
> I can sort the biggest logfile (800M) using unix sort in about 1.5
> mins on my workstation.  That's not really fast enough, with
> potentially 12 other files....

If that's the case, then I'm pretty sure you can create sort of a queue system, and it should probably cut down on the sorting time. I don't know what the default python sorting algorithm is on a list, but AFAIK you'd be looking at a constant O(log 10) time on each insertion by doing something such as this:


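from itertools import islice

log_generator = iter(logdata)   # logdata: any iterable of log entries
mylist = list(islice(log_generator, 10))   # first ten values

while mylist:
    mylist.sort()
    nextdata = mylist.pop(0)
    try:
        mylist.append(next(log_generator))
    except StopIteration:
        pass   # input exhausted; keep draining what is left in mylist

    #Do something with nextdata

print('done')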
 
Or now that I look, python has a priority queue ( http://docs.python.org/library/heapq.html ) that you could use instead. Just push the next value into the queue and pop one out - you give it some initial qty - 10 or so, and then it will always give you the smallest value.
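
A Python 3 sketch of that heapq variant follows. It assumes no entry is displaced by more than `window` positions from its sorted place; window_sort is a made-up name, and the sample data is the 1..10 sequence from earlier in this message:

```python
import heapq
from itertools import islice

def window_sort(items, window=10):
    """Yield items in order, assuming each is at most `window` places off."""
    iterator = iter(items)
    heap = list(islice(iterator, window))   # prime with the first few values
    heapq.heapify(heap)
    for item in iterator:
        # Push the newcomer and pop the smallest buffered item in one step.
        yield heapq.heappushpop(heap, item)
    while heap:                             # input done: drain the buffer
        yield heapq.heappop(heap)

nearly_sorted = [1, 2, 4, 3, 5, 8, 7, 6, 9, 10]
result = list(window_sort(nearly_sorted, window=3))
```

Unlike re-sorting the list on every step, each push and pop here costs time proportional to the logarithm of the window size, and the buffer never grows beyond `window` items.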

HTH,
Wayne

Re: Logfile Manipulation

by Alan Gauld

> I can sort the biggest logfile (800M) using unix sort in about 1.5
> mins on my workstation.  That's not really fast enough, with
> potentially 12 other files....

You won't beat sort with Python.
You have to be realistic, these are very big files!

Python should be faster overall but for specific tasks the Unix 
tools written in C will be faster.

But if you are merging multiple files into one then sorting 
them before processing will probably help. However if you expect 
to be pruning out more lines than you keep it might be easier just 
to throw all the data you want into a single file and then sort that 
at the end. It all depends on the data.

HTH,

Alan G

Re: Logfile Manipulation

by Stephen Nelson-Smith

On Mon, Nov 9, 2009 at 3:15 PM, Wayne Werner <waynejwerner@...> wrote:
> On Mon, Nov 9, 2009 at 7:46 AM, Stephen Nelson-Smith <sanelson@...>
> wrote:
>>
>> And the problem I have with the below is that I've discovered that the
>> input logfiles aren't strictly ordered - ie there is variance by a
>> second or so in some of the entries.
>
> Within a given set of 10 lines, is the first line and last line "in order" -

On average, in a sequence of 10 log lines, one will be out by one or
two seconds.

Here's a random slice:

05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:36
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:36
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:37
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:37
05/Nov/2009:01:41:38
05/Nov/2009:01:41:36
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:39
05/Nov/2009:01:41:38
05/Nov/2009:01:41:39
05/Nov/2009:01:41:39
05/Nov/2009:01:41:39
05/Nov/2009:01:41:39
05/Nov/2009:01:41:40
05/Nov/2009:01:41:40
05/Nov/2009:01:41:41
> I don't know
> what the default python sorting algorithm is on a list, but AFAIK you'd be
> looking at a constant O(log 10)

I'm not a mathematician - what does this mean, in layperson's terms?

> log_generator = (d for d in logdata)
> mylist = # first ten values

OK

> while True:
>     try:
>         mylist.sort()

OK - sort the first 10 values.

>         nextdata = mylist.pop(0)

So the first value...

>         mylist.append(log_generator.next())

Right, so this will add one more value?

>     except StopIteration:
>         print 'done'

> Or now that I look, python has a priority queue (
> http://docs.python.org/library/heapq.html ) that you could use instead. Just
> push the next value into the queue and pop one out - you give it some
> initial qty - 10 or so, and then it will always give you the smallest value.

That sounds very cool - and I see that one of the activestate recipes
Kent suggested uses heapq too.  I'll have a play.

S.
--
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com

Re: Logfile Manipulation

by Alan Gauld

> > what the default python sorting algorithm is on a list, but AFAIK you'd be
> > looking at a constant O(log 10)
>
> I'm not a mathematician - what does this mean, in layperson's terms?

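O(log n) is a way of expressing the efficiency of an algorithm:
its execution time is proportional to (of the Order of) the logarithm
of n, the number of items. That is, the time to process 100 items will
be about twice the time to process 10 (rather than 10 times, as with a
linear algorithm), since log10(100) is 2 and log10(10) is 1. With a
fixed window of 10 items, O(log 10) is simply a constant.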

These measures are indicative only, but the bottom line is that it will 
scale to high volumes better than a linear algorithm would.
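
The arithmetic is easy to check in Python 3. This is only an illustration of the growth rates, not a benchmark, and the variable names are arbitrary:

```python
import math

# Growing the input from 10 to 100 items...
small, large = 10, 100

# ...multiplies a linear cost by 10,
linear_ratio = large / small

# ...but only doubles a logarithmic one.
log_ratio = math.log10(large) / math.log10(small)

print("linear:", linear_ratio, "logarithmic:", log_ratio)
```

The base of the logarithm only changes the constant factor, so the ratio comes out the same with math.log2 or math.log.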

HTH,

Alan G.
