Faster list searching?

View: New views
8 Messages — Rating Filter:   Alert me  

Faster list searching?

by gpo :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Some parts of this message have been removed. Learn more about Nabble's security policy.
I'm dealing with bigger lists than I have been, and noticed this is getting really slow.  Is there a faster way to do this?

for x in list1:
    if x not in list2:
        list3.append(x)

My search is taking up to 5 minutes to complete.

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
_______________________________________________
Tutor maillist  -  Tutor@...
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: Faster list searching?

by wescpy :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Nov 18, 2009 at 1:51 PM, GoodPotatoes <goodpotatoes@...> wrote:
> I'm dealing with bigger lists than I have been, and noticed this is getting
> really slow.  Is there a faster way to do this?
>
> for x in list1:
>     if x not in list2:
>         list3.append(x)
>
> My search is taking up to 5 minutes to complete.


greetings! hopefully this isn't a homework problem as we cannot help
with those. can you give us an example of the lists that you're using
(not the entire things, but just tell us what they contain)? also, use
the timeit module to show us some numbers to confirm what you're
seeing.

my suggestion would be to use sets if possible instead of lists since
those lookups are way faster (hash vs. sequence). if you're using
Python 3, i think you can even build the solution set using set
comprehensions.

-- wesley
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Core Python Programming", Prentice Hall, (c)2007,2001
"Python Fundamentals", Prentice Hall, (c)2009
    http://corepython.com

wesley.j.chun :: wescpy-at-gmail.com
python training and technical consulting
cyberweb.consulting : silicon valley, ca
http://cyberwebconsulting.com
_______________________________________________
Tutor maillist  -  Tutor@...
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: Faster list searching?

by Kent Johnson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Nov 18, 2009 at 4:51 PM, GoodPotatoes <goodpotatoes@...> wrote:
> I'm dealing with bigger lists than I have been, and noticed this is getting
> really slow.  Is there a faster way to do this?
>
> for x in list1:
>     if x not in list2:
>         list3.append(x)
>
> My search is taking up to 5 minutes to complete.

If you can, use a set instead of list2. "x not in list2" does a linear
search of list2 for x, which takes time proportional to the length of
list2. Searching a set takes constant time - it doesn't depend on the
size of the set. Also you can use a list comprehension to speed up the
outer loop a little:

set_of_list2 = set(list2)
list3 = [ x for x in list1 if x not in set_of_list2 ]

Note this will only work if the elements of list2 are hashable
(useable as dict keys).

Kent
_______________________________________________
Tutor maillist  -  Tutor@...
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: Faster list searching?

by Bill Campbell-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Nov 18, 2009, GoodPotatoes wrote:
>I'm dealing with bigger lists than I have been, and noticed this is getting really slow.  Is there a faster way to do this?
>
>for x in list1:
>    if x not in list2:
>        list3.append(x)
>
>My search is taking up to 5 minutes to complete.

When I have had to deal with large lists, I have found that using
an intermediate dictionary can save huge amounts of time.
Something like:

dict2 = {}.fromkeys(list2)
for x in list1:
        if x not in dist2:
                dict2[x] = True

list2 = dict2.keys()

Bill
--
INTERNET:   bill@...  Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/  PO Box 820; 6641 E. Mercer Way
Voice:          (206) 236-1676  Mercer Island, WA 98040-0820
Fax:            (206) 232-9186  Skype: jwccsllc (206) 855-5792

Intaxication: Euphoria at getting a refund from the IRS, which lasts until
you realize it was your money to start with.
_______________________________________________
Tutor maillist  -  Tutor@...
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: Faster list searching?

by Luke Paireepinart :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



On Wed, Nov 18, 2009 at 5:03 PM, Bill Campbell <bill@...> wrote:

When I have had to deal with large lists, I have found that using
an intermediate dictionary can save huge amounts of time.
Something like:

dict2 = {}.fromkeys(list2)
for x in list1:
       if x not in dist2:
               dict2[x] = True

list2 = dict2.keys()

This is really just a round-about way of using sets.
I don't really want to give a code-sample unless he's confirmed he's not doing this as homework, but the set version is much more simple (shorter code that makes more sense) and extremely quick as well.  If you're interested in it, Bill, reply to me off-list and I'll send it to you.

-Luke

_______________________________________________
Tutor maillist  -  Tutor@...
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: Faster list searching?

by Luke Paireepinart :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



On Wed, Nov 18, 2009 at 5:54 PM, Luke Paireepinart <rabidpoobear@...> wrote:
This is really just a round-about way of using sets.
I don't really want to give a code-sample unless he's confirmed he's not doing this as homework, but the set version is much more simple (shorter code that makes more sense) and extremely quick as well.  If you're interested in it, Bill, reply to me off-list and I'll send it to you.

Never mind about this, Kent already gave basically the same code sample I was going to.
-Luke


_______________________________________________
Tutor maillist  -  Tutor@...
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: Faster list searching?

by Tim Peters :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

[Luke Paireepinart]
>> This is really just a round-about way of using sets.
>> I don't really want to give a code-sample unless he's confirmed he's not
>> doing this as homework, but the set version is much more simple (shorter
>> code that makes more sense) and extremely quick as well.  If you're
>> interested in it, Bill, reply to me off-list and I'll send it to you.

[also Luke Paireepinart]
> Never mind about this, Kent already gave basically the same code sample I
> was going to.

So long as the cat's out of the list, may as well do it "the obvious"
;-) way too:

result = set(list1) - set(list2)

Of course the result is a set then.  Maybe that will work fine in
context, maybe not.  I leave it as an exercise to figure out how to
change it back into a list (hint:  try the obvious way first ;-) ).
_______________________________________________
Tutor maillist  -  Tutor@...
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: Faster list searching?

by Alan Gauld :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


"Tim Peters" <tim.peters@...> wrote

> result = set(list1) - set(list2)
>
> Of course the result is a set then.  

Which means that if there were duplicates in list1 you only get
one copy in the result. As Tim says, whether that is good, bad
or irrelevant depends on the problem context.

> Maybe that will work fine in context, maybe not.  
> I leave it as an exercise to figure out how to
> change it back into a list (hint:  try the obvious way first ;-) ).

But that won't replace any missing duplicates.
If they are significant you'll probably need to stick with Kent's
list comprehension approach.

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/


_______________________________________________
Tutor maillist  -  Tutor@...
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor