how to remove the same words in the paragraph

I need to remove the word if it appears in the paragraph twice. could
some give me some clue or some useful function in the python.

-- http://mail.python.org/mailman/listinfo/python-list

Re: how to remove the same words in the paragraph

On Tue, Nov 3, 2009 at 11:13 PM, kylin <huili.song@...> wrote:
> I need to remove the word if it appears in the paragraph twice. could
> some give me some clue or some useful function in the python.

Well, it depends a bit on what you call 'the same word' (in the paragraph "Fly fly, fly!", does the word fly occur 0, 1, 2 or 3 times?), but the split() method of strings seems a logical choice whatever the answer to that question is.

-- André Engels, andreengels@...
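
To make the "Fly fly, fly!" question concrete, here is a minimal Python 3 sketch (the thread itself uses Python 2 syntax). It counts the tokens that split() produces, first as-is and then after one possible normalization. The `normalize` helper is hypothetical and only illustrates one way of deciding what counts as the same word: it drops surrounding punctuation and folds case.

```python
import string
from collections import Counter

text = "Fly fly, fly!"

# Splitting on whitespace alone leaves case and punctuation attached,
# so the three tokens are all different "words".
raw_counts = Counter(text.split())


def normalize(token):
    """Hypothetical helper: drop surrounding punctuation and fold case."""
    return token.strip(string.punctuation).lower()


# After normalization all three tokens collapse into the same word.
normalized_counts = Counter(normalize(token) for token in text.split())

print(dict(raw_counts))         # {'Fly': 1, 'fly,': 1, 'fly!': 1}
print(dict(normalized_counts))  # {'fly': 3}
```

Whether "fly" occurs once or three times therefore depends entirely on the normalization chosen before the duplicates are removed.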

Re: how to remove the same words in the paragraph

kylin wrote:
> I want to remove all the punctuation and no need words form a string
> datasets for experiment.

> I need to remove the word if it appears in the paragraph twice. could
> some give me some clue or some useful function in the python.

>>> para = u"""I need to remove the word if it appears in the paragraph twice. could
... some give me some clue or some useful function in the python.
... """
>>> print "\n".join(sorted(set(para.translate(dict.fromkeys(map(ord, ".:,-"))).split())))
I
appears
clue
could
function
give
if
in
it
me
need
or
paragraph
python
remove
some
the
to
twice
useful
word
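
The one-liner above is Python 2. A rough Python 3 equivalent might look like the sketch below; the function name `unique_words_sorted` and its `punctuation` parameter are made up for illustration. Like the original, it removes every repeated word but also discards the original word order, because a set is unordered and the result is sorted.

```python
def unique_words_sorted(paragraph, punctuation=".:,-"):
    """Strip the given punctuation, split on whitespace, and return
    the distinct words in sorted order (original order is lost)."""
    # str.maketrans with a third argument maps those characters to None,
    # which is what dict.fromkeys(map(ord, ...)) did in the Python 2 version.
    table = str.maketrans("", "", punctuation)
    return sorted(set(paragraph.translate(table).split()))


para = """I need to remove the word if it appears in the paragraph twice. could
some give me some clue or some useful function in the python.
"""

print("\n".join(unique_words_sorted(para)))
```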

Re: how to remove the same words in the paragraph

kylin wrote:
> I need to remove the word if it appears in the paragraph twice. could
> some give me some clue or some useful function in the python.

Sounds like homework. To fail your class, use this one:

>>> p = "one two three four five six seven three four eight"
>>> s = set()
>>> print ' '.join(w for w in p.split() if not (w in s or s.add(w)))
one two three four five six seven eight

which is absolutely horrible because it mutates the set within the generator expression. The passable solution would use a for-loop to iterate over each word in the paragraph, emitting it if it hadn't already been seen. Maintain those words in a set, so your words know how not to be seen. ("Mr. Nesbitt, would you please stand up?")

This also assumes your paragraph consists only of words and whitespace. But since you posted your previous homework-sounding question on stripping out non-word/whitespace characters, you'll want to look into using a regexp like "[\w\s]" to clean up the cruft in the paragraph.

Neither solution above preserves non white-space/word characters, for which I'd recommend using a re.sub() with a callback. Such a callback class might look something like

>>> class Dedupe:
...     def __init__(self):
...         self.s = set()
...     def __call__(self, m):
...         w = m.group(0)
...         if w in self.s: return ''
...         self.s.add(w)
...         return w
...
>>> r.sub(Dedupe(), p)

where I leave the definition of "r" to the student. Also beware of case-differences for which you might have to normalize. You'll also want to use more descriptive variable names than my one-letter tokens.

-tkc
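
The "passable" for-loop and the re.sub() callback described above can be sketched in Python 3 roughly as follows. Several things here are assumptions rather than anything stated in the post: the pattern `WORD_RE` is only one plausible choice for the "r" that was left to the student, the cleanup step uses the negated class `[^\w\s]` on the assumption that the goal is to delete everything that is not a word character or whitespace, and the optional `normalize` argument is a hypothetical hook for the case-folding the post warns about.

```python
import re


def dedupe_words(paragraph):
    """For-loop version: keep each whitespace-separated word the first
    time it is seen and skip every later occurrence."""
    seen = set()
    kept = []
    for word in paragraph.split():
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)


def strip_non_word_chars(paragraph):
    """Assumed cleanup step: delete anything that is neither a word
    character nor whitespace."""
    return re.sub(r"[^\w\s]", "", paragraph)


class Dedupe:
    """Callback for re.sub(): returns the word on first sight and an
    empty string for every repeat."""

    def __init__(self, normalize=None):
        self.seen = set()
        # Hypothetical hook, e.g. str.lower for case-insensitive matching.
        self.normalize = normalize or (lambda word: word)

    def __call__(self, match):
        word = match.group(0)
        key = self.normalize(word)
        if key in self.seen:
            return ""
        self.seen.add(key)
        return word


# Assumed definition of the pattern the post calls "r".
WORD_RE = re.compile(r"\w+")


def dedupe_keep_punctuation(paragraph, normalize=None):
    """Remove repeated words but leave punctuation and spacing alone."""
    return WORD_RE.sub(Dedupe(normalize), paragraph)


p = "one two three four five six seven three four eight"
print(dedupe_words(p))  # one two three four five six seven eight
print(dedupe_keep_punctuation("one, two; one. three two!"))  # one, two; . three !
```

Because the callback replaces only the word itself, the punctuation and spaces around a removed word stay behind, so a further tidy-up pass may be wanted depending on the use case.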

Re: how to remove the same words in the paragraph

On Wed, Nov 4, 2009 at 4:27 AM, Tim Chase <python.list@...> wrote:

Can we use inp_paragraph.count(iter_word) to make it simple?

> This also assumes your paragraph consists only of words and whitespace. But
> since you posted your previous homework-sounding question on stripping out
> non-word/whitespace characters, you'll want to look into using a regexp like
> "[\w\s]" to clean up the cruft in the paragraph.
>
> Neither solution above preserves non white-space/word characters, for which
> I'd recommend using a re.sub() with a callback. Such a callback class might
> look something like

-- Yours, S.Selvam

Re: how to remove the same words in the paragraph

> Can we use inp_paragraph.count(iter_word) to make it simple?

It would work, but the performance will drop off sharply as the length of the paragraph grows, and you'd still have to keep track of which words you already printed so you can correctly print the first one. So you might as well not bother with counting.

-tkc
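
A small Python 3 sketch of what a count()-based version could look like, to illustrate the two objections above. The function name `dedupe_with_count` is made up, and the sketch assumes the paragraph has already been split into a list of words: calling count() on the paragraph string itself would count substrings, so "the other".count("the") is 2.

```python
def dedupe_with_count(paragraph):
    """Count-based variant: works, but rescans the whole word list for
    every word and still needs a set to remember what was emitted."""
    words = paragraph.split()
    already_emitted = set()
    kept = []
    for word in words:
        # list.count() walks the entire list each time, so the loop as a
        # whole does work proportional to the square of the word count.
        if words.count(word) > 1:
            if word in already_emitted:
                continue  # a later copy of a repeated word: drop it
            already_emitted.add(word)
        kept.append(word)
    return " ".join(kept)


p = "one two three four five six seven three four eight"
print(dedupe_with_count(p))  # one two three four five six seven eight
```

Since the set is needed anyway to let the first occurrence through, the count() call adds cost without removing any bookkeeping.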

Re: how to remove the same words in the paragraph

On Wed, Nov 4, 2009 at 4:27 AM, Tim Chase <python.list@...> wrote:

I think a simple regex may come in handy:

p=re.compile(r'(.+) .*\1')    #note the space
s=p.search("python and i love python")
s.groups()
('python',)

But that matches for only one double word. Someone else could light up here to extract all the double words. Then they can be removed from the original paragraph.

-- Yours, S.Selvam
Sent from Bangalore, KA, India

Re: how to remove the same words in the paragraph

> I think a simple regex may come in handy:
>
> p=re.compile(r'(.+) .*\1')    #note the space
> s=p.search("python and i love python")
> s.groups()
> ('python',)
>
> But that matches for only one double word. Someone else could light up here
> to extract all the double words. Then they can be removed from the original
> paragraph.

This has multiple problems:

>>> p = re.compile(r'(.+) .*\1')
>>> s = p.search("python one two one two python")
>>> s.groups()
('python',)
>>> s = p.search("python one two one two python one")
>>> s.groups() # guess what happened to the 2nd "one"...
('python one',)

and even once you have the list of theoretical duplicates (by changing the regexp to r'\b(\w+)\b.*?\1' perhaps), you still have to worry about emitting the first instance but not subsequent instances.

-tkc
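
One way to collect every repeated word with a regex, sketched in Python 3. This is not the pattern suggested in the reply above: it is an assumed variant that moves the backreference into a lookahead so that matches do not consume the rest of the string, and the names `DUP_RE` and `repeated_words` are made up for illustration. It only lists the duplicates. Keeping the first instance and dropping the later ones still needs separate bookkeeping, as the reply points out.

```python
import re

# The greedy pattern from the thread, reproduced to show its behaviour.
greedy = re.compile(r"(.+) .*\1")
print(greedy.search("python one two one two python").groups())      # ('python',)
print(greedy.search("python one two one two python one").groups())  # ('python one',)

# Assumed variant: match one whole word, then only *look ahead* for the
# same whole word later on, so the scan can continue with the next word.
DUP_RE = re.compile(r"\b(\w+)\b(?=.*\b\1\b)", re.DOTALL)


def repeated_words(text):
    """Return each word that occurs again later in the text, once,
    in order of first appearance."""
    found = []
    for word in DUP_RE.findall(text):
        if word not in found:
            found.append(word)
    return found


print(repeated_words("python one two one two python one"))  # ['python', 'one', 'two']
```

The lookahead rescans the remainder of the text for every word, so this is a sketch for short paragraphs rather than something tuned for large inputs.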