[PATCH] Remove the encode/decode calls in DummyResponse.write()/getvalue() and take a more optimistic approach

by tyler-53

Borrowing some concepts from the "slide-compat" branch that I maintain
for Slide, Inc. for gracefully handling less-than-ideal string-encoding
situations (as is the case for Slide).

Make DummyResponse.getvalue() optimistic: first try to u''.join() the
list of arbitrary string objects (unicode, or str in various encodings),
and only on a UnicodeDecodeError run each chunk through the
"safeConvert" function (blech) to handle encoded str() objects.
---
 cheetah/DummyTransaction.py |   40 +++++++++++++++++++++++++---------------
 1 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/cheetah/DummyTransaction.py b/cheetah/DummyTransaction.py
index 6726a63..2b49d30 100644
--- a/cheetah/DummyTransaction.py
+++ b/cheetah/DummyTransaction.py
@@ -8,6 +8,7 @@ Warning: This may be deprecated in the future, please do not rely on any
 specific DummyTransaction or DummyResponse behavior
 '''
 
+import logging
 import types
 
 class DummyResponseFailure(Exception):
@@ -24,31 +25,40 @@ class DummyResponse(object):
 
     def flush(self):
         pass
-        
+
+    def safeConvert(self, chunk):
+        # Exceptionally gross, but the safest way
+        # I've found to ensure I get a legit unicode object
+        if not chunk:
+            return u''
+        if isinstance(chunk, unicode):
+            return chunk
+        try:
+            return chunk.decode('utf-8', 'strict')
+        except UnicodeDecodeError:
+            try:
+                return chunk.decode('latin-1', 'strict')
+            except UnicodeDecodeError:
+                return chunk.decode('ascii', 'ignore')
+        except AttributeError:
+            return unicode(chunk)
+        return chunk
+
     def write(self, value):
-        if isinstance(value, unicode):
-            value = value.encode('utf-8')
         self._outputChunks.append(value)
 
-
     def writeln(self, txt):
         write(txt)
         write('\n')
 
     def getvalue(self, outputChunks=None):
         chunks = outputChunks or self._outputChunks
-        try:
-            return ''.join(chunks).decode('utf-8')
+        try:
+            return u''.join(chunks)
         except UnicodeDecodeError, ex:
-            nonunicode = [c for c in chunks if not isinstance(c, unicode)]
-            raise DummyResponseFailure('''Looks like you're trying to mix encoded strings with Unicode strings
-            (most likely utf-8 encoded ones)
-
-            This can happen if you're using the `EncodeUnicode` filter, or if you're manually
-            encoding strings as utf-8 before passing them in on the searchList (possible offenders:
-            %s)
-            (%s)''' % (nonunicode, ex))
-
+            logging.debug('Trying to work around a UnicodeDecodeError in getvalue()')
+            logging.debug('...perhaps you could fix "%s" while you\'re debugging', ex)
+            return ''.join((self.safeConvert(c) for c in chunks))
 
     def writelines(self, *lines):
         ## not used
--
1.6.0.2
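
For readers on Python 3, the optimistic join described in the commit message can be sketched roughly as follows. This is an illustration, not the code in the patch: `safe_convert` and `get_value` are hypothetical names, and the sketch assumes Python 3, where mixing str and bytes in a join raises TypeError instead of the UnicodeDecodeError that Python 2 raises.

```python
def safe_convert(chunk):
    # Hypothetical Python 3 counterpart of safeConvert() from the patch.
    if isinstance(chunk, str):
        return chunk
    if isinstance(chunk, bytes):
        try:
            return chunk.decode("utf-8")
        except UnicodeDecodeError:
            # Last resort for byte strings that are not valid UTF-8.
            return chunk.decode("latin-1")
    # Anything else (numbers, lists, ...) goes through plain str().
    return str(chunk)


def get_value(chunks):
    try:
        # Optimistic path: assume every chunk is already text.
        return "".join(chunks)
    except TypeError:
        # Slow path: only taken when the chunks are of mixed types.
        return "".join([safe_convert(c) for c in chunks])
```

The common case (all chunks already text) pays for a single join, and the per-chunk conversion only runs after that join has failed.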


------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
Cheetahtemplate-discuss mailing list
Cheetahtemplate-discuss@...
https://lists.sourceforge.net/lists/listinfo/cheetahtemplate-discuss

Re: [PATCH] Remove the encode/decode calls in DummyResponse.write()/getvalue() and take a more optimistic approach

by Aahz

On Wed, Oct 14, 2009, R. Tyler Ballance wrote:

>
> +        try:
> +            return chunk.decode('utf-8', 'strict')
> +        except UnicodeDecodeError:
> +            try:
> +                return chunk.decode('latin-1', 'strict')
> +            except UnicodeDecodeError:
> +                return chunk.decode('ascii', 'ignore')
> +        except AttributeError:
> +            return unicode(chunk)
> +        return chunk

Why is it safe to just return unicode() after AttributeError without
another try/except?  Maybe you want to do unicode(chunk, errors='ignore')?

> +            return ''.join((self.safeConvert(c) for c in chunks))

Triple-checking: we now support only 2.4 and higher?
--
Aahz (aahz@...)           <*>         http://www.pythoncraft.com/

"To me vi is Zen.  To use vi is to practice zen.  Every command is a
koan.  Profound to the user, unintelligible to the uninitiated.  You
discover truth everytime you use it."  --reddy@...

Re: [PATCH] Remove the encode/decode calls in DummyResponse.write()/getvalue() and take a more optimistic approach

by tyler-53


On Thu, 15 Oct 2009, Aahz wrote:

> On Wed, Oct 14, 2009, R. Tyler Ballance wrote:
> >
> > +        try:
> > +            return chunk.decode('utf-8', 'strict')
> > +        except UnicodeDecodeError:
> > +            try:
> > +                return chunk.decode('latin-1', 'strict')
> > +            except UnicodeDecodeError:
> > +                return chunk.decode('ascii', 'ignore')
> > +        except AttributeError:
> > +            return unicode(chunk)
> > +        return chunk
>
> Why is it safe to just return unicode() after AttributeError without
> another try/except?  Maybe you want to do unicode(chunk, errors='ignore')?
Fair point. I think the errors='ignore' kwarg is a good suggestion, but I
did consider the unicode() call safe since the "chunk" object in this
case is almost certainly a non-string type.
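
A rough Python 3 sketch of the trade-off discussed here, with `str(obj, errors="ignore")` standing in for Python 2's `unicode(chunk, errors='ignore')`. The helper name `to_text` is hypothetical. In Python 3 the two-argument form only accepts bytes-like objects and raises TypeError for anything else, so a non-string chunk still needs the plain `str(obj)` path; Python 2's unicode() is, as far as I know, similarly strict.

```python
def to_text(obj):
    # Hypothetical helper, not part of Cheetah.
    try:
        # Decodes bytes as UTF-8, silently dropping undecodable bytes.
        return str(obj, errors="ignore")
    except TypeError:
        # Non-bytes objects (ints, lists, ...) cannot be decoded at all;
        # fall back to their ordinary string representation.
        return str(obj)
```

With this shape, `errors="ignore"` covers encoded byte strings while ordinary objects are still rendered through `str()`.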

>
> > +            return ''.join((self.safeConvert(c) for c in chunks))
>
> Triple-checking: we now support only 2.4 and higher?

Yeah, I considered this after committing. I wasn't entirely certain
whether generator expressions existed in Python 2.3 (I don't use it
locally; I only have it set up for Hudson[1]).
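
For reference, generator expressions arrived in Python 2.4 (PEP 289), so the join in the patch would be a syntax error on 2.3. A list comprehension gives the same result and also parses on older interpreters. The small example below uses made-up data and runs on current Python:

```python
chunks = ["a", "b", "c"]

# Generator expression: requires Python 2.4 or newer.
with_genexp = "".join(c.upper() for c in chunks)

# List comprehension: same result, and the syntax predates 2.4.
with_listcomp = "".join([c.upper() for c in chunks])
```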


I really appreciate the code review; I feel much better about writing
code when it's had a secondary review prior to merging it upstream. I've
re-rolled my patch and pushed it up to
git://github.com/rtyler/cheetah.git/next and
git://github.com/cheetahtemplate/cheetah.git/next


Cheers,
-R. Tyler Ballance


Re: [PATCH] Remove the encode/decode calls in DummyResponse.write()/getvalue() and take a more optimistic approach

by Aahz

On Thu, Oct 15, 2009, tyler@... wrote:

> On Thu, 15 Oct 2009, Aahz wrote:
>> On Wed, Oct 14, 2009, R. Tyler Ballance wrote:
>>>
>>> +        try:
>>> +            return chunk.decode('utf-8', 'strict')
>>> +        except UnicodeDecodeError:
>>> +            try:
>>> +                return chunk.decode('latin-1', 'strict')
>>> +            except UnicodeDecodeError:
>>> +                return chunk.decode('ascii', 'ignore')
>>> +        except AttributeError:
>>> +            return unicode(chunk)
>>> +        return chunk
>>
>> Why is it safe to just return unicode() after AttributeError without
>> another try/except?  Maybe you want to do unicode(chunk, errors='ignore')?
>
> Fair point. I think the errors='ignore' kwarg is a good suggestion, but I
> did consider the unicode() call safe since the "chunk" object in this
> case is almost certainly a non-string type.

Consider this:

[')(*&^#%@)(*&)(*& BAD UNICODE DATA #@%*&^#@$*&^#@$']

However, I think it's reasonable to just do a "best guess" using ignore
for these types.
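
To make the "best guess" concrete, here is a small Python 3 illustration with made-up bytes showing what the 'ignore' handler does compared with a latin-1 decode. One side observation: latin-1 assigns a code point to every possible byte, so a strict latin-1 decode never raises, which suggests the ascii/'ignore' branch in the quoted code would not normally be reached.

```python
data = b"caf\xe9 \xff ok"

# 'ignore' drops every byte that is not valid ASCII.
ascii_guess = data.decode("ascii", "ignore")

# latin-1 maps each byte 0x00-0xFF to the code point of the same value,
# so this never raises, even for arbitrary binary data.
latin1_guess = data.decode("latin-1", "strict")
all_bytes_ok = bytes(range(256)).decode("latin-1", "strict")
```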
--
Aahz (aahz@...)           <*>         http://www.pythoncraft.com/

"To me vi is Zen.  To use vi is to practice zen.  Every command is a
koan.  Profound to the user, unintelligible to the uninitiated.  You
discover truth everytime you use it."  --reddy@...
