Unicode handling in repoze identification

by Raphael Slinckx-3


There seems to be a problem, but I don't know where to ask, so I'll
give it a shot here.

I defined a users table where the username column is Unicode (the
SQLAlchemy type).
When I create a user with a non-ASCII character in the username, it is
stored and returned properly as a Python unicode string.

According to its source code, the repoze.who SQLAlchemy plugin does
basically the following:

def get_user(self, username):
    # Look up the mapped username column on the user class
    username_attr = getattr(self.user_class, self.translations['user_name'])
    query = self.dbsession.query(self.user_class)
    query = query.filter(username_attr == username)
    return query.one()

This function is called with username being identity['login'], which
seems to be set by repoze.who's FormPlugin during the identification
phase. Again, the source code does something like:

def identify(environ):
    # Form values come back from Paste as byte strings
    form = parse_formvars(environ)
    login = form['login']
    password = form['password']
    return {'login': login, 'password': password}

I'm guessing that this is the login used by the SQLAlchemy plugin for
repoze.who.

The problem is that parse_formvars (from paste.request) returns a dict
containing str values, not unicode. The SQLAlchemy query therefore
compares unicode strings in the DB with byte strings from the form,
which produces the following warning each time an attempt to identify
a user is made:

/home/kikidonk/Whatever/tg2/lib/python2.5/site-packages/SQLAlchemy-0.5.2-py2.5.egg/sqlalchemy/engine/default.py:241: SAWarning: Unicode type received non-unicode bind param value 'foo bar\xc3\xa9'
  param[key.encode(encoding)] = processors[key](compiled_params[key])

As you can see, the user_name value is the byte string
'foo bar\xc3\xa9' (UTF-8 encoded), while in the DB it is stored as the
unicode string u'foo baré', so that user is never matched.
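
A minimal sketch of the mismatch described above. It assumes the form
value is UTF-8 encoded, as the \xc3\xa9 bytes in the warning suggest; the
variable names are made up for illustration, and the b'' and u'' prefixes
are used so the snippet also runs on current Python versions.

```python
# Illustration only: hypothetical variable names, UTF-8 assumed.
form_value = b"foo bar\xc3\xa9"    # byte string, as handed over by the form parser
stored_value = u"foo bar\u00e9"    # unicode string, as returned by the Unicode column

# Comparing the raw bytes with the unicode string does not match.
bytes_match = (form_value == stored_value)

# Decoding the bytes first gives an equal unicode string.
decoded = form_value.decode("utf-8")
decoded_match = (decoded == stored_value)

print(bytes_match, decoded_match)
```

The same reasoning applies to the bind parameter in the query: unless the
login is decoded before it reaches the filter, the lookup is made with
the encoded bytes rather than with the text that was stored.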

I'm not familiar enough with the whole stack to determine where this
should be fixed:
* In repoze.who's form plugin (decode form values to unicode)
* In repoze.who's SQLAlchemy plugin, before the query (but then who
knows the encoding of the string?)
* In paste.request.parse_formvars; there seem to be discussions about
returning a UnicodeMultiDict, but some are reluctant to do that
* In some kind of wrapper or monkey patch somewhere in TG2

In any case, it breaks authentication for any user whose user_name
contains non-ASCII characters, and the quickstart template in TG2 uses
Unicode for user_name, so it is bound to happen sometime. Any thoughts
on how to fix this?
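
As a rough sketch of the first option (decoding in or around the form
plugin), a helper could decode the login before it reaches the query.
decode_identity is a hypothetical name, not part of repoze.who, and the
UTF-8 default is an assumption about how the form was submitted.

```python
def decode_identity(identity, encoding="utf-8", keys=("login",)):
    """Return a copy of the identity dict with selected byte values decoded.

    Hypothetical helper for illustration; not a repoze.who API.
    """
    if identity is None:
        # Nothing was identified, so there is nothing to decode.
        return None
    decoded = dict(identity)
    for key in keys:
        value = decoded.get(key)
        if isinstance(value, bytes):
            decoded[key] = value.decode(encoding)
    return decoded


identity = {"login": b"foo bar\xc3\xa9", "password": b"secret"}
print(decode_identity(identity))
```

Only the keys listed in keys are touched, so the password is passed
through unchanged here; whether it should be decoded as well depends on
how the authenticator compares it.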
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "TurboGears" group.
To post to this group, send email to turbogears@...
To unsubscribe from this group, send email to turbogears+unsubscribe@...
For more options, visit this group at http://groups.google.com/group/turbogears?hl=en
-~----------~----~----~----~------~----~------~--~---


Re: Unicode handling in repoze identification

by Christoph Zwerschke


Raphael Slinckx wrote:
> There seems to be a problem, but I don't know where to ask, so I'll
> give it a shot here.
> ...
> In any case, it breaks authentication for any user whose user_name
> contains non-ASCII characters, and the quickstart template in TG2 uses
> Unicode for user_name, so it is bound to happen sometime. Any thoughts
> on how to fix this?

I can reproduce the problem and your analysis seems to be correct.

I think this should be fixed in paste.request. parse_formvars should
return unicode instead of encoded strings. I saw that it simply ignores
the charset (anything specified after a semicolon) in the content_type.
Instead, it should check the content_type for an explicitly defined
charset (assuming utf-8 if nothing is specified) and then decode the
content using that charset.

Even in Python 2, we should learn to follow the Python 3 paradigm of
keeping everything that can potentially be non-ASCII in unicode and
converting immediately at the I/O boundaries. In this case, this is
IMHO the duty of the parse_formvars function.
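
The proposal above could be sketched roughly as follows. This is not
Paste code: charset_from_content_type and decode_formvars are
hypothetical helpers, and a plain dict stands in for whatever
parse_formvars returns.

```python
def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or the default."""
    # Everything after the first ';' is a list of name=value parameters.
    for part in content_type.split(";")[1:]:
        name, _, value = part.partition("=")
        value = value.strip().strip('"')
        if name.strip().lower() == "charset" and value:
            return value
    return default


def decode_formvars(formvars, content_type):
    """Decode byte-string form values using the charset of the request."""
    charset = charset_from_content_type(content_type)
    return dict(
        (key, value.decode(charset) if isinstance(value, bytes) else value)
        for key, value in formvars.items()
    )


form = {"login": b"foo bar\xe9"}
content_type = "application/x-www-form-urlencoded; charset=iso-8859-1"
print(decode_formvars(form, content_type))
```

With no charset parameter in the content_type, the sketch falls back to
utf-8, which matches the default suggested above.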

-- Christoph



Re: Unicode handling in repoze identification

by Gustavo Narea


On Friday February 13, 2009 14:42:48 Christoph Zwerschke wrote:

> Raphael Slinckx wrote:
> > There seems to be a problem, but I don't know where to ask, so I'll
> > give it a shot here.
> > ...
> > In any case, it breaks authentication for any user whose user_name
> > contains non-ASCII characters, and the quickstart template in TG2 uses
> > Unicode for user_name, so it is bound to happen sometime. Any thoughts
> > on how to fix this?
>
> I can reproduce the problem and your analysis seems to be correct.
>
> I think this should be fixed in paste.request. The parse_formvars should
> return unicode instead of encoded strings. I saw that it simply ignores
> the charset (anything specified after a semicolon) in the content_type.
> Instead, it should analyze the content_type for an explicitly defined
> charset (assuming utf-8 if nothing is specified), and then it should
> decode the content using that charset.
>
> Even in Python 2, we should learn to follow the Python 3 paradigm of
> keeping everything that can potentially be non-ascii in unicode and
> convert things immediately at the i/o boundaries. In this case, this is
> IMHO the duty of the parse_formvars function.

+1, definitely.
--
Gustavo Narea <http://gustavonarea.net/>.

Get rid of unethical constraints! Get freedomware:
http://www.getgnulinux.org/



Re: Unicode handling in repoze identification

by Raphael Slinckx-3


> > I think this should be fixed in paste.request. The parse_formvars should
Are there any Paste devs around here?
Otherwise I'm going to report this bug in Paste, even though
parse_formvars might purposely be defined not to decode anything, even
if that sounds broken.


Re: Unicode handling in repoze identification

by Jorge Vargas


On Tue, Feb 17, 2009 at 6:20 AM, Raphael Slinckx <rslinckx@...> wrote:
>
>> > I think this should be fixed in paste.request. The parse_formvars should
> Are there any Paste devs around here?
> Otherwise I'm going to report this bug in Paste, even though
> parse_formvars might purposely be defined not to decode anything, even
> if that sounds broken.

Funny. My suggestion would be for you to work out the patch and submit
it with the ticket; code changes are normally applied really fast, but
"problem tickets" aren't. It would also be a good idea to search the
Paste trac before going for it. I once found a bug that had been fixed
with a patch, but the patch was never applied :(
