Unicode handling in repoze identification

There seems to be a problem, but I don't know where to ask, so I'll give it a shot here.

I defined a users table where the username is Unicode (the SQLAlchemy type). When I create a user with a non-ASCII character in the username, it is stored and returned properly as a Python unicode string.

According to the source code, the repoze.who SQLAlchemy plugin basically does the following:

    def get_user(self, username):
        # Look up the user whose user_name column equals the given login.
        username_attr = getattr(self.user_class, self.translations['user_name'])
        query = self.dbsession.query(self.user_class)
        query = query.filter(username_attr == username)
        return query.one()

This function is called with username set to identity['login'], which seems to be set by repoze.who's FormPlugin during the identification phase. Again, the source code does something like:

    def identify(self, environ):
        # Values come straight from the parsed form, without any decoding.
        form = parse_formvars(environ)
        login = form['login']
        password = form['password']
        return {'login': login, 'password': password}

I'm guessing that this is the login used by the repoze.who SQLAlchemy plugin.

The problem is that parse_formvars (from paste.request) returns a dict containing str values (byte strings), not unicode, so the SQLAlchemy query compares unicode strings in the DB with byte strings from the form. This results in the following warning each time an attempt to identify a user is made:

    /home/kikidonk/Whatever/tg2/lib/python2.5/site-packages/SQLAlchemy-0.5.2-py2.5.egg/sqlalchemy/engine/default.py:241: SAWarning: Unicode type received non-unicode bind param value 'foo bar \xc3\xa9'
      param[key.encode(encoding)] = processors[key](compiled_params[key])

As you can see, the user_name value is the byte string 'foo bar\xc3\xa9', while in the DB it is stored as the unicode string 'foo baré', so that user is never matched.

I'm not familiar enough with the whole stack to determine where this should be fixed:

* In repoze.who's form plugin (decode form values as unicode)
* In repoze.who's SQLAlchemy plugin before the query (but then who knows the encoding of the string?)
* In paste.request.parse_formvars; there seem to be discussions about returning a UnicodeMultiDict, but some are reluctant to do that
* In some kind of wrapper/monkey patch somewhere in TG2

In any case, it breaks authentication for any non-ASCII user_name, and the quickstart template in TG2 uses Unicode for user_name, so it is bound to happen sometime.

Any thoughts on how to fix this?

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "TurboGears" group.
To post to this group, send email to turbogears@...
To unsubscribe from this group, send email to turbogears+unsubscribe@...
For more options, visit this group at http://groups.google.com/group/turbogears?hl=en
-~----------~----~----~----~------~----~------~--~---

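The mismatch described in this post can be reproduced without the rest of the stack. The following is only a minimal sketch: decode_identity is a hypothetical helper, not part of repoze.who or paste, and UTF-8 is assumed as the form encoding. It corresponds roughly to the first two options in the list (decoding the login before it reaches the query).

```python
# Hypothetical helper for illustration; not part of repoze.who or paste.
# Assumption: the browser submitted the form as UTF-8.

def decode_identity(identity, keys=("login",), encoding="utf-8"):
    """Return a copy of identity with the given byte-string values decoded."""
    decoded = dict(identity)
    for key in keys:
        value = decoded.get(key)
        if isinstance(value, bytes):
            decoded[key] = value.decode(encoding)
    return decoded


# What the form plugin hands over: UTF-8 encoded bytes.
raw_identity = {"login": b"foo bar\xc3\xa9", "password": b"secret"}
# What the Unicode column holds.
stored_user_name = u"foo bar\xe9"

# The raw bytes do not compare equal to the stored text,
# which is why the lookup never finds the user.
mismatch = raw_identity["login"] == stored_user_name

# After decoding at the boundary, the comparison succeeds.
identity = decode_identity(raw_identity)
match = identity["login"] == stored_user_name
```

Only the login is decoded here; whether the password should be decoded as well depends on how the password check is implemented, which this sketch does not cover.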
Re: Unicode handling in repoze identification

Raphael Slinckx wrote:
> There seems to be a problem, but i don't know where to ask so i give
> it a shot here.
> ...
> In any case it breaks authentication for any username using non-ascii
> user_names, and the quickstart template in TG2 uses Unicode for
> user_name so it is bound to happen sometime. Any thoughts on how to
> fix this ?

I can reproduce the problem, and your analysis seems to be correct.

I think this should be fixed in paste.request. parse_formvars should return unicode instead of encoded strings. I saw that it simply ignores the charset (anything specified after a semicolon) in the content_type. Instead, it should check the content_type for an explicitly defined charset (assuming utf-8 if nothing is specified) and then decode the content using that charset.

Even in Python 2, we should learn to follow the Python 3 paradigm of keeping everything that can potentially be non-ASCII in unicode and converting things immediately at the I/O boundaries. In this case, this is IMHO the duty of the parse_formvars function.

-- Christoph

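The behaviour proposed in this reply could look roughly like the sketch below. It is not the paste.request implementation: charset_from_content_type and decode_formvars are hypothetical names, and the form values are assumed to arrive as a plain dict of byte strings.

```python
# Hypothetical helpers for illustration; not the actual paste.request API.

def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or default."""
    # Everything after the first semicolon is a list of name=value parameters.
    parameters = (content_type or "").split(";")[1:]
    for parameter in parameters:
        name, _, value = parameter.partition("=")
        value = value.strip().strip('"')
        if name.strip().lower() == "charset" and value:
            return value.lower()
    return default


def decode_formvars(formvars, content_type):
    """Decode byte-string form values to text at the input boundary."""
    charset = charset_from_content_type(content_type)
    decoded = {}
    for key, value in formvars.items():
        if isinstance(value, bytes):
            value = value.decode(charset)
        decoded[key] = value
    return decoded


# No charset given: fall back to utf-8.
utf8_form = decode_formvars(
    {"login": b"foo bar\xc3\xa9"},
    "application/x-www-form-urlencoded",
)
# Explicit charset: decode with the declared encoding.
latin1_form = decode_formvars(
    {"login": b"foo bar\xe9"},
    "application/x-www-form-urlencoded; charset=ISO-8859-1",
)
```

Both calls yield the same unicode login, so the later comparison against the Unicode column no longer depends on how the client encoded the form. A charset name that Python does not recognize would raise LookupError in this sketch; a real fix would have to decide how to handle that case.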
Re: Unicode handling in repoze identification

On Friday February 13, 2009 14:42:48 Christoph Zwerschke wrote:
> Raphael Slinckx wrote:
> > There seems to be a problem, but i don't know where to ask so i give
> > it a shot here.
> > ...
> > In any case it breaks authentication for any username using non-ascii
> > user_names, and the quickstart template in TG2 uses Unicode for
> > user_name so it is bound to happen sometime. Any thoughts on how to
> > fix this ?
>
> I can reproduce the problem and your analysis seems to be correct.
>
> I think this should be fixed in paste.request. The parse_formvars should
> return unicode instead of encoded strings. I saw that it simply ignores
> the charset (anything specified after a semicolon) in the content_type.
> Instead, it should analyze the content_type for an explicitly defined
> charset (assuming utf-8 if nothing is specified), and then it should
> decode the content using that charset.
>
> Even in Python 2, we should learn to follow the Python 3 paradigm of
> keeping everything that can potentially be non-ascii in unicode and
> convert things immediately at the i/o boundaries. In this case, this is
> IMHO the duty of the parse_formvars function.

+1, definitely.

--
Gustavo Narea <http://gustavonarea.net/>.
Get rid of unethical constraints! Get freedomware: http://www.getgnulinux.org/

Re: Unicode handling in repoze identification

> > I think this should be fixed in paste.request. The parse_formvars should

Are there any Paste devs out here?

Otherwise I'm going to report this bug in Paste, even though parse_formvars might purposely be defined not to decode anything, even if that sounds broken.

Re: Unicode handling in repoze identification

On Tue, Feb 17, 2009 at 6:20 AM, Raphael Slinckx <rslinckx@...> wrote:
>
>> > I think this should be fixed in paste.request. The parse_formvars should
> Are there any paste devs out here ?
> Else i'm going to report this bug in paste, even though parse_formvars
> might be defined purposedly to not decode stuff, even if that sounds
> broken.

Funny. My suggestion would be for you to figure out the patch and submit it with the ticket; normally code changes are applied really fast, but "problem tickets" aren't.

It would also be a good idea to search the Paste trac before going for it. I once found a bug that had been fixed with a patch, but the patch was never applied :(
