Re: i18n problems in Websh (multibyte charsets) (was Re: switching to apache)

Hi, Ronnie,

I got confused about the webout_eval_tag() patch. Importing the related code from tcl-rivet was a bad idea; I should have made more modifications for websh. I have read the code more carefully, and I think I have found the solution.

The current webout_eval_tag() contains the following code:

    ....
    prev = cur;
    cur++;
    continue;
    ....

The char *cur holds the contents, which may be a Unicode string. That means it may be a multi-byte string, not only a single-byte one, so cur++ may not point to the next character. I think the lines above should be as follows:

    ....
    prev = cur;
    cur = (char *)Tcl_UtfNext(cur);
    continue;
    ....

Attached is a new patch against CVS HEAD. I think it works fine.

But I found another problem. The scripts under test/ contain raw 8-bit strings, and I think they are raw 8-bit Unicode strings .... That requires "encoding system utf-8". But many systems have other encodings, such as iso8859-1, euc-jp, and so on. Such systems cannot read raw 8-bit Unicode strings. I think the scripts should use \uXXXX notation.

Thanks.
Taguchi,T.

---------------------------------------------------------------------
To unsubscribe, e-mail: websh-dev-unsubscribe@...
For additional commands, e-mail: websh-dev-help@...
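The difference between cur++ and Tcl_UtfNext(cur) described above can be sketched without the Tcl library. This is only an illustration, not the Websh code: utf8_next() is a hypothetical stand-in for Tcl_UtfNext() that assumes well-formed UTF-8 input, and count_chars() / count_bytes() are made-up helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Tcl_UtfNext(): returns a pointer to the start of
 * the next character. Assumes s points at a non-NUL lead byte of a UTF-8
 * sequence; stops early if a continuation byte is missing. */
const char *utf8_next(const char *s)
{
    unsigned char lead;
    int extra;

    assert(s != NULL);
    lead = (unsigned char)*s;

    if ((lead & 0xE0) == 0xC0) {
        extra = 1;              /* 2-byte sequence */
    } else if ((lead & 0xF0) == 0xE0) {
        extra = 2;              /* 3-byte sequence */
    } else if ((lead & 0xF8) == 0xF0) {
        extra = 3;              /* 4-byte sequence */
    } else {
        extra = 0;              /* ASCII or a stray byte: advance one byte */
    }

    s++;
    while (extra-- > 0 && ((unsigned char)*s & 0xC0) == 0x80) {
        s++;                    /* skip continuation bytes 10xxxxxx */
    }
    return s;
}

/* Walks the string one whole character at a time. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    while (*s != '\0') {
        s = utf8_next(s);
        n++;
    }
    return n;
}

/* Walks the string the way cur++ does: one step per byte. */
size_t count_bytes(const char *s)
{
    size_t n = 0;
    while (*s != '\0') {
        s++;
        n++;
    }
    return n;
}
```

For a two-character Japanese string encoded in UTF-8, the byte-wise loop takes six steps and lands in the middle of a character on most of them, while the character-wise loop takes two.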
Re: i18n problems in Websh (multibyte charsets) (was Re: switching to apache)

> I've read the code more carefully. And I think I found the
> solution.

Sorry, not yet. Multibyte strings are broken in webout_eval_tag(). Everything looks fine at the top of webout_eval_tag(): the variable cur contains the correct string. But the variable dstr does not contain the correct string; it is broken. I think quote_append() cannot deal with multi-byte strings. So I made a patch, but it is not good enough....

I have a question. Does websh get its contents from the Apache server as a Unicode string or as a binary string?

---------------------------------------------------------------------
Re: i18n problems in Websh (multibyte charsets)

Hi

I looked at your patch. As you already know, it does not really work yet. But can you tell me what this modification is for?

diff -ur tcl-websh.orig/src/generic/formdata.c tcl-websh/src/generic/formdata.c
--- tcl-websh.orig/src/generic/formdata.c Mon Aug 29 13:24:13 2005
+++ tcl-websh/src/generic/formdata.c Mon Aug 29 13:30:19 2005
@@ -41,6 +41,7 @@
     int readToEnd = 0;
     int content_length = 0;
     Tcl_DString translation;
+    Tcl_DString encoding;

     channel = Web_GetChannelOrVarChannel(interp, channelName, &mode);
     if (channel == NULL) {
@@ -63,7 +64,9 @@
     }

     Tcl_DStringInit(&translation);
+    Tcl_DStringInit(&encoding);
     Tcl_GetChannelOption(interp, channel, "-translation", &translation);
+    Tcl_GetChannelOption(interp, channel, "-encoding", &encoding);
     Tcl_SetChannelOption(interp, channel, "-translation", "binary");

     /* ------------------------------------------------------------------------
@@ -88,7 +91,9 @@
     if (Tcl_GetIntFromObj(interp, len, &content_length) != TCL_OK) {
         Tcl_SetChannelOption(interp, channel, "-translation",
                              Tcl_DStringValue(&translation));
+        Tcl_SetChannelOption(interp, channel, "-encoding", Tcl_DStringValue(&encoding));
         Tcl_DStringFree(&translation);
+        Tcl_DStringFree(&encoding);
         /* unregister if was a varchannel */
         Web_UnregisterVarChannel(interp, channelName, channel);
         return TCL_ERROR;
@@ -122,7 +127,9 @@
         Tcl_DecrRefCount(formData);
         Tcl_SetChannelOption(interp, channel, "-translation",
                              Tcl_DStringValue(&translation));
+        Tcl_SetChannelOption(interp, channel, "-encoding", Tcl_DStringValue(&encoding));
         Tcl_DStringFree(&translation);
+        Tcl_DStringFree(&encoding);
         /* unregister if was a varchannel */
         Web_UnregisterVarChannel(interp, channelName, channel);

@@ -131,7 +138,9 @@
     }
     Tcl_SetChannelOption(interp, channel, "-translation",
                          Tcl_DStringValue(&translation));
+    Tcl_SetChannelOption(interp, channel, "-encoding", Tcl_DStringValue(&encoding));
     Tcl_DStringFree(&translation);
+    Tcl_DStringFree(&encoding);
     /* unregister if was a varchannel */
     Web_UnregisterVarChannel(interp, channelName, channel);

As far as I can see, this doesn't do anything: it just saves the encoding, doesn't do anything with it, and before returning sets it to the value it already has. Did I miss something?

> Does websh get contents from apache server as unicode string,
> or binary string?

I'm not sure I understand your question correctly: the strings are always the same (binary?). The encoding is just how you interpret them. If you are talking about multipart form data: you get the encoding within the data; otherwise (www-form-urlencoded) it is just a defined 8-bit encoding. (I would not know how multibyte character sets are posted in that encoding.)

If you are talking about scripts that are sourced from mod_websh, you have to look at src/generic/webinterp.c: in readWebInterpCode() we basically do the following:

    Tcl_Obj *objPtr = Tcl_NewObj();
    chan = Tcl_OpenFileChannel(interp, filename, "r", 0644);
    Tcl_ReadChars(chan, objPtr, -1, 0);
    Tcl_Close(interp, chan);

-> objPtr is the code object that is later eval'ed using Tcl_EvalObjEx

Hope that helps

Best regards
Ronnie
-----------------------------------------------------------------------
Ronnie Brunner ronnie.brunner@...
Netcetera AG, 8040 Zuerich, phone +41 44 247 79 79 fax +41 44 247 70 75
---------------------------------------------------------------------
Re: i18n problems in Websh (multibyte charsets)

Hi,

Tcl has an encoding mechanism. Tcl assumes that every string read from an input channel is written in that channel's encoding. The default encoding can be queried with the "encoding system" command. Tcl converts the string from that encoding to its internal UTF-8 representation. For output, Tcl converts from the internal UTF-8 string to the channel's encoding. The exception is a channel whose encoding is "binary": then no conversion occurs.

Many "multibyte problems" occur at this point. Imagine a system with a multibyte encoding such as euc-jp, while the websh test/ scripts contain raw UTF-8(?) multibyte strings. If a websh with a multibyte system encoding tries to read such test scripts, it will try to convert the scripts from its system encoding to the internal UTF-8 encoding. But the input is already a UTF-8 string, so it will be broken.

So all raw 8-bit strings must be written in "\uXXXX" notation. The string must also be valid for the system encoding. For example, Tcl can read any Chinese string written in "\uXXXX" notation. But if that Tcl has "euc-jp" as its system encoding, it cannot output the string: the encoding of the output channel must be a Chinese encoding for a Chinese string. So I think the test scripts must be evaluated under the correct encoding.

On the other hand, someone might think that the binary encoding is a good solution. It is not a good idea. Tcl can read a string from a channel with binary encoding and can write such a string out, but it cannot operate on it. For example:

    % fconfigure stdin -encoding binary
    % set rawStr [gets stdin]
    % set splitStr [split $rawStr {}]; # splitStr will be broken.

I think websh tries to treat multibyte strings as single-byte strings. Additionally, I think websh also has the encoding-related problems described above.

> If you talk about scripts that are sourced from mod_websh, you have to
> look at src/generic/webinterp.c: in readWebInterpCode() we basically

src/generic/interpool.c ?

> do the following:
>
> Tcl_Obj *objPtr = Tcl_NewObj();
> chan = Tcl_OpenFileChannel(interp, filename, "r", 0644);
> Tcl_ReadChars(chan, objPtr, -1, 0);
> Tcl_Close(interp, chan);
> -> objPtr is the code object that is later eval'ed using Tcl_EvalObjEx
>
> Hope that helps

Thanks, Ronnie! I wanted to find this, but I could not...

Note that the encoding of this channel "chan" is the default system encoding, so websh can read a ws3 script written in its system encoding. But I think the channel for form data has the "binary" encoding, so websh cannot deal with multibyte form data. Of course,

    web::put [encoding convertfrom [encoding system] [web::formvar varName]]

works fine.

Thanks.
Taguchi,T.
---------------------------------------------------------------------
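The "input is already UTF-8, so converting it again breaks it" problem can be shown in a few lines of C. This is a sketch under stated assumptions: it uses ISO 8859-1 as the wrongly assumed source encoding because that conversion fits in a few lines (the thread's example is euc-jp, which behaves analogously but needs a lookup table), and latin1_to_utf8() is a hypothetical helper, not a Tcl or Websh function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: converts len bytes of ISO 8859-1 text to UTF-8.
 * out must have room for 2 * len bytes. Returns the number of bytes written.
 * Every ISO 8859-1 byte value equals its Unicode code point. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i, n = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                       /* ASCII: unchanged */
        } else {
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));   /* lead byte */
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F)); /* continuation */
        }
    }
    return n;
}
```

Feeding this function the two UTF-8 bytes C3 A9 (U+00E9) produces the four bytes C3 83 C2 A9, that is, two unrelated characters instead of one. The same kind of damage happens when a script containing raw UTF-8 is read through a channel whose encoding is something else, which is why \uXXXX escapes in the test scripts are the safer choice.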
Re: i18n problems in Websh (multibyte charsets)

Hi,

I found another problem: web::htmlify cannot deal with multibyte strings. For example:

    > websh3.5.1a
    % web::htmlify abcd
    abcd ; # it seems to work fine.
    % web::htmlify "\u3042\u3044\u3046\u3048\u304a"
    ; # returns an empty string!!

"\u3042\u3044\u3046\u3048\u304a" is a valid Japanese string, so I think web::htmlify should return the substituted string for it. I'm reading webHtmlify() in htmlify.c now .......

Thanks,
Taguchi,T.
---------------------------------------------------------------------
Re: i18n problems in Websh (multibyte charsets)

From a very quick glance at this thread, I think what you might want to consider doing is using

    next = (char *)Tcl_UtfNext(cur);

in the parser. weboutint.c uses DStrings. In Rivet, the rivetParser.c file uses the Utf stuff in order to parse up the <? ?> tags. I think it's done correctly, but you'd probably have to try it with some non-European encodings...

One thing that would be handy would be a small test case in Tcl that demonstrates the problem (if I haven't missed it... apologies if I didn't see it).

--
David N. Welton
- http://www.dedasys.com/davidw/
Apache, Linux, Tcl Consulting
- http://www.dedasys.com/
---------------------------------------------------------------------
Re: i18n problems in Websh (multibyte charsets)

Hi,

The Rivet parser cannot handle "{}" as script start/end tags. I think that is why the first patch produced so many errors.

I have made a third patch for webout_eval_tag(), written entirely from scratch. All putx tests in test/webout.test pass. I am now trying to make a patch for htmlify.

Thanks.
Taguchi,T.
---------------------------------------------------------------------
Re: i18n problems in Websh (multibyte charsets)

Hi,

I have finished cleaning up my patch. I believe the web::putx and web::htmlify problems are solved. Now they can handle not only single-byte strings but also multibyte strings.

Sorry, I am still confused about parseUrlEncodedFormData(). Is this 'Tcl_Channel channel' used as an output channel? By 'output' I mean that web::putx or web::put writes to this channel. If yes, its encoding option should be backed up, because

    Tcl_SetChannelOption(interp, channel, "-translation", "binary");

also sets its encoding option as a side effect. If no, please forget this part. All data from Apache is ASCII-encoded, but output from mod_websh to Apache might be in another encoding, including a multibyte one. I had forgotten this, sorry.

Additionally, this patch is still dirty. Actually, I'm not good at the C language. I hate pointers. So I love Tcl ;-)

Thanks,
Taguchi,T.

---
diff -ur tcl-websh.orig/src/generic/htmlify.c tcl-websh/src/generic/htmlify.c
--- tcl-websh.orig/src/generic/htmlify.c Mon Aug 29 13:24:13 2005
+++ tcl-websh/src/generic/htmlify.c Fri Sep 2 22:15:27 2005
@@ -71,13 +71,17 @@
         if (unic == 0)
             break;

+        /*
+          This code delete multibyte string!!
         if (unic > WEBENC_LATIN_TABLE_LENGTH)
             continue;
+        */

         /* --------------------------------------------------------------------
          * translation needed ?
          * ----------------------------------------------------------------- */
-        if (convData->need[unic] == TCL_OK) {
+        if (unic <= WEBENC_LATIN_TABLE_LENGTH &&
+            convData->need[unic] == TCL_OK) {

             /* yes */

diff -ur tcl-websh.orig/src/generic/weboutint.c tcl-websh/src/generic/weboutint.c
--- tcl-websh.orig/src/generic/weboutint.c Wed Aug 31 14:53:36 2005
+++ tcl-websh/src/generic/weboutint.c Fri Sep 2 22:02:48 2005
@@ -368,174 +368,121 @@
     return TCL_OK;
 }

-/* --------------------------------------------------------------------------
- * quote_append (quote Tcl syntax characters and append to Tcl_DString)
- * ----------------------------------------------------------------------- */
-
-int quote_append(Tcl_DString *str, char *in, int len)
-{
-    int i = 0;
-    while (i < len) {
-        switch (*in)
-        {
-        case '{':
-            Tcl_DStringAppend(str, "\\{", -1);
-            break;
-        case '}':
-            Tcl_DStringAppend(str, "\\}", -1);
-            break;
-        case '$':
-            Tcl_DStringAppend(str, "\\$", -1);
-            break;
-        case '[':
-            Tcl_DStringAppend(str, "\\[", -1);
-            break;
-        case ']':
-            Tcl_DStringAppend(str, "\\]", -1);
-            break;
-        case '"':
-            Tcl_DStringAppend(str, "\\\"", -1);
-            break;
-/*      case '\\':
-            Tcl_DStringAppend(str, "\\\\", -1);
-            break; */
-        default:
-            Tcl_DStringAppend(str, in, 1);
-            break;
-        }
-        in ++;
-        i ++;
-    }
-    return 0;
-}
-
-
 /* ----------------------------------------------------------------------------
  * webout_eval_tag (code in <? ?>)
  * ------------------------------------------------------------------------- */
 int webout_eval_tag(Tcl_Interp * interp, ResponseObj * responseObj,
                     Tcl_Obj * in, const char *strstart, const char *strend)
 {
-    Tcl_DString dstr;
-    Tcl_Obj *tclo = NULL;
-
-    int inLen;
-    char *cur = NULL;
-    char *prev = NULL;
-    int cntOpen = 0;
-    int res = 0;
-    int startmatch = 0;
-    int endmatch = 0;
-
-    int begin = 1;
-    char *start;
-
-/*  const char *strstart = START_TAG;
-    const char *strend = END_TAG; */
-/*  int endseqlen = strlen(END_TAG);
-    int startseqlen = strlen(START_TAG);
-    */
-    int endseqlen = strlen(strstart);
-    int startseqlen = strlen(strend);
-
-    if ((responseObj == NULL) || (in == NULL))
-        return TCL_ERROR;
-
-    Tcl_DStringInit(&dstr);
-
-    cur = Tcl_GetStringFromObj(in, &inLen);
-    prev = cur;
-    start = cur;
+    Tcl_Obj *outbuf;
+    Tcl_Obj *tclo;
+    char *next;
+    char *cur;
+
+    int endseqlen = strlen(strend);
+    int startseqlen = strlen(strstart);
+    int begin = 1;
+    int firstScan = 1;
+    int inside = 0, p = 0;
+    int inLen = 0;
+    int res = 0;

-    if (inLen == 0)
-        return TCL_OK;
+    next = Tcl_GetStringFromObj(in, &inLen);
+    outbuf = Tcl_NewStringObj("", -1);

-    printf("DEBUG: cur = %s\n", cur);
+    if (inLen == 0)
+        return 0;

-    while (*cur != 0) {
-        if (*cur == strstart[startmatch])
-        {
-            if (*prev == '\\') {
-                Tcl_DStringAppend(&dstr, cur, 1);
-            } else if ((++startmatch) == startseqlen) {
-                /* We have matched the starting sequence. */
-                if (cntOpen < 1) {
-                    if (!((cur - (startseqlen - 1)) - start)) {
-                        begin = 0;
-                    } else {
-                        Tcl_DStringAppend(&dstr, "\"\n", 2);
-                    }
-                } else {
-                    Tcl_DStringAppend(&dstr, strstart, -1);
-                }
-                cntOpen ++;
-                startmatch = 0;
-            }
-            prev = cur;
-            cur ++;
-            continue;
-        } else if (*cur == strend[endmatch] && (cntOpen > 0 || *prev == '\\')) {
-            if (*prev == '\\') {
-                Tcl_DStringAppend(&dstr, cur, 1);
-            } else if ((++endmatch) == endseqlen)
-            {
-                /* We have matched the ending sequence. */
-                if (cntOpen == 1) {
-                    /* build up the command with the name of the channel. */
-                    Tcl_DStringAppend(&dstr, "\n web::put \"", -1);
-                } else {
-                    Tcl_DStringAppend(&dstr, strend, -1);
-                }
-                cntOpen --;
-                endmatch = 0;
-            }
-            prev = cur;
-            cur ++;
-            continue;
-        } else if (startmatch) {
-            if (cntOpen < 1) {
-                quote_append(&dstr, (char *)strstart, startmatch);
-            } else {
-                Tcl_DStringAppend(&dstr, (char *)strstart, startmatch);
-            }
-            startmatch = 0;
-        } else if (endmatch) {
-            if (cntOpen < 1) {
-                quote_append(&dstr, (char *)strend, endmatch);
-            } else {
-                Tcl_DStringAppend(&dstr, (char *)strend, endmatch);
-            }
-            endmatch = 0;
-        }
-        /* Put the current character in the output. If we are in Tcl
-           code, then don't escape Tcl characters. */
-        if (cntOpen < 1) {
-            quote_append(&dstr, cur, 1);
+    while (*next != 0) {
+        cur = next;
+        next = (char *)Tcl_UtfNext(cur);
+
+        if (strncmp("\\", cur, 1) == 0) {
+            if (firstScan == 1) { firstScan = 0; }
+            if (strncmp(strstart, next, startseqlen) == 0) {
+                Tcl_AppendToObj(outbuf, "\\", 1);
+                Tcl_AppendToObj(outbuf, strstart, startseqlen);
+                next += startseqlen;
+            } else if (strncmp(strend, next, endseqlen) == 0) {
+                Tcl_AppendToObj(outbuf, "\\", 1);
+                Tcl_AppendToObj(outbuf, strend, endseqlen);
+                next += endseqlen;
+            } else if (inside < 1) {
+                Tcl_AppendToObj(outbuf, "\\\\", 2);
+            } else {
+                Tcl_AppendToObj(outbuf, "\\", 1);
+            }
+        } else if (strncmp(strstart, cur, startseqlen) == 0) {
+            if ((++inside) == 1) {
+                if (firstScan == 1) {
+                    begin = 0;
+                    firstScan = 0;
+                    Tcl_AppendToObj(outbuf, "\n", 1);
         } else {
-            Tcl_DStringAppend(&dstr, cur, 1);
+                    Tcl_AppendToObj(outbuf, "\"\n", 2);
         }
-        prev = cur;
-        cur ++;
-    }
-
-    /* build up the web::put with the name of the channel. */
-    if (begin) {
-        tclo = Tcl_NewStringObj("web::put \"", -1);
+                if (startseqlen > 1) {
+                    next += startseqlen - 1;
+                }
+            } else {
+                Tcl_AppendToObj(outbuf, cur, startseqlen);
+                if (startseqlen > 1) {
+                    next += startseqlen - 1;
+                }
+            }
+        } else if (strncmp(strend, cur, endseqlen) == 0) {
+            if (firstScan == 1) { firstScan = 0; }
+            if ((--inside) == 0) {
+                Tcl_AppendToObj(outbuf, "\nweb::put \"", -1);
+                if (endseqlen > 1) {
+                    next += endseqlen - 1;
+                }
+            } else {
+                Tcl_AppendToObj(outbuf, cur, endseqlen);
+                if (endseqlen > 1) {
+                    next += endseqlen - 1;
+                }
+            }
+            if (inside < 0) { inside = 0; }
+        } else if (inside < 1) {
+            if (firstScan == 1) { firstScan = 0; }
+            switch (*cur) {
+            case '{':
+                Tcl_AppendToObj(outbuf, "\\{", -1);
+                break;
+            case '}':
+                Tcl_AppendToObj(outbuf, "\\}", -1);
+                break;
+            case '$':
+                Tcl_AppendToObj(outbuf, "\\$", -1);
+                break;
+            case '[':
+                Tcl_AppendToObj(outbuf, "\\[", -1);
+                break;
+            case ']':
+                Tcl_AppendToObj(outbuf, "\\]", -1);
+                break;
+            case '"':
+                Tcl_AppendToObj(outbuf, "\\\"", -1);
+                break;
+            default:
+                Tcl_AppendToObj(outbuf, cur, next - cur);
+                break;
+            }
     } else {
-        tclo = Tcl_NewStringObj("", -1);
-    }
-
-    Tcl_AppendToObj(tclo, Tcl_DStringValue(&dstr),
-                    Tcl_DStringLength(&dstr));
-
-    if (cntOpen < 1) {
-        Tcl_AppendToObj(tclo, "\"\n", 2);
+            if (firstScan == 1) { firstScan = 0; }
+            Tcl_AppendToObj(outbuf, cur, next - cur);
     }
-
-    Tcl_DStringFree(&dstr);
-    printf("DEBUG: tclo = %s\n", Tcl_GetString(tclo));
-    res = Tcl_EvalObjEx(interp, tclo, TCL_EVAL_DIRECT);
-    return res;
+    }
+    if (begin) {
+        tclo = Tcl_NewStringObj("web::put \"", -1);
+        Tcl_AppendObjToObj(tclo, outbuf);
+    } else {
+        tclo = outbuf;
+    }
+    Tcl_AppendToObj(tclo, "\"", -1);
+    res = Tcl_EvalObjEx(interp, tclo, TCL_EVAL_DIRECT);
+    return res;
 }

 /* ----------------------------------------------------------------------------
diff -ur tcl-websh.orig/src/tests/mintest.test tcl-websh/src/tests/mintest.test
--- tcl-websh.orig/src/tests/mintest.test Mon Aug 29 13:24:13 2005
+++ tcl-websh/src/tests/mintest.test Mon Aug 29 13:49:40 2005
@@ -36,7 +36,7 @@
     set res ""
     catch {
         ## fixme: use variable for tclsh8.3
-        set res [exec tclsh8.3 $fn]
+        set res [exec tclsh8.4 $fn]
     }
     file delete -force $fn
     set res
diff -ur tcl-websh.orig/src/unix/Makefile.in tcl-websh/src/unix/Makefile.in
--- tcl-websh.orig/src/unix/Makefile.in Mon Aug 29 13:24:13 2005
+++ tcl-websh/src/unix/Makefile.in Mon Aug 29 13:52:43 2005
@@ -175,7 +175,7 @@

 INCLUDES = @TCL_INCLUDES@ $(HTTPD_INCLUDES)

-EXTRA_CFLAGS = $(TCL_DEFS) $(PROTO_FLAGS) $(SECURITY_FLAGS) $(MEM_DEBUG_FLAGS) $(KEYSYM_FLAGS) $(NO_DEPRECATED_FLAGS)
+EXTRA_CFLAGS = $(TCL_DEFS) $(PROTO_FLAGS) $(SECURITY_FLAGS) $(MEM_DEBUG_FLAGS) $(KEYSYM_FLAGS) $(NO_DEPRECATED_FLAGS) $(TCL_EXTRA_CFLAGS)

 DEFS = @DEFS@ $(EXTRA_CFLAGS)

@@ -290,7 +290,7 @@

 websh$(VERSION): tclAppInit.$(OBJEXT) $(web_OBJECTS)
         $(CC) @LDFLAGS@ tclAppInit.$(OBJEXT) $(web_OBJECTS) \
-        $(TCL_LIB_SPEC) $(TCL_LIBS) -o websh$(VERSION)
+        $(TCL_LIB_SPEC) $(TCL_LIBS) $(TCL_LD_FLAGS) -o websh$(VERSION)

 mod_websh$(SHARED_LIB_SUFFIX): $(web_ap_OBJECTS)

@@ -385,15 +385,35 @@
 # =============================================================================

 install-doc: doc
-        $(mkinstalldirs) $(DESTDIR)/doc
-        @for i in quickref.html quickref.txt ; \
+        $(mkinstalldirs) $(DESTDIR)/doc/html
+        @for i in Apache_module_specific_commands.html \
+            command_dispatching_and_session_management.html \
+            configuration.html \
+            context_handling.html \
+            data_encryption.html \
+            file_handling_and_file_IO.html \
+            index.html \
+            inter-process_and_-system_communication.html \
+            logging.html \
+            misc_commands.html \
+            request_data_handling.html \
+            response_data_handling.html \
+            uri-html-_en-decoding.html ; \
+            do \
+            echo "Installing $$i"; \
+            rm -f $(DESTDIR)/doc/html/$$i; \
+            $(INSTALL_DATA) ../../doc/html/$$i $(DESTDIR)/doc/html/$$i ; \
+            chmod 444 $(DESTDIR)/doc/html/$$i; \
+            done
+        @for i in INSTALL README ; \
+            do \
             echo "Installing $$i"; \
             rm -f $(DESTDIR)/doc/$$i; \
-            $(INSTALL_DATA) ../../doc/$$i $(DESTDIR)/doc/$$i ; \
+            $(INSTALL_DATA) ../../$$i $(DESTDIR)/doc/$$i ; \
             chmod 444 $(DESTDIR)/doc/$$i; \
             done
-        @for i in README license.terms ChangeLog changes ; \
+        @for i in ChangeLog license.terms ; \
             do \
             echo "Installing $$i"; \
             rm -f $(DESTDIR)/doc/$$i; \
diff -ur tcl-websh.orig/src/generic/formdata.c tcl-websh/src/generic/formdata.c
--- tcl-websh.orig/src/generic/formdata.c Mon Aug 29 13:24:13 2005
+++ tcl-websh/src/generic/formdata.c Fri Sep 2 22:28:47 2005
@@ -41,6 +41,7 @@
     int readToEnd = 0;
     int content_length = 0;
     Tcl_DString translation;
+    Tcl_DString encoding;

     channel = Web_GetChannelOrVarChannel(interp, channelName, &mode);
     if (channel == NULL) {
@@ -63,7 +64,9 @@
     }

     Tcl_DStringInit(&translation);
+    Tcl_DStringInit(&encoding);
     Tcl_GetChannelOption(interp, channel, "-translation", &translation);
+    Tcl_GetChannelOption(interp, channel, "-encoding", &encoding);
     Tcl_SetChannelOption(interp, channel, "-translation", "binary");

     /* ------------------------------------------------------------------------
@@ -88,7 +91,9 @@
     if (Tcl_GetIntFromObj(interp, len, &content_length) != TCL_OK) {
         Tcl_SetChannelOption(interp, channel, "-translation",
                              Tcl_DStringValue(&translation));
+        Tcl_SetChannelOption(interp, channel, "-encoding", Tcl_DStringValue(&encoding));
         Tcl_DStringFree(&translation);
+        Tcl_DStringFree(&encoding);
         /* unregister if was a varchannel */
         Web_UnregisterVarChannel(interp, channelName, channel);
         return TCL_ERROR;
@@ -122,7 +127,9 @@
         Tcl_DecrRefCount(formData);
         Tcl_SetChannelOption(interp, channel, "-translation",
                              Tcl_DStringValue(&translation));
+        Tcl_SetChannelOption(interp, channel, "-encoding", Tcl_DStringValue(&encoding));
         Tcl_DStringFree(&translation);
+        Tcl_DStringFree(&encoding);
         /* unregister if was a varchannel */
         Web_UnregisterVarChannel(interp, channelName, channel);

@@ -131,10 +138,15 @@
     }
     Tcl_SetChannelOption(interp, channel, "-translation",
                          Tcl_DStringValue(&translation));
+    Tcl_SetChannelOption(interp, channel, "-encoding", Tcl_DStringValue(&encoding));
     Tcl_DStringFree(&translation);
+    Tcl_DStringFree(&encoding);
     /* unregister if was a varchannel */
     Web_UnregisterVarChannel(interp, channelName, channel);

+    LOG_MSG(interp, WRITE_LOG, __FILE__, __LINE__,
+            "parseUrlEncodedFormData()", WEBLOG_WARNING,
+            "formData \"", Tcl_GetString(formData), "\"", NULL);
     cmdList[0] = Tcl_NewStringObj("web::uri2list", -1);
     cmdList[1] = Tcl_DuplicateObj(formData);
     Tcl_IncrRefCount(cmdList[0]);
@@ -199,6 +211,7 @@
     char *boundary = mimeGetParamFromContDisp(content_type, "boundary");
     int res = 0;
     Tcl_DString translation;
+    Tcl_DString encoding;

     /* printf("DBG parseMultipartFormData - starting\n"); fflush(stdout); */

@@ -230,13 +243,17 @@
     }

     Tcl_DStringInit(&translation);
+    Tcl_DStringInit(&encoding);
     Tcl_GetChannelOption(interp, channel, "-translation", &translation);
+    Tcl_GetChannelOption(interp, channel, "-encoding", &encoding);
     Tcl_SetChannelOption(interp, channel, "-translation", "binary");

     res = mimeSplitMultipart(interp, channel, boundary, requestData);

     Tcl_SetChannelOption(interp, channel, "-translation",
                          Tcl_DStringValue(&translation));
+    Tcl_SetChannelOption(interp, channel, "-encoding", Tcl_DStringValue(&encoding));
     Tcl_DStringFree(&translation);
+    Tcl_DStringFree(&encoding);
     /* unregister if was a varchannel */
     Web_UnregisterVarChannel(interp, channelName, channel);

@@ -560,7 +577,7 @@
      * open file
      * ----------------------------------------------------------------------- */
     if ((out = Tcl_OpenFileChannel(NULL, Tcl_GetString(tmpFileName),
-                                   "w", 0644)) == NULL)
+                                   "w", 0600)) == NULL)
         return 0;

     /* --------------------------------------------------------------------------
---------------------------------------------------------------------
Re: i18n problems in Websh (multibyte charsets)

Hi Taguchi

> I've finished cleanup my patch.
> I believe web::putx and web::htmlify problem are solved.
> Now, They can deal not only single byte string, but also
> multi byte string.

I applied your patch and the tests run fine. Would it be possible for you to add some tests that confirm the new compatibility with other encodings? I would like to add some, so that we won't break things again when we add new stuff or fix things.

> Sorry, I still have confuse about parseUrlEncodedFormData().
> Is this 'Tcl_Channel channel' used as output channel?
> 'output' means web::putx or web::put write to this channel.

Well, the problem is the following: in parseUrlEncodedFormData, we get URI-encoded form fields. They are ASCII (8-bit only), but this is because they are encoded that way. The actual content might be in a different charset altogether. Right now, we set the channel to binary and read the ASCII stuff, then we set the channel back to what it was and call web::uri2list, which decodes the actual form fields. At this point, they can have different encodings and, unfortunately, I'm not really sure whether it works under all combinations.

> If yes, its encoding option should be backuped. Because,
> Tcl_SetChannelOption(interp, channel, "-translation", "binary");
> also sets its encoding option as its side-effects.

OK, I finally found out what you mean: setting the translation to binary really does drop the encoding information (which I didn't know and which is, as far as I know, not documented anywhere...)

> All data from apache is ascii encoding. But output from mod_websh
> to apache might be other encoding includes multibyte one.
> I'd forgot this, Sorry.

The encoding of data from Apache actually varies and is not always ASCII (look at the multipart form: the encoding might be part of the form data, where binary files can also be uploaded). So far, we have always treated all data as binary, so it is the responsibility of the application to convert if necessary. I'm not very sure whether this works with all encodings, but obviously you now manage to handle your multibyte character set properly, even though Websh does not really treat multipart form data in the correct encoding, but handles it as binary.

If you have an example of what a browser submits and what Websh has to do with it, and we can create some tests, I would very much like to add these tests to our test suite. (Something similar to the tests we have in src/tests/dispatch.test or src/tests/formdata.test.)

I will look at the code more closely soon, and if everything looks fine and we have some more tests for multibyte character sets, I'd like to commit your proposed changes.

Thank you very much so far for your efforts. I appreciate it.

Regards
Ronnie
---------------------------------------------------------------------
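The point that URI-encoded form fields are ASCII on the wire while the decoded content may be in any charset can be made concrete with a small percent-decoder. This is a hedged sketch, not the web::uri2list implementation: url_decode() and hex_value() are hypothetical names, and the decoder deliberately returns raw bytes without guessing their encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the value of a hexadecimal digit, or -1 if c is not one. */
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decodes a www-form-urlencoded value into raw bytes.
 * out must have room for strlen(in) bytes. Returns the number of bytes
 * written. The result is NOT NUL-terminated and carries no charset. */
size_t url_decode(const char *in, unsigned char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (in[0] == '%' && hex_value(in[1]) >= 0 && hex_value(in[2]) >= 0) {
            out[n++] = (unsigned char)(hex_value(in[1]) * 16 + hex_value(in[2]));
            in += 3;
        } else if (*in == '+') {
            out[n++] = ' ';         /* '+' stands for a space in form data */
            in++;
        } else {
            out[n++] = (unsigned char)*in;
            in++;
        }
    }
    return n;
}
```

The input "%E3%81%82" is pure ASCII, but it decodes to the three bytes E3 81 82. Whether those bytes mean one UTF-8 character or something else depends on the charset the browser used, which is why the application (or a later conversion step) still has to interpret them.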
Re: i18n problems in Websh (multibyte charsets)

Hi Taguchi

> I believe web::putx and web::htmlify problem are solved.
> Now, They can deal not only single byte string, but also
> multi byte string.

I have a question regarding htmlify: would it not make sense to encode all multibyte characters as &#<numeric>; as well, so that the result is always ASCII compatible? On the other hand, web::dehtmlify already handles this correctly.

I committed a new config option that allows you to set the permissions of all created files:

    web::config filepermissions 0600

The default is still 0644, but now you can set it explicitly, so you don't need the hack in formdata.c anymore.

Please update from CVS and check whether everything works for you as intended: except for the Makefile.in, it should include all your suggestions, some tests, and fixes to the quickref.xml.

Regards
Ronnie
---------------------------------------------------------------------
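The idea of emitting every non-ASCII character as a numeric character reference, so that the output stays ASCII compatible, could look roughly like this. It is a sketch only and not the webHtmlify() code: utf8_decode() and htmlify_ascii() are hypothetical names, only &, < and > are escaped by name, and a malformed byte is simply treated as its own code point.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decodes one UTF-8 sequence at s into *cp and returns the number of bytes
 * consumed (at least 1). On a malformed sequence the first byte is returned
 * as the code point and one byte is consumed. */
size_t utf8_decode(const char *s, unsigned long *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len, i;

    if (p[0] < 0x80)                { *cp = p[0]; return 1; }
    else if ((p[0] & 0xE0) == 0xC0) { *cp = p[0] & 0x1F; len = 2; }
    else if ((p[0] & 0xF0) == 0xE0) { *cp = p[0] & 0x0F; len = 3; }
    else if ((p[0] & 0xF8) == 0xF0) { *cp = p[0] & 0x07; len = 4; }
    else                            { *cp = p[0]; return 1; }

    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80) { *cp = p[0]; return 1; }
        *cp = (*cp << 6) | (p[i] & 0x3F);
    }
    return len;
}

/* Copies in to out, replacing &, <, > with named entities and every
 * non-ASCII character with &#N;. Returns 0 on success, -1 if out is
 * too small (out is then left as an empty string). */
int htmlify_ascii(const char *in, char *out, size_t outsize)
{
    size_t used = 0;

    assert(in != NULL && out != NULL && outsize > 0);
    while (*in != '\0') {
        unsigned long cp;
        char piece[16];
        size_t plen;

        in += utf8_decode(in, &cp);
        if (cp == '&')      strcpy(piece, "&amp;");
        else if (cp == '<') strcpy(piece, "&lt;");
        else if (cp == '>') strcpy(piece, "&gt;");
        else if (cp < 0x80) { piece[0] = (char)cp; piece[1] = '\0'; }
        else                snprintf(piece, sizeof piece, "&#%lu;", cp);

        plen = strlen(piece);
        if (used + plen + 1 > outsize) { out[0] = '\0'; return -1; }
        memcpy(out + used, piece, plen);
        used += plen;
    }
    out[used] = '\0';
    return 0;
}
```

With this approach the Japanese character U+3042 comes out as the reference for code point 12354, so the page is readable regardless of the charset declared in the response, at the cost of a larger output.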