[ htmlunit-Bugs-2818493 ] code problem by a Anchor (ä,ö,ü)

by SourceForge.net

Bugs item #2818493, was opened at 2009-07-08 13:46
Message generated for change (Comment added) made by sethus
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=448266&aid=2818493&group_id=47038

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: Latest code in SVN
Status: Open
Resolution: Invalid
Priority: 5
Private: No
Submitted By: Konstantin Neubauer (sethus)
Assigned to: Ahmed Ashour (asashour)
Summary: code problem by a Anchor (ä,ö,ü)

Initial Comment:
I found a page whose links use "%F6" instead of "ö".
I tried to click on them using the HtmlUnit click() method.

The method converts the link incorrectly; as a result I get a wrong URL and therefore a 404 error.

A JUnit test is attached.

----------------------------------------------------------------------

>Comment By: Konstantin Neubauer (sethus)
Date: 2009-11-05 13:24

Message:
Thanks a lot for trying to fix it and for responding, I really appreciate it,
but I still have this problem. Perhaps we don't understand each other
correctly. I am not talking about form submission, but about URL encoding. It
seems that HtmlUnit encodes URLs incorrectly. It should (per W3C) encode, for
example, ö as %F6, but instead it encodes it to ö

quote:
"URL Encoding

URLs can only be sent over the Internet using the ASCII character-set.

Since URLs often contains characters outside the ASCII set, the URL has to
be converted. URL encoding converts the URL into a valid ASCII format.

URL encoding replaces unsafe ASCII characters with "%" followed by two
hexadecimal digits corresponding to the character values in the ISO-8859-1
character-set.
URLs cannot contain spaces. URL encoding normally replaces a space with a
+ sign."

I tried this function at http://www.w3schools.com/html/html_urlencode.asp
by submitting an 'ö' and got a "%" followed by the hexadecimal digits
"F6".

Thanks for your effort,
Konstantin
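
For reference, the difference between the two encodings discussed above can be reproduced with the JDK alone. This is only a minimal sketch: the class name UmlautEncoding and the helper encode are made up for illustration and are not part of HtmlUnit. It shows that 'ö' becomes %F6 under ISO-8859-1 but %C3%B6 under UTF-8, so the expected result depends on which charset is applied.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UmlautEncoding {

    // Hypothetical helper (not an HtmlUnit API): percent-encodes a value
    // with the named charset using the JDK's form-style URLEncoder.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // 'ö' written as an escape so the source file encoding does not matter
        String oUmlaut = "\u00f6";
        System.out.println(encode(oUmlaut, "ISO-8859-1")); // %F6
        System.out.println(encode(oUmlaut, "UTF-8"));      // %C3%B6
        System.out.println(encode("a b", "UTF-8"));        // a+b (space becomes '+')
    }
}
```

Note that URLEncoder implements form-style encoding (space becomes '+'), which matches the quoted description but is not necessarily what a browser does for every part of a URL.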

----------------------------------------------------------------------

Comment By: Konstantin Neubauer (sethus)
Date: 2009-07-27 10:57

Message:
Hi,
I tried a test using:

<HTML>
<HEAD>
<TITLE>www.koenik.de</TITLE>
<meta http-equiv='Content-Type' content='text/html; charset=ISO-8859-1'>
</HEAD>
<BODY>
<DIV>
 <A HREF="index.php?om=&user=könik1&start=10#list">2</A>
</DIV>
 
</BODY>
</HTML>

I tried both ISO-8859-1 and UTF-8 as the charset.
With ISO-8859-1 the browser produces the correct link. I ran
this test with HtmlUnit and the test fails.
FF3 and Chrome convert this link to:
http://localhost/index.php?om=&user=k%F6nik1&start=10#list
HtmlUnit converts this link to:
index.php?om=&user=könik1&start=10#list
Which encoding is used for anchors?
The page content returned as a string is encoded correctly.
Using build 1518.
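
The browser behaviour reported above (the href's non-ASCII characters percent-encoded with the page charset) can be approximated in plain Java. This is a rough sketch under stated assumptions, not HtmlUnit's implementation: HrefQueryEncoding and encodeNonAscii are hypothetical names, and the helper applies the page charset to the whole href, whereas real browsers may treat the path and the query differently.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HrefQueryEncoding {

    // Hypothetical helper (not an HtmlUnit API): percent-encodes every
    // non-ASCII character of an href with the page charset and leaves
    // ASCII characters such as '?', '&', '=', '#' and '%' untouched.
    static String encodeNonAscii(String href, Charset pageCharset) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < href.length(); ) {
            int codePoint = href.codePointAt(i);
            int length = Character.charCount(codePoint);
            if (codePoint < 0x80) {
                out.append((char) codePoint);
            } else {
                // Encode the character to bytes in the page charset,
                // then write each byte as %XX.
                byte[] bytes = href.substring(i, i + length).getBytes(pageCharset);
                for (byte b : bytes) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
            i += length;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String href = "index.php?om=&user=k\u00f6nik1&start=10#list";
        // index.php?om=&user=k%F6nik1&start=10#list
        System.out.println(encodeNonAscii(href, StandardCharsets.ISO_8859_1));
        // index.php?om=&user=k%C3%B6nik1&start=10#list
        System.out.println(encodeNonAscii(href, StandardCharsets.UTF_8));
    }
}
```

With the ISO-8859-1 page from the test above, the first call yields the k%F6nik1 form that FF3 and Chrome are reported to request.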

----------------------------------------------------------------------

Comment By: Ahmed Ashour (asashour)
Date: 2009-07-24 14:19

Message:
There is an issue with FF2 as well.

Test case can be found in
http://htmlunit.svn.sf.net/viewvc/htmlunit/trunk/htmlunit/src/test/java/com/gargoylesoftware/htmlunit/html/HtmlForm2Test.java?view=markup#l_40

----------------------------------------------------------------------

Comment By: Ahmed Ashour (asashour)
Date: 2009-07-24 13:44

Message:
Hi Konstantin,

Clicking a hyperlink is different from clicking a submit button of a
form. Try different scenarios with the following case:

<html>
<head>
  <meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>
</head>
<body>
  <form>
    <input name='param' value='Hello Günter'>
    <input type='submit' value='Submit'>
  </form>
  <a href='test.html?hi=Günter'>Click me</a>
</body></html>

However, there is a bug in HtmlUnit: with FF3 simulation, it should
submit the form parameter as UTF (without any %), but I guess this is not
what you think is the error.

Please use latest snapshot and advise if HtmlUnit behavior is different
than real browsers.

----------------------------------------------------------------------

Comment By: Konstantin Neubauer (sethus)
Date: 2009-07-24 12:21

Message:
Hi,
thanks for responding.
The thing is, HtmlUnit encodes this URL in Unicode.
ö is equal to ö
but according to W3C it has to be encoded as %F6, the hex value:
http://www.w3schools.com/TAGS/ref_urlencode.asp



----------------------------------------------------------------------

Comment By: Ahmed Ashour (asashour)
Date: 2009-07-21 21:32

Message:
Thanks for reporting, fixed in SVN.

Please note your test case is now incorrect, as '%F6' is different than
'ö'.

Enjoy

----------------------------------------------------------------------

_______________________________________________
HtmlUnit-develop mailing list
HtmlUnit-develop@...
https://lists.sourceforge.net/lists/listinfo/htmlunit-develop