Hi List,
The problem of invalid characters from MS Word has plagued me for
years. Because of the nature of our web app, users are very apt to
try to copy and paste from Word. It's not just curly quotes, it's math
symbols (such as greater than or equal to or pretty much anything you
might find in an algebraic equation).
I've used the test source code that Bil was kind enough to offer in
this thread (the emdash example) and everything I've pasted into the
textarea comes back fine.
I built a new MySQL db using phpMyAdmin, which says the database is
using utf8_unicode_ci collation, and the only table inside also
reports utf8_unicode_ci. The varchar field in the table is also
labeled as utf8_unicode_ci collation.
In Lasso 8.5 SiteAdmin, I used the "Table Batch Change" to ensure
UTF-8 encoding, then verified under table listing - table detail
that the encoding is UTF-8 (Unicode). Lasso is running on a Unix
box (CentOS 5, I believe), as is the MySQL db.
I added inlines to Bil's page, and immediately the output contains
nasty boxes instead of the proper characters. Here is the altered
source:
[
lp_header_serveHTML('UTF-8');
var('text') = action_param('text')->trim&->substring(1,10);
if($text->Size);
var('sql')="Insert into wordchars (thestring) values ('" +
(Encode_SQL:$text) + "');";
inline(-database="curricul_wordchars", -table="wordchars", -sql=($sql));
/inline;
/if;
var('emdash') = bytes;
$emdash->import8bits(226);
$emdash->import8bits(128);
$emdash->import8bits(148);
]
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Charset Demo</title>
</head>
<body style="background: white;">
<form method="post">
Paste some word chars like this dash "[$emdash]"<br><textarea
name="text"></textarea>
<input type="submit" name="submit" value="submit">
</form>
[if($text->size);
'<hr>You pasted:<br><br>'+encode_html($text)+'<br>';
lp_string_radixprint($text);
var('sql')="Select * from wordchars order by createdate desc;";
inline(-database="curricul_wordchars", -table="wordchars", -sql=($sql));
"<hr>The db output:<br><br>"+(encode_html(field('thestring')))+"<br>";
/inline;
/if]
</body>
</html>
The pre-db output is still fine. It's only the
encode_html(field('thestring')) output that has the bad chars. What
step am I missing?
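For reference, here is a small Python sketch (not part of the Lasso page) of what the three bytes imported into $emdash are, and what they turn into if some layer reads them under a single-byte charset. The choice of cp1252 as the "wrong" charset is only an assumption for illustration; I have not confirmed which layer, if any, is doing this on my setup.

```python
# The three byte values imported into $emdash in the page source above.
EMDASH_BYTES = bytes([226, 128, 148])


def decode_as(raw: bytes, charset: str) -> str:
    """Decode raw bytes under the given charset name."""
    return raw.decode(charset)


# Read as UTF-8, the three bytes form one character: U+2014 (em dash).
as_utf8 = decode_as(EMDASH_BYTES, "utf-8")

# Read as cp1252 (assumed here purely as an example of a single-byte
# charset), each byte becomes its own character, so one dash turns
# into three unrelated characters.
as_cp1252 = decode_as(EMDASH_BYTES, "cp1252")

# ascii() escapes non-ASCII characters so the result is unambiguous
# regardless of the terminal's own encoding.
print(ascii(as_utf8), len(as_utf8))
print(ascii(as_cp1252), len(as_cp1252))
```

If the db output shows three odd characters (or boxes) where one dash went in, that would be consistent with this kind of mismatch somewhere between the form, Lasso, and MySQL, though it does not say where.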
Todd V