Unicode or UTF-8 encoded file support?



by Eric Miao

Hi list,

First, this is a great tool to simplify the documentation work.

However, I tried asciidoc on many UTF-8 files, and all of them failed
to produce any output. Python already includes Unicode and codec
support, so I'm wondering how much effort it would take to support
Unicode and UTF-8 encoding in asciidoc?

--
Cheers
- eric
_______________________________________________
asciidoc-discuss mailing list
asciidoc-discuss@...
http://lists.metaperl.com/cgi-bin/mailman/listinfo/asciidoc-discuss

Re: Unicode or UTF-8 encoded file support?

by Yannick Gingras

"eric miao" <eric.y.miao@...> writes:

> Yet I tried asciidoc on many utf-8 files, but all failed to produce
> something. Python should have included Unicode and many
> codec support, I'm just wondering how much effort there will
> be to support unicode and utf-8 encoding for asciidoc?

Unicode is a top priority of the 9.0 line.  Can you share a file that
failed to produce output?

--
Yannick Gingras

Re: Unicode or UTF-8 encoded file support?

by Eric Miao

Yannick Gingras wrote:

> "eric miao" <eric.y.miao@...> writes:
>
>  
>> Yet I tried asciidoc on many utf-8 files, but all failed to produce
>> something. Python should have included Unicode and many
>> codec support, I'm just wondering how much effort there will
>> be to support unicode and utf-8 encoding for asciidoc?
>>    
>
> Unicode is a top priority of the 9.0 line.  Can you share a file that
> failed to produce output?
>
>  
See the attached file. It comes from the asciidoc 8.2.4 release.
I changed the first line to Chinese, encoded as UTF-8, and edited
it with Vim for Windows.

Thanks!

Cheers
- eric

AsciiDoc 常见问题及回答
=======================

An embryonic AsciiDoc FAQ.


== How can I include non-breaking space characters?

The predefined `\{nbsp}` attribute reference will be replaced by a
non-breaking space character.


== How do I include spaces in URL addresses?

URL inline macro targets (addresses) cannot contain white space
characters. If you need spaces encode them as `%20`. For example:

  image:large%20image.png[]
  http://www.foo.bar.com/an%20example%20document.html[]


== How can I get AsciiDoc to assign the correct DocBook language attribute?

Set the AsciiDoc 'lang' attribute to the appropriate language code.
For example:

  $ a2x -a lang=es doc/article.txt

This will ensure that downstream DocBook processing will generate the
correct language specific document headings (things like table of
contents, revision history, figure and table captions, admonition
captions).


== Why does AsciiDoc give me a ``malformed author'' error?

This is normally because there are more than three names (up to three
are expected: first name, middle name and last name). For example,
this author line would result in an error:

  Vincent Willem van Gogh

You can enter multi-word first, middle and last names in the author
line using the underscore as a word separator. For example:

  Vincent Willem van_Gogh

You could also resolve the problem by replacing the author line with
explicit attribute entries:

  :First name:  Vincent
  :Middle name: Willem
  :Last name:   Van Gogh


== How can I assign multiple author names?

A quick way to do this is to put both authors in a single first name,
for example:

  My Document
  ===========
  :Author: Bill_and_Ben_the_Flowerpot_Men
  :Author Initials: BB & BC

asciidoc(1) replaces the underscores with spaces.

The longer, but semantically correct way, is to override the
`[header]` configuration file section in a document specific `.conf`
file. For example if your document is `mydoc.txt` then a file called
`mydoc.conf` in the document directory would be picked up
automatically by asciidoc(1).  Copy and paste the default
`docbook.conf` file `[header]` to `mydoc.conf` and modify the author
related markup:

  [header]
    :
  <authorgroup>...
    :



== How can I escape AsciiDoc markup?

Most AsciiDoc inline elements can be suppressed by preceding them with
a backslash character. These elements include:

- Attribute references.
- Text formatting.
- Quoting.
- 'URLs', 'image' and 'link' macros.
- Replacements.
- Special words.

In some cases you may need to escape both left and right quotes (see
the 'AsciiDoc User Guide').


== How can I escape a labeled list entry?

Two colons or semicolons in a paragraph may be confused with a labeled
list entry. Use the predefined `\{two_colons}` and `\{two_semicolons}`
to suppress this behavior, for example:

  Qui in magna commodo{two_colons} est labitur dolorum an. Est ne
  magna primis adolescens.

Will be rendered as:

Qui in magna commodo{two_colons} est labitur dolorum an. Est ne
magna primis adolescens.


== How can I disable a quoted text substitution?

Omitting the tag will disable quoting. For example, if you don't want
superscripts or subscripts then put the following in a custom
configuration file or edit the global `asciidoc.conf` configuration
file:

---------------------------------------------------------------------
[quotes]
^=
~=
---------------------------------------------------------------------


== I have a paragraph containing some funky URLs, is there a way to suppress AsciiDoc substitutions in the URL address?

You can selectively choose which substitutions to perform by setting
the 'subs' attribute at the start of a block. For example:

---------------------------------------------------------------------
[subs="macros"]
~subscripts~ and ^superscripts^ quotes won't be substituted.
Nor will the non-alphanumeric characters in the following URL:
http://host/~user/file#_anchor_tag_str_[]
---------------------------------------------------------------------


== How can I customize the \{localdate} format?

The default format for the `\{localdate}` attribute is the ISO 8601
`yyyy-mm-dd` format. You can change this format by explicitly setting
the `\{localdate}` attribute. For example by setting it using the
asciidoc(1) `-a` command-line option:

  $ asciidoc -a localdate=`date +%d-%m-%Y` mydoc.txt

You could also set it by adding an Attribute Entry to your source
document, for example:

  :localdate: {sys: date +%Y-%m-%d}

Since it's set using an executable attribute, you'll also need to
include the `--unsafe` option when you run asciidoc.


== Why doesn't AsciiDoc support strike through text?

The reason it's not in the distribution is that DocBook does not have
provision for strike through text and one of the AsciiDoc design goals
is that AsciiDoc markup should be applicable to all output formats.

Strike through is normally used to mark deleted text -- a more
comprehensive way to manage document revisions is to use a version
control system such as Subversion. You can also use the AsciiDoc
'CommentLines' and 'CommentBlocks' to retain revised text in the
source document.

If you really need strike through text for (X)HTML outputs then adding
the following to a configuration file will allow you to quote strike
through text with hyphen characters:

---------------------------------------------------------------------
 ifdef::basebackend-html[]

 [quotes]
 -=strikethrough

 [tags]
 strikethrough=<span style="text-decoration: line-through;">|</span>

 endif::basebackend-html[]
---------------------------------------------------------------------


== Where can I find examples of commands used to build output documents?

The User Guide has some. You could also look at `./doc/main.aap` in
the AsciiDoc distribution; it has all the commands used to build the
AsciiDoc documentation (even if you don't use A-A-P you'll still find
it useful).


== How can I place a backslash character in front of an attribute reference without escaping the reference?

Use the predefined `\{backslash}` attribute reference instead of an
actual backslash, for example if the `\{projectname}` attribute has
the value `foobar` then:

  d:\data{backslash}{projectname}

would be rendered as:

  d:\data\foobar


== Why have you used the DocBook <simpara> element instead of <para>?

`<simpara>` is really the same as `<para>` except it can't contain
block elements which more closely matches the AsciiDoc paragraph
semantics.


Re: Unicode or UTF-8 encoded file support?

by Eric Miao

Ok,

I'll speak in English for the sake of our non-Chinese-speaking friends :-)

Well, from the editor's point of view, a Chinese character occupies two
ASCII column widths, and people normally make the ruler below the title
the same width as the title line.

E.g.

MMNNMMNNMMNN
============

MM or NN stands for a single Chinese character. If asciidoc reads it
correctly (I mean as Unicode, with UTF-8 as the codec), the actual length
of the title string above is 6, while the number of "=" symbols below it
is 12. That should easily satisfy any requirement that the ruler contain
at least as many "=" symbols as the title has characters.

But if asciidoc does not read this correctly, each Chinese character is
normally 3 bytes in its UTF-8 encoding, so the title above turns out to
be 18 bytes long, which makes asciidoc believe there are not enough
"=" symbols.
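
A minimal Python 3 sketch of the three lengths discussed above, assuming
a six-character Chinese title. The sample title and the `display_width`
helper are illustrative only and are not part of asciidoc; the helper
counts East Asian wide and fullwidth characters as two columns, which is
roughly how a monospace editor shows them.

```python
import unicodedata

def display_width(text):
    """Approximate column width: wide/fullwidth characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

title = "常见问题回答"  # six Chinese characters (illustrative sample)

char_count = len(title)                  # characters after decoding
byte_count = len(title.encode("utf-8"))  # raw bytes if never decoded
column_count = display_width(title)      # columns in a monospace editor

print(char_count, byte_count, column_count)  # prints: 6 18 12
```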

Ignoring the UTF-8 encoding of the input file could be a disaster in
some cases. Playing tricks may solve some issues, but not all of them.
So I really hope Unicode will be supported in the future, especially
since Python already comes with good Unicode support.
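
To illustrate why decoding matters, here is a rough Python 3 sketch that
compares a title with its ruler before and after decoding. The
`underline_fits` rule and its `tolerance` value are assumptions made for
this example and are not taken from asciidoc's source; `display_width`
is likewise a hypothetical helper.

```python
import unicodedata

def display_width(text):
    """Approximate column width: wide/fullwidth characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def underline_fits(title_length, underline_length, tolerance=2):
    """Assumed rule: the two lengths may differ by at most `tolerance`."""
    return abs(underline_length - title_length) <= tolerance

# Six Chinese characters over a 12-character ruler, as stored on disk.
raw = "常见问题回答\n============\n".encode("utf-8")

# Without decoding, the title is measured in bytes (18 versus 12).
title_bytes, underline_bytes = raw.splitlines()[:2]
fits_as_bytes = underline_fits(len(title_bytes), len(underline_bytes))

# After decoding, the title can be measured in columns (12 versus 12).
title, underline = raw.decode("utf-8").splitlines()[:2]
fits_as_columns = underline_fits(display_width(title), len(underline))

print(fits_as_bytes, fits_as_columns)  # prints: False True
```

Under this assumed rule, counting decoded characters alone (6 versus 12)
would also fail, so measuring display columns is what matches the way
people size the ruler in an editor.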


2008/2/19 Hick <hick@...>:

> Hehe, maybe I know the reason.
>
> Just use fewer Chinese characters or make the underline longer.
>
>
>
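> ------- (originally in Chinese)
>
> I ran into this problem at first too. It seems the length of a section title and the length of its underline must be related in a certain way. Counted in English characters, the underline apparently must be no shorter than the title,
> and one Chinese character counts as two English characters.
>
> An overly long underline is annoying, so I later switched to the = Title = form and had no more problems.
>
> ------------------ Original Message ------------------
>
> From: "Eric Miao"<eric.y.miao@...>;
> Sent: Tuesday, February 19, 2008, 5:54 PM
> To: "Yannick Gingras"<ygingras@...>;
> Cc: "asciidoc-discuss"<asciidoc-discuss@...>;
> Subject: Re: [asciidoc-discuss] Unicode or UTF-8 encoded file support?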
>
>
>
> Yannick Gingras wrote:
> > "eric miao" <eric.y.miao@...> writes:
> >
> >
> >> Yet I tried asciidoc on many utf-8 files, but all failed to produce
> >> something. Python should have included Unicode and many
> >> codec support, I'm just wondering how much effort there will
> >> be to support unicode and utf-8 encoding for asciidoc?
> >>
> >
> > Unicode is a top priority of the 9.0 line.  Can you share a file that
> > failed to produce output?
> >
> >
> As in the attached file. It comes from the 8.2.4 asciidoc release.
> I modified the first line into chinese using UTF-8 encoding. Edited
> with vim for windows.
>
> Thanks!
>
> Cheers
> - eric
>
>



--
Cheers
- eric