Outputting extended ascii characters in Hadoop?

Outputting extended ascii characters in Hadoop?

by Mark Kerzner-2

Hi,
the strings I am writing in my reducer have characters that may present a
problem, such as char represented by decimal 254, which is hex FE. It seems
that instead I see hex C3, or something else is messed up. Or my
understanding is messed up :)

Any advice?

Thank you,
Mark

Re: Outputting extended ascii characters in Hadoop?

by Todd Lipcon-4

Hi Mark,

If you're using TextOutputFormat, it assumes you're dealing in UTF-8. Decimal
254 (hex FE) is never a valid byte in UTF-8, so the character U+00FE gets
written as the two bytes C3 BE, which is why you see hex C3.
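
To see where the C3 comes from, here is a minimal sketch in plain Java (no Hadoop on the classpath). The class name Char254Encoding and the hex helper are made up for illustration. It encodes the character with code 254 as UTF-8 and, for comparison, as ISO-8859-1:

```java
import java.nio.charset.StandardCharsets;

class Char254Encoding {
    // Formats bytes as space-separated uppercase hex, e.g. "C3 BE".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The character with code 254 (U+00FE).
        String s = String.valueOf((char) 254);

        // UTF-8 needs two bytes for this character.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // C3 BE

        // A single-byte encoding such as ISO-8859-1 keeps it as one byte.
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // FE
    }
}
```

So a C3 in the output suggests the string went through a UTF-8 encoder rather than being corrupted.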

If you're dealing with binary (i.e. non-textual) data, you shouldn't use
TextOutputFormat.
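
As a rough illustration of why binary data and a UTF-8 text path don't mix, the sketch below (plain Java again; BinaryRoundTrip and throughUtf8String are hypothetical names, and this is not Hadoop's own code) decodes raw bytes as UTF-8 and encodes them back. The invalid byte FE is replaced during decoding, so the original bytes are lost:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BinaryRoundTrip {
    // Decodes raw bytes as UTF-8 text, then encodes the text back to bytes.
    // Invalid input bytes are replaced with U+FFFD during decoding.
    static byte[] throughUtf8String(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFE is not valid UTF-8; 0x41 is the ASCII letter 'A'.
        byte[] raw = {(byte) 0xFE, 0x41};
        byte[] out = throughUtf8String(raw);

        System.out.println(Arrays.equals(raw, out)); // false
        System.out.println(out.length);              // 4 (EF BF BD 41)
    }
}
```

Keeping such data as bytes end to end avoids this kind of silent substitution.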

-Todd

On Fri, Oct 9, 2009 at 3:09 PM, Mark Kerzner <markkerzner@...> wrote:

> Hi,
> the strings I am writing in my reducer have characters that may present a
> problem, such as char represented by decimal 254, which is hex FE. It seems
> that instead I see hex C3, or something else is messed up. Or my
> understanding is messed up :)
>
> Any advice?
>
> Thank you,
> Mark
>