Extract CJK text from PDF

Extract CJK text from PDF

by Wilton K W Kwok

Dear all,

I need to write a program to extract Chinese (or CJK) text from PDF files. I use the GetPageContent() function to extract text, and it works perfectly for English text. But when I use the same code to extract CJK content, it returns garbled output. I think the problem lies in extracting text set in CID or Unicode fonts. The attached file contains the Chinese characters. Thank you.
 
Best regards,
Wilton K. W. Kwok

Email: wiltonkkw@...

Instant Messenger:
MSN: wiltonkkw@...
Skype: wiltonkkw




------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
itextsharp-questions mailing list
itextsharp-questions@...
https://lists.sourceforge.net/lists/listinfo/itextsharp-questions

Attachment: manulife08a.pdf (861K)
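
A note on why the output may look garbled: in PDFs that use CID fonts, the strings in a page's content stream are glyph codes rather than character codes, so the raw bytes of the content stream are not readable text by themselves. When the font carries a ToUnicode CMap, those codes can be mapped back to Unicode. The Python sketch below is illustrative only and is not iTextSharp code. The CMap fragment and the hex string are hand-written examples (not taken from the attached file), parse_bfchar and decode_hex_string are hypothetical helper names, and the sketch assumes fixed two-byte codes and handles only bfchar entries, not bfrange.

```python
import re

# Hypothetical, hand-written ToUnicode CMap fragment (NOT from the attached PDF).
TOUNICODE_CMAP = """
2 beginbfchar
<0001> <4E2D>
<0002> <6587>
endbfchar
"""


def parse_bfchar(cmap_text):
    """Build {glyph code: Unicode string} from the bfchar sections of a CMap."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        # Each entry is "<source code> <destination UTF-16BE string>".
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_hex_string(hex_string, mapping, code_bytes=2):
    """Decode a PDF hex string of fixed-width glyph codes (assumed 2 bytes each)."""
    raw = bytes.fromhex(hex_string)
    chars = []
    for i in range(0, len(raw), code_bytes):
        code = int.from_bytes(raw[i:i + code_bytes], "big")
        # Codes missing from the CMap become U+FFFD (replacement character).
        chars.append(mapping.get(code, "\ufffd"))
    return "".join(chars)


mapping = parse_bfchar(TOUNICODE_CMAP)
# A content stream might show this text as: <00010002> Tj
print(decode_hex_string("00010002", mapping))  # prints 中文
```

Reading the same four bytes as a single-byte encoding yields only control characters, which is consistent with the garbled output described above. A real extractor would also need to track which font is active for each string and handle fonts that have no ToUnicode CMap.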

Re: Extract CJK text from PDF

by Paulo Soares-3

Text extraction is not supported in any way in iTextSharp. You're on your own.

Paulo

> -----Original Message-----
> From: Wilton K W Kwok [mailto:wiltonkkw@...]
> Sent: Wednesday, September 09, 2009 4:59 AM
> To: itextsharp-questions@...
> Subject: [itextsharp-questions] Extract CJK text from PDF