converting Tamil .pdf to .rtf or .doc

To teach and learn Indian classical music
vainika
Posts: 433
Joined: 03 Feb 2010, 11:32

converting Tamil .pdf to .rtf or .doc

Post by vainika

This is not a query about learning music. However, those who have been involved in translating or transcribing music notes may have experience with this sort of thing and be able to help.

A document was created in Tamil and edited by a bunch of us using Google Docs. It was downloaded as a .pdf using Google's PDF conversion function. Now the online, editable Google Docs version has been lost (the person who originally created and shared it deleted their account), and with it all the changes made since the last download.

We now want to be able to edit the content, which exists only in the .pdf version. None of the standard PDF-to-Word converters work. Google lets you upload a .pdf and convert it into an editable Google doc, but Tamil is not one of the languages it supports.

Would anyone have ideas to share? If you think this is not of general enough interest, please email me separately at LRamakrishnan.lists@gmail.com

thanks a lot,

vainika/LRamakrishnan
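Before trying more converters, it may help to check what the PDF's hidden text layer actually contains, since that is what converters work from. A minimal sketch, assuming the third-party pypdf package (not mentioned in the thread): it prints, per page, how much of the extracted text falls in the Tamil Unicode block (U+0B80 to U+0BFF). A share near zero on pages that visibly show Tamil suggests the fonts map glyphs to non-Tamil codes, so converters will produce gibberish; a high share is encouraging but does not guarantee the characters come out in the right order.

```python
import sys


def tamil_fraction(text: str) -> float:
    """Share of non-whitespace characters that fall in the Tamil block U+0B80..U+0BFF."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    tamil = sum(1 for c in chars if "\u0b80" <= c <= "\u0bff")
    return tamil / len(chars)


def check_pdf(pdf_path: str) -> None:
    """Print the size and Tamil share of each page's extracted text."""
    from pypdf import PdfReader  # assumed third-party dependency: pip install pypdf

    reader = PdfReader(pdf_path)
    for number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        print(f"page {number}: {len(text)} chars, {tamil_fraction(text):.0%} Tamil")


if __name__ == "__main__" and len(sys.argv) > 1:
    check_pdf(sys.argv[1])
```

Run it as `python check_tamil.py document.pdf` (the script name is only an example).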

mohan
Posts: 2807
Joined: 03 Feb 2010, 16:52

Re: converting Tamil .pdf to .rtf or .doc

Post by mohan

Have you tried converting it with Adobe Acrobat (full version)?

vainika
Posts: 433
Joined: 03 Feb 2010, 11:32

Re: converting Tamil .pdf to .rtf or .doc

Post by vainika

mohan wrote: Have you tried converting it with Adobe Acrobat (full version)?
Yes, Mohan. The .pdf file has the fonts embedded, so there isn't any chance of extracting the text with Acrobat; Acrobat does not support Tamil. As of now, the only option (one that was suggested today) appears to be printing the document and scanning it into a machine that has Tamil OCR, Tamil fonts and Unicode support.
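The print-and-scan route can probably skip the printer: render each PDF page to an image and run OCR on it with Tamil language data. A minimal sketch, assuming the third-party pdf2image package (which needs Poppler) and pytesseract, plus a Tesseract install with the Tamil `tam` language data; none of these were mentioned in the thread. The OCR output will need proofreading, and page formatting is lost.

```python
import sys
from pathlib import Path
from typing import Callable, Iterable


def ocr_pages(pages: Iterable, ocr: Callable[[object], str]) -> str:
    """Run `ocr` on each page image and join the results with form feeds (page breaks)."""
    return "\f".join(ocr(page) for page in pages)


def ocr_tamil_pdf(pdf_path: str, dpi: int = 300) -> str:
    """Rasterize every page of the PDF and OCR it with Tesseract's Tamil model."""
    # Assumed third-party packages, imported here so ocr_pages works without them:
    # pdf2image needs Poppler; pytesseract needs Tesseract with tam.traineddata installed.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, dpi=dpi)  # list of PIL images, one per page
    return ocr_pages(pages, lambda img: pytesseract.image_to_string(img, lang="tam"))


if __name__ == "__main__" and len(sys.argv) > 1:
    source = Path(sys.argv[1])
    # Write UTF-8 plain text, which Word can open and then save as .doc or .rtf.
    source.with_suffix(".txt").write_text(ocr_tamil_pdf(str(source)), encoding="utf-8")
```

Keeping the page loop in `ocr_pages` separate from the Tesseract call makes it easy to swap in a different OCR engine if Tesseract's Tamil results are poor.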

vasanthakokilam
Posts: 10956
Joined: 03 Feb 2010, 00:01

Re: converting Tamil .pdf to .rtf or .doc

Post by vasanthakokilam

Ramakrishnan: open the file in Adobe Acrobat and look at 'Properties' under the 'File' menu.
Under the 'Fonts' tab, you should see that all of your fonts have been embedded (e.g. 'Embedded Subset').
Maybe that will give you some ideas on which fonts to bring into Word.
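The same font check can be scripted, which helps when a document has many pages. A sketch assuming the third-party pypdf package; `describe_font` and `list_fonts` are hypothetical helper names. For each font it reports whether it is a subset (a six-capital-letter tag plus '+' before the name, which is what Acrobat shows as 'Embedded Subset'), whether the font program is embedded, and whether a /ToUnicode map is present. That last item goes beyond the advice above: a /ToUnicode map is what lets a reader turn glyphs back into Unicode text, so fonts without one usually defeat copy/paste and converters. Fonts used only inside form XObjects are not visited.

```python
import re
import sys

# A subset font's name starts with a six-capital-letter tag and '+', e.g. "ABCDEF+Latha".
SUBSET_TAG = re.compile(r"[A-Z]{6}\+")


def _resolve(obj):
    """Follow a pypdf indirect reference; plain Python objects pass through unchanged."""
    return obj.get_object() if hasattr(obj, "get_object") else obj


def describe_font(font) -> dict:
    """Summarize one PDF font dictionary (pypdf object or a plain dict with the same keys)."""
    font = _resolve(font)
    name = str(font.get("/BaseFont", "?")).lstrip("/")
    holder = font
    if "/DescendantFonts" in font:
        # Type0 (composite) fonts keep the descriptor on their descendant CIDFont.
        holder = _resolve(_resolve(font["/DescendantFonts"])[0])
    descriptor = _resolve(holder.get("/FontDescriptor", {}))
    return {
        "name": name,
        "subset": bool(SUBSET_TAG.match(name)),
        "embedded": any(k in descriptor for k in ("/FontFile", "/FontFile2", "/FontFile3")),
        "to_unicode": "/ToUnicode" in font,
    }


def list_fonts(pdf_path: str) -> None:
    """Print each distinct font used directly by the PDF's pages."""
    from pypdf import PdfReader  # assumed third-party dependency

    seen = set()
    for page in PdfReader(pdf_path).pages:
        resources = _resolve(page.get("/Resources", {}))
        fonts = _resolve(resources.get("/Font", {}))
        for font in fonts.values():
            info = describe_font(font)
            if info["name"] not in seen:
                seen.add(info["name"])
                print(info)


if __name__ == "__main__" and len(sys.argv) > 1:
    list_fonts(sys.argv[1])
```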

I also found this thread on the Adobe forum, which discusses the OCR method you mention: http://forums.adobe.com/thread/427945

If I think of anything else, I will post. I am sure you have asked the Google Docs support people and forums for help; if not, that is another avenue. They should at least provide a way to convert an uploaded PDF into an editable format. (We can wish, can't we!)

Your experience illustrates a problem with collaborative editing and cloud syncing: we still have to take adequate 'backup' measures. The common notion that 'cloud equals no backup needed' is false in these circumstances.

ShrutiLaya
Posts: 225
Joined: 14 Sep 2008, 01:15

Re: converting Tamil .pdf to .rtf or .doc

Post by ShrutiLaya

I once tried to do this for Telugu, but gave up. The technical problem is described at http://blogs.adobe.com/insidepdf/2008/0 ... files.html but, in simple terms, it's unlikely to work.

- Sreenadh
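One concrete part of the Indic-script problem can be illustrated (not necessarily the part the linked article focuses on). A PDF stores glyphs in the order they are drawn, and the Tamil vowel signs ெ, ே and ை are drawn to the left of their consonant, so even when extraction returns Tamil characters, கொ can come out as ெ + க + ா. The hypothetical `fix_visual_order` helper below moves those left-drawn signs back after the consonant and rejoins two-part signs. It assumes the extracted characters are already correct Tamil Unicode, just in visual order; it does not handle conjuncts such as க்ஷ or fonts whose glyphs map to non-Tamil codes, which is part of why the general problem is hard.

```python
# Assumption: extracted characters are correct Tamil Unicode, only in visual (drawn) order.
PREFIX_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}  # ெ ே ை are drawn left of the consonant

# Two-part vowel signs surround the consonant; rejoin them into the single Unicode sign.
TWO_PART = {
    ("\u0bc6", "\u0bbe"): "\u0bca",  # ெ + ா -> ொ
    ("\u0bc7", "\u0bbe"): "\u0bcb",  # ே + ா -> ோ
    ("\u0bc6", "\u0bd7"): "\u0bcc",  # ெ + ௗ -> ௌ
}


def is_tamil_consonant(ch: str) -> bool:
    """Tamil consonant letters lie in U+0B95..U+0BB9."""
    return "\u0b95" <= ch <= "\u0bb9"


def fix_visual_order(text: str) -> str:
    """Move left-drawn vowel signs after their consonant and merge two-part signs."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in PREFIX_SIGNS and i + 1 < n and is_tamil_consonant(text[i + 1]):
            consonant = text[i + 1]
            after = text[i + 2] if i + 2 < n else ""
            combined = TWO_PART.get((ch, after))
            if combined:
                out.append(consonant + combined)
                i += 3
            else:
                out.append(consonant + ch)
                i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Visual order of "கோயில்" (temple) as a PDF might store it: ே க ா ய ி ல ்
    print(fix_visual_order("\u0bc7\u0b95\u0bbe\u0baf\u0bbf\u0bb2\u0bcd"))  # prints கோயில்
```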
