Single Transliteration Scheme for all CM Languages - Part 2

Languages used in Carnatic Music & Literature
arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

Dear friends,

I am starting a new thread because the other one has enough posts in it, and i consider this one as a sort of a "new coming" :). Hope mods do not mind.

I have a significant update to the single transliteration scheme project. I have now called the scheme Unified Transliteration Scheme for Carnatic Music Compositions (Yes - quite ingenious indeed ;)!)

There are some major enhancements this time around:

A full-blown editor
Note: This is in addition to the test-bed, which you can still use for basic testing.

1. There is now a full-blown editor which you can access from a browser (Firefox and IE have been tested; Safari should work too). Using this you can create sahitya with nice formatting such as bold, italic, headings, colors etc. You can also copy and paste text from a web-page or from Word (although i have not tested copy/paste from Word that much). Once you have your text ready, you click one button and it will render the text in your favorite CM language.
2. You can switch to a printable view of the translation of any particular language and then print it (no "save" support yet)
3. There is support for "Variables", which allow you to specify certain terms (outside of the sahitya portions) that get translated differently for different languages so that they are appropriate for each. There is work to be done here, in the sense that the "database" of known variables needs to grow. I have only a handful now, but hopefully they allow you to get a feel for it. You can get information on how to use these Variables using the online help.

Once you know how to use the editor, a nice way to test it is to go to one of the fine rasikas.org Wiki pages with krithi listing, just select the sahitya portion (including headings), and paste it to the editor. You then ask the editor to "convert suitable terms to variables" and it should do so for raga, composer, pallavi, language, anupallavi etc. Then you translate - you may have to adjust a few things as the input text may not conform to the scheme. This also allows you to see why using variables is very useful.

Note malayalam still needs work and has been disabled in the editor. Hopefully with jayaram's help, i can complete it in the next few weeks. You can still play with malayalam in the test bed page.

The editor will take a few seconds to load. Please bear with it. I think it gets faster on subsequent usages as the browser starts to reuse previously downloaded content from the cache, but it isn't going to be instantaneous. This is the price we pay for a richer interface. If you have a modem connection, it could be too slow - sorry! But with a cable modem connection or faster, it loads fast enough to not be a detriment.

Description of scheme
I have also created web-pages that describe the scheme in detail, including all the context-specific rules. This includes an overview, plus an interactive legend/index table where you can view the entire "alphabet" of the scheme. It also describes the qualifier schemes for tamil. I would very much like interested people to go over this to make sure things make sense and add up. Any feedback is really appreciated.

Links
You can read information on the transliteration scheme at: http://arunk.freepgs.com/cmtranslit/cmt ... cheme.html . If you notice, the version of the scheme says "1.0 Beta" - that means everything is not cast in stone, and we can still iron out stuff before it "officially" becomes 1.0.
You can directly access the index/legend at: http://arunk.freepgs.com/cmtranslit/legend.html
You can use the editor at: http://arunk.freepgs.com/cmtranslit
You can continue to use the test bed at: http://arunk.freepgs.com/cmtranslit/cmt ... stbed.html (old link also still works)


Hopefully fine looking sanskrit/telugu/tamil/kannada/malayalam content will be created and printed using this. If everybody accepts it, eventually I would like the Carnatic Wiki pages to somehow interface with the translator so that from the Wiki page of a krithi you click a button/link that means "i want to see this in my language" (or in all 5 languages), and it takes you to a page where the translation is already done. I think this is possible with some php stuff - although admin help would be needed. Something to think about.

I would like to request your support in evolving the scheme and its uses. Please give me feedback - good or bad. Suggestions are always welcome. Please post feedback on this thread or you can send me email too.

Thanks
Arun
Last edited by arunk on 27 Jan 2007, 03:54, edited 1 time in total.

arunk

Post by arunk »

i had some typos in the links when i originally posted and some weren't working; i have corrected them now.

Arun

rshankar
Posts: 13754
Joined: 02 Feb 2010, 22:26

Post by rshankar »

WOW! Arun,
Nice work! Must have taken you loads of time. Wonderful dedication!
I had this fond hope as I was scrolling through the page that you'd have spelt the death knell for goof ups like panDu rIdi and banduvarALI...but then I realized that this works on the GIGO scheme as well, correct?

ramakriya
Posts: 1876
Joined: 04 Feb 2010, 02:05

Post by ramakriya »

Arun,

This is wonderful.. I will play around when I have some free time.

-Ramakriya

arunk

Post by arunk »

thanks ravi. It was a bit time consuming, although the editor itself is actually available for free (it's called tinyMCE - that reminds me that i should be acknowledging that). I had to extend it, and i chose it precisely because it is extensible.

paNDUridi will "look right" as long as you dont turn on qualifiers, but as soon as you do that, the jig could be up :). But then that is if the person inputting knows which is right and which is wrong! It is only as good as what input you provide. You give it #$@, it will translate it to #$@!

But if we have "official pages" with the correct pronunciation, and people use the translator, it should help retain the correct pronunciation (again if we use qualifiers, which would be a must for non-tamizh krithis being rendered in tamizh).

What is GIGO btw? Sorry - i dont know that term.

Arun

Suji Ram
Posts: 1529
Joined: 09 Feb 2006, 00:04

Post by Suji Ram »

Awesome Arun,
Absolutely COOL!! Congrats

rshankar

Post by rshankar »

GIGO was one of my dad's favorite terms:
Garbage In = Garbage Out

arunk

Post by arunk »

thanks ramakriya, suji. Hopefully you guys will find time to play with it. I will try to setup some examples like last time in the next couple of days. But at this point, we need to make sure that the scheme is ok.

ravi - yep GIGO alright :)

Arun

ramakriya

Post by ramakriya »

Is there a way to save all language versions into a *single* file when I hit transliterate?

-Ramakriya

Suji Ram

Post by Suji Ram »

Already tested the Editor and it is great. Copied and pasted some kritis and tested and they look cool. Anything from Karnatik.com turns out very funny since the scheme is different there.

arunk

Post by arunk »

ramakriya - not yet. Actually saving even one isnt directly supported yet.

But i will make sure to make this possible. I did have that in mind. Even in printable view, i would like to make it possible to view multiple languages (which should make way for the save).

Arun
Last edited by arunk on 27 Jan 2007, 04:45, edited 1 time in total.

arunk

Post by arunk »

Suji Ram wrote:Already tested the Editor and it is great. Copied and pasted some kritis and tested and they look cool. Anything from Karnatik.com turns out very funny since the scheme is different there.
one possibility for "later on" is to implement "inter-scheme" translations - but that can be a can of worms :).

Arun

ramakriya

Post by ramakriya »

Here is my first trial: What else other than an invocation to ganEsha?

ಗಜವದನ ಬೇಡುವೆ; ಪುರಂದರ ದಾಸರ ರಚನೆ

ಗಜವದನ ಬೇಡುವೆ ಗೌರೀತನಯ
ತ್ರಿಜಗವಂದಿತನೆ ಸುಜನರ ಪೊರೆವನೆ ||ಪಲ್ಲವಿ||

ಪಾಶಾಂಕುಶಧರ ಪರಮ ಪವಿತ್ರ
ಮೂಷಕ ವಾಹನ ಮುನಿಜನ ಪ್ರೇಮ || ಅನುಪಲ್ಲವಿ||

ಮೋದದಿ ನಿನ್ನಯ ಪಾದವ ತೋರೋ
ಸಾಧು ವಂದಿತನೇ ಆದರದಿಂದಲಿ || ಚರಣ 1||

ಸರಸಿಜನಾಭ ಶ್ರೀ ಪುರಂದರ ವಿಠಲನ
ನಿರುತ ನೆನೆಯುವಂತೆ ದಯಮಾಡೋ || ಚರಣ 2||


गजवदन बेडुवॆ; पुरंदर दासर रचनॆ

गजवदन बेडुवॆ गौरीतनय
त्रिजगवंदितनॆ सुजनर पॊरॆवनॆ ||पल्लवि||

पाशांकुशधर परम पवित्र
मूषक वाहन मुनिजन प्रेम || अनुपल्लवि||

मोददि निन्नय पादव तोरो
साधु वंदितने आदरदिंदलि || चरण 1||

सरसिजनाभ श्री पुरंदर विठलन
निरुत नॆनॆयुवंतॆ दयमाडो || चरण 2||

గజవదన బేడువె; పురందర దాసర రచనె

గజవదన బేడువె గౌరీతనయ
త్రిజగవందితనె సుజనర పొరెవనె ||పల్లవి||

పాశాంకుశధర పరమ పవిత్ర
మూషక వాహన మునిజన ప్రేమ || అనుపల్లవి||

మోదది నిన్నయ పాదవ తోరో
సాధు వందితనే ఆదరదిందలి || చరణ 1||

సరసిజనాభ శ్రీ పురందర విఠలన
నిరుత నెనెయువంతె దయమాడో || చరణ 2||

கஜவதன பேடுவெ; புரந்தர தாசர ரசனெ

கஜவதன பேடுவெ கௌரீதனய
த்ரிஜகவந்திதனெ சுஜனர பொரெவனெ ||பல்லவி||

பாஸாங்குஸதர பரம பவித்ர
மூஷக வாஹன முனிஜன ப்ரேம || அனுபல்லவி||

மோததி நின்னய பாதவ தோரோ
சாது வந்திதனே ஆதரதிந்தலி || சரண 1||

சரசிஜனாப ஸ்ரீ புரந்தர விடலன
நிருத நெனெயுவன்தெ தயமாடோ || சரண 2||

Very neat Arun!

-Ramakriya
Last edited by ramakriya on 27 Jan 2007, 04:49, edited 1 time in total.

arunk

Post by arunk »

you should be able to "convert" each caraNa to a variable (look at online help for how to do it - basically you place the cursor on the word and click the right button).

Then even in tamizh it will come out correctly as caraNam (as opposed to caraNa).
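
The idea behind variables can be pictured as a small lookup keyed by term and language. This is only a rough sketch: the `VARIABLES` table and the `resolve_variable` helper are hypothetical names, and the only entry taken from this thread is the tamizh caraNam case. It is not the editor's actual data or code.

```python
# Hypothetical variable table: term -> per-language form, written in the Roman scheme.
# Only the tamizh "caraNam" entry comes from the discussion; other languages fall
# back to the term as typed.
VARIABLES = {
    "caraNa": {"tamil": "caraNam"},
}


def resolve_variable(term: str, language: str) -> str:
    """Return the language-specific form of a variable, or the term itself."""
    forms = VARIABLES.get(term, {})
    return forms.get(language, term)


print(resolve_variable("caraNa", "tamil"))    # caraNam
print(resolve_variable("caraNa", "kannada"))  # caraNa
```

A term that is not in the table simply passes through unchanged, so plain sahitya text is unaffected.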

Arun

Suji Ram

Post by Suji Ram »

Ramakriya,
you did not turn on the qualifiers for tamizh..

arunk

Post by arunk »

right - but then there is no universal qualifier scheme. I like the "natural one" (and really detest the other), and i think there are others who may feel vice-versa :)

One more thing i was planning to implement was a way to indicate "include legend in output", which would put up:
(1) an indication of the qualifier scheme
(2) and/or a concise form of the legend including ONLY those letters that end up using qualifiers (as opposed to the entire legend, which would be impractical)

Arun
Last edited by arunk on 27 Jan 2007, 04:54, edited 1 time in total.

ramakriya

Post by ramakriya »

Please take a look at the following wiki page generated by Arun's editor.

http://www.rasikas.org/wiki/sri-mahaganapatim-bhajeham

I have not yet tried using variables etc.

-Ramakriya

arunk

Post by arunk »

Nice. But we have some issues to deal with

1. Font sizes aren't the same for all - tamizh is too big, sanskrit and telugu too small (for me on my computer). This of course can depend on the individual user's browser settings. In mine I believe different fonts got picked and they all have different sizes.

2. tamizh qualifiers aren't showing up nicely at all - they aren't super-scripted and hence are "taking over the content" and are an eye-sore.

I am guessing #2 is because of copy and paste where some HTML tags got lost - or maybe they got lost on entry to the wiki (whose markup, while it makes things easier, can be quite restrictive if you want to mix in html). Perhaps an option to copy contents to the clipboard, like the testbed has, would solve this. It may be possible to provide "copy as wiki".

#1, i am not yet sure. One possibility is to explicitly use font sizes (8pt, 10pt etc.) and that may even things out.

Arun
Last edited by arunk on 30 Jan 2007, 01:49, edited 1 time in total.

ramakriya

Post by ramakriya »

Arun,

I too find the same mismatch in the sizes (Tamil v/s Samskrita-kannada etc). I used all default settings.

Is there a paste-special mode where superscripts will remain as such?

One problem I found is that single spaces between words show up as no spaces in the transliterated text!

-Ramakriya
Last edited by ramakriya on 30 Jan 2007, 02:02, edited 1 time in total.

arunk

Post by arunk »

ramakriya wrote:Is there a paste-special mode where superscripts will remain as such?
How are you doing this? in wiki you edit and simply paste? Let me know and I will play around
ramakriya wrote:One problem I found is that single spaces between words show up as no spaces in the transliterated text!
Where? In the editor itself, in printable view, or on paste in wiki?

Arun

ramakriya

Post by ramakriya »

arunk wrote:
ramakriya wrote:Is there a paste-special mode where superscripts will remain as such?
How are you doing this? in wiki you edit and simply paste? Let me know and I will play around
Arun
Yes. Also, when I copy-paste, some parts of the samyuktaksharas in Kannada/Telugu don't show up correctly (look at the example given below - the 'ya' and 'ma' ottu). What is to be done about this?
One problem I found is that single spaces between words show up as no spaces in the transliterated text!
Where in the editor itself, or printable view or on paste in wiki?

Arun
In both. And this is happening in Kannada and Telugu only. In SamskRta and Tamil the space shows up correctly, as shown in the following example:

gajAraNyavAraNam jyOtirmayam
गजारण्यवारणम् ज्योतिर्मयम्
గజారణ్యవారణంజ్యోతిర్మయం
ಗಜಾರಣ್ಯವಾರಣಂಜ್ಯೋತಿರ್ಮಯಂ
க3ஜாரண்யவாரணம் ஜ்யோதி1ர்மயம்ಯಂ

-Ramakriya
Last edited by ramakriya on 30 Jan 2007, 02:24, edited 1 time in total.

arunk

Post by arunk »

i will look at the copy-paste later when I get a chance.

The space problem, i bet has to do with my logic for the anuswara. I can reproduce it on my computer. So i will try to fix it.
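
A dropped space like the one reported in this thread is easy to catch mechanically by comparing word counts before and after transliteration. The sketch below is only an illustration of such a check, using the strings posted earlier; `word_count_preserved` is a hypothetical helper, not part of the translator.

```python
def word_count_preserved(source: str, rendered: str) -> bool:
    """True if the rendered text has as many whitespace-separated words as the source."""
    return len(source.split()) == len(rendered.split())


source = "gajAraNyavAraNam jyOtirmayam"
sanskrit = "गजारण्यवारणम् ज्योतिर्मयम्"      # space kept
telugu_reported = "గజారణ్యవారణంజ్యోతిర్మయం"  # space lost, as reported

print(word_count_preserved(source, sanskrit))         # True
print(word_count_preserved(source, telugu_reported))  # False
```

Running a check like this over each language's output for a few test lines would flag the Kannada/Telugu case without anyone having to spot it by eye.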

Thanks!
Arun

arunk

Post by arunk »

i found the space problem after anuswara. It was a bug in the logic. I do not yet know what is going on with the copy and paste problem. My guess is perhaps the font being selected by the browser when viewing rasikas.org page is not a good one and is buggy?

For example, if I copy the (supposedly) correct kannada text for it from the test-bed, and paste it here in my reply, it changes to something bogus:

ಗಜಾರಣ್ಯವಾರಣಂ ಜ್ಯೋತಿರ್ಮಯಂ

The above is not what I see in the window from which i copied it to clipboard! The Nya, rma etc. are messed up.

Arun

arunk

Post by arunk »

i found a clue: Any time you paste into an edit-box or a text-box (like our post submission part, wiki submission) things go wacky.

Even in the test-bed, if i copy the kannada for gajAraNya from the kannada section back into the edit-box (i.e. the one under Type in text to translate), it goes wacky!

This sort of sucks. Unless we find a solution, submitting non-english content via forms has problems.

Arun

arunk

Post by arunk »

if i explicitly set up the text-boxes to use a unicode font (e.g. Arial Unicode MS), it works. So I think we need to ask srkris to change the submission boxes to use unicode fonts.

i.e. adding style="font-family:Arial Unicode MS,Lucida Sans Unicode;" to the input tags did the trick for me. Of course one has to watch out for side-effects.

Arun

ramakriya

Post by ramakriya »

arun.

what are the variables that are supported now?

ramakriya

Post by ramakriya »

Also, the problem with samyuktAksharas seems to be solved. Thanks Srkris, Arun.

http://www.rasikas.org/wiki/paripahimam-shri

arunk

Post by arunk »

ramakriya - For variables - click on the "$" button and you can see the list. It may be easier to simply type in the text as per wiki format and click on the button which has "3 arrows pointing towards a $".

How was the fix done for the samyuktAksharas?

PS: I will try to finish the next update today (which should include the spacing problem for anuswara at end).

Arun

mahakavi

Post by mahakavi »

The page does not show Thamizh version.

arunk

Post by arunk »

mahakavi,

Some questions so that i can figure out what the problem may be:

1. which page? The main page itself or printable view? If it is main-page, does it not show even if you click on the Tamil heading?
2. Do other languages show up?
3. Which browser (and version) are you using?

Thanks
Arun

ramakriya

Post by ramakriya »

mahakavi,

if you are referring to the page I indicated ( http://www.rasikas.org/wiki/paripahimam-shri )
I have not added the tamizh version there, because I have not figured out how to make the superscripts show up as superscripts.

-Ramakriya

arunk

Post by arunk »

ah! that could be. BTW, the html content i generate for super-scripts does not use the <sup> tags (because i was not happy with the relative size it chose for the super-scripted content - it wasn't small enough), and that could be why it doesn't work for wiki. I could introduce an option for it to use <sup> tags and we can see if that works better.

The other option is for srkris to allow embedding html into wiki (that is a supported option of wiki). But let me play around later tonight to see whether generating <sup> tags will work for us.
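
To make the two options concrete, here is a minimal sketch of marking up tamizh qualifiers either with <sup> tags or with a styled span. The `mark_qualifiers` name, the regular expression, and the span styling are all assumptions made for illustration; this is not the code the editor actually uses.

```python
import re

# Digits 1-4 that directly follow a Tamil character (Unicode block U+0B80-U+0BFF)
# are treated as qualifiers. A digit after a space, as in "சரண 1", is left alone.
QUALIFIER = re.compile(r"(?<=[\u0B80-\u0BFF])([1-4])")


def mark_qualifiers(text: str, use_sup_tag: bool = True) -> str:
    """Wrap qualifier digits in <sup> tags, or in a styled span if use_sup_tag is False."""
    if use_sup_tag:
        return QUALIFIER.sub(r"<sup>\1</sup>", text)
    # Assumed styling: lets the size be chosen freely instead of the browser's <sup> default.
    return QUALIFIER.sub(
        r'<span style="vertical-align:super;font-size:60%">\1</span>', text
    )


print(mark_qualifiers("சங்கீ1ர்ண"))
print(mark_qualifiers("சரண 1"))
```

The <sup> form is the one more likely to survive a paste into a wiki that only accepts a small set of tags, while the span form gives control over the size.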

Arun

mahakavi

Post by mahakavi »

arunk:
I was referring to the paripAhimAm shrI kriti converted by ramakriya. I could see all other versions except Thamizh. One way out for the superscript problem would be to increase the font size for the text and/or reduce the font size for the numeral.

mahakavi

Post by mahakavi »

OK, I see you are aware of the font size and the <sup> tag problem.

mahakavi

Post by mahakavi »

Come to think of it, would it not suffice if we just bold the letter in Thamizh script for both the superscripts 3 and 4 while leaving it as such for superscripts 1 &2? This, I guess, would not present any problem in pronunciation since the distinctions in the pronunciation of individual letters would smooth out during the musical rendering. That way the distraction of reading the script due to the presence of numbers would not be there.

We need to distinguish only between maTam and maDam, gOpAlA and bAlA, kambam and gamakam, etc.
Last edited by mahakavi on 31 Jan 2007, 21:57, edited 1 time in total.

arunk

Post by arunk »

I presume you mean you have a suffix to differentiate between Ta and Tha as say ட1 and ட2, and an emboldened ட1 and ட2 implies Da and Dha.

It is possible, but IMO bold wont cut it for all intended purposes e.g. if it is part of a word which in its entirety is to be bold (like say part of a heading).


Arun
Last edited by arunk on 01 Feb 2007, 00:16, edited 1 time in total.

rshankar

Post by rshankar »

Since we are transliterating all languages, one will need to differentiate for instance, not just kAnA from gAnA, but from khAnA and ghAnA...and so on...

arunk

Post by arunk »

i am starting an update now. The editor will be unavailable for a few minutes.

Arun

arunk

Post by arunk »

update complete. This new version has

1. Ability to select multiple languages in printable view (and control order). There is a "+" button next to the printable view. Click on it to do this. The settings are remembered across sessions.
2. In printable view, option to "export to dokuwiki". Note however that because dokuwiki ignores <sup> tags in headings, all headings will be converted as simply bold. This means that this would be good only for non-english portions of our carnatic wiki page (i.e. english one needs to have big headings etc). I think that is fine.
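
The heading rule in item 2 can be sketched as below. This is a guess at the shape of the export, not the actual implementation; `to_dokuwiki` is a hypothetical name. It relies only on the fact that DokuWiki writes bold text as `**text**`.

```python
def to_dokuwiki(line: str, is_heading: bool) -> str:
    """Emit a heading as a bold line so any <sup> qualifiers inside it still render."""
    return f"**{line}**" if is_heading else line


print(to_dokuwiki("பல்லவி", is_heading=True))
print(to_dokuwiki("சங்கீ<sup>1</sup>ர்ண", is_heading=False))
```

The trade-off is the one described above: the exported text loses real heading levels, which is acceptable for the non-english portions of a page.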

ramakriya - assuming this works as advertised, this should make your job easier. I did play around on the wiki site and the <sup> tags work.

I have one more update that will come in a day or so. That is the one which allows you to "convert/fix" input text to conform to the scheme. My hope is that it will allow people to use some of the informal schemes, and also may allow transliterating text from say karnatik.com (or Lakshman's CD).

Please let me know if the multiple-language view and the dokuwiki export work well.

Thanks
Arun

arunk

Post by arunk »

ramakriya, when you update the text for wiki, would it be possible to replace the "M" with the correct contextual phoneme? (e.g. saMkIrNa => sankIrNa/sa#nkIrNa).

I am quite convinced this throws many people off - so much so that i have seen more than one tamizh book of non-tamizh krithis which use ம் in places where it clearly doesn't belong in the tamizh script e.g. இம்த :). I think it would be better if the transliteration script didn't use it except in places where it is mandatory (as in mya vs Mya in certain words). Would you agree?

Arun

Suji Ram

Post by Suji Ram »

arunk wrote:ramakriya, when you update the text for wiki, would it be possible to replace the "M" with the correct contextual phoneme? (e.g. saMkIrNa => sankIrNa/sa#nkIrNa).

Arun
Then the bindu in Telugu and kannada will change- I checked that

arunk

Post by arunk »

no suji, i just now tried sankIRNa, and I get:

சங்கீ1ர்ண
ಸಂಕೀರ್ಣ
సంకీర్ణ
संकीर्ण

I think this is correct. Isnt it?

Arun

arunk

Post by arunk »

as i mentioned in the scheme, n <=> #n when it precedes k/g etc. (and thus it comes out as M in kannada, telugu etc.).
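
For anyone cleaning up input text, replacing "M" with a contextual nasal can be sketched as a simple substitution. Everything here is an illustration under assumptions: only the k/g case (n, i.e. #n) is stated in this thread, the t/d and p/b rows are guesses, and `replace_anusvara` is a hypothetical helper rather than part of the translator.

```python
import re

# Hypothetical table: nasal to write in place of "M" before a given consonant.
# Only the k/g rows are stated in the thread; t/d and p/b are assumptions.
NASAL_BEFORE = {"k": "n", "g": "n", "t": "n", "d": "n", "p": "m", "b": "m"}

ANUSVARA = re.compile(r"M([kgtdpb])")


def replace_anusvara(text: str) -> str:
    """Replace 'M' with a contextual nasal when a listed consonant follows."""
    return ANUSVARA.sub(lambda m: NASAL_BEFORE[m.group(1)] + m.group(1), text)


print(replace_anusvara("saMkIrNa"))  # sankIrNa
print(replace_anusvara("saMyukta"))  # saMyukta (left alone: no rule for y)
```

Cases where M is mandatory (as in mya vs Mya) are untouched because no rule is listed for them.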

Suji Ram

Post by Suji Ram »

arunk wrote:no suji, i just now tried sankIRNa, and I get:

சங்கீ1ர்ண
ಸಂಕೀರ್ಣ
సంకీర్ణ
संकीर्ण

I think this is correct. Isnt it?

Arun
It's correct.
I got something different. Let me try again.

ramakriya

Post by ramakriya »

arunk wrote:ramakriya, when you update the text for wiki, would it be possible to replace the "M" with the correct contextual phoneme? (e.g. saMkIrNa => sankIrNa/sa#nkIrNa).

I am quite convinced this throws many people off - so much so that i have seen more than one tamizh book of non-tamizh krithis which use ம் in places where it clearly doesn't belong in the tamizh script e.g. இம்த :). I think it would be better if the transliteration script didn't use it except in places where it is mandatory (as in mya vs Mya in certain words). Would you agree?

Arun
Sure Arun - I will keep it in mind when doing other songs.

-Ramakriya

jayaram
Posts: 1317
Joined: 30 Jun 2006, 03:08

Post by jayaram »

Arun - great to go thru your 'new-and-improved' site!

Do you know when the Malayalam section will be set up?

mohan
Posts: 2806
Joined: 03 Feb 2010, 16:52

Post by mohan »

First time I had a look at the editor. It looks great. I wish I could read a language other than English though :)

mahakavi

Post by mahakavi »

arunk:
I did transliterate a kriti from Roman script to Thamizh and it worked fine. I copied it on to an email and sent it to a friend. She received it all as @#%$!#$%^)((*&&.
Is it because the email transmission clobbers anything written in a language other than English? How does one overcome that?
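
The thread does not establish the cause here, but one common way this kind of garbling happens is that the text is sent or read with the wrong character set. The sketch below is an assumption-laden illustration using Python's standard `email` package: it first shows UTF-8 text turning into junk when decoded with the wrong charset, then builds a message that declares UTF-8 explicitly. The subject line and sample text are placeholders.

```python
from email.message import EmailMessage

text = "சங்கீ1ர்ண"

# How garbling can arise: UTF-8 bytes read back as if they were Latin-1.
garbled = text.encode("utf-8").decode("latin-1")
print(garbled == text)  # False

# Declaring the charset lets a mail client decode the body correctly.
msg = EmailMessage()
msg["Subject"] = "kriti in tamizh"
msg.set_content(text, charset="utf-8")

print(msg.get_content_charset())          # utf-8
print(msg.get_content().strip() == text)  # True
```

In a normal mail client the equivalent step is choosing Unicode (UTF-8) as the message encoding before sending; the recipient also needs a font that covers the script.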

arunk

Post by arunk »

jayaram,

Thanks.

i will (re)summarize some questions about malayalam for you. Based on answers, i will adjust logic, and you can test it on the testbed. Once we are comfortable, enabling in the editor is no big deal.

Arun

arunk

Post by arunk »

thanks mohan
