Single transliteration scheme for all CM languages?

Languages used in Carnatic Music & Literature
drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

Post by drshrikaanth »

Sorry Arun. I don't understand what combinations you are talking about. And I don't recognise n#g, n#gh, n#d, n#dh. What are these?

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

Post by drshrikaanth »

BTW, "M" is not translating to bindu. Have you not made the correction yet?

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

No, drs. I haven't fixed the bindu stuff yet (i.e. support for M).

Regarding n# - I meant #n (:mad: at myself)

Basically, the nasal consonant in the same pentad as "ga" is "#n", right? But in our scheme the combination can appear as "ng" (gangai) and not always as "#ng". That's what I meant.

Arun
Last edited by arunk on 30 Nov 2006, 00:16, edited 1 time in total.

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

Post by drshrikaanth »

arunk wrote:no drs. I haven't fixed the bindu stuff yet (i.e. support for M).

Regarding n# - I meant #n (:mad: at myself)
ok
Basically the nasal consonant in the same pentad as "ga" is "#n" right?
Yes
But in our scheme the combination can appear as "ng" (gangai) and not always as #ng. That's what I meant.
I'm not sure I understand correctly. In kannaDa, gangai will be spelt with a bindu (gaMgai), not with a #n, if that's what you mean. It will never be written as gangai.

ramakriya
Posts: 1877
Joined: 04 Feb 2010, 02:05

Post by ramakriya »

arunk wrote:ok.
2. ~nc, ~nch, ~nj, ~njh (some examples please?)

Arun
pa~nce -> paMce
a~nce -> aMce
i~ncu -> iMcu
i~ncara -> iMcara
pa~nju -> paMju
bha~njana -> bhaMjana etc

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

Post by drshrikaanth »

arunk wrote:ok. By this, I obviously misunderstood what ramakriya said.

The combinations then are:
1. #ng, #ngh, #nd, #ndh (although in our scheme these can appear as ng, ngh, nd, ndh in the input, e.g. when coming from a transliteration of a tamizh krithi. For example tangam, sandi etc.)
2. ~nc, ~nch, ~nj, ~njh (some examples please?)
3. NT, NT, ND, ND
4. nt, nth, nd, ndh
5. mp, mph, mb, mbh

Arun
Arun, you still have not explained in what context you are talking about these combinations.

Anyway, if you are talking about where the bindu occurs:

1- #nd and #ndh are nonexistent combinations.

2- pa~nca (paMca), vA~nchalya (vAMchalya), aMu, jhaMjhAvAta

3- You probably meant NT, NTh, ND, NDh

4 & 5 are fine
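A minimal sketch of the pentad idea behind these five groups, for anyone following along. The table and function names are hypothetical (not part of Arun's program), and the token spellings (#n, ~n, N, n, m and the stops) are assumed from the scheme as used in this thread: each stop is paired with the nasal of its own pentad.

```python
# Hypothetical lookup: each stop (as typed in the scheme) mapped to the
# nasal of its own pentad (varga).
PENTAD_NASAL = {
    "k": "#n", "kh": "#n", "g": "#n", "gh": "#n",   # velar pentad
    "c": "~n", "ch": "~n", "j": "~n", "jh": "~n",   # palatal pentad
    "T": "N", "Th": "N", "D": "N", "Dh": "N",       # retroflex pentad
    "t": "n", "th": "n", "d": "n", "dh": "n",       # dental pentad
    "p": "m", "ph": "m", "b": "m", "bh": "m",       # labial pentad
}


def homorganic_nasal(stop: str) -> str:
    """Return the nasal that belongs to the same pentad as `stop`."""
    return PENTAD_NASAL[stop]


# The five nasal + stop groups, built from the table above.
groups = {}
for stop, nasal in PENTAD_NASAL.items():
    groups.setdefault(nasal, []).append(nasal + stop)
```

Here groups["#n"] gives #nk, #nkh, #ng, #ngh and groups["N"] gives NT, NTh, ND, NDh, which lines up with the corrected lists in the replies.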

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

drshrikaanth wrote:Im not sure I am understanding correctly. gangai will be spelt with a bindu (gaMgai) and not with a #n in kannaDa if thats what you mean. it wont be gangai ever.
I didn't mean the kannaDa spelling - it would be with an M of course. I meant the transliteration text/input itself. For example, even if the input text you enter is amba, it becomes aMba. In the same way, whether the input is ganga or inta, they become gaMga and iMta, right? It is "n" that comes with "ga" in the input, not "#n".
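A rough sketch of the input rule described here, assuming a Kannada target where a nasal directly before a stop of its own pentad is written as the bindu (M). The function name is hypothetical; accepting a plain "n" before velar and palatal stops is an assumption based on the ganga/panca examples in this thread, and the input is assumed to be well-formed scheme text.

```python
# Stops grouped by pentad, keyed by that pentad's own nasal.
PENTADS = {
    "#n": ["kh", "k", "gh", "g"],
    "~n": ["ch", "c", "jh", "j"],
    "N": ["Th", "T", "Dh", "D"],
    "n": ["th", "t", "dh", "d"],
    "m": ["ph", "p", "bh", "b"],
}


def to_bindu(word: str) -> str:
    """Rewrite nasal + same-pentad stop as M + stop (Kannada bindu).

    Assumption: besides the pentad's own nasal, a plain "n" is also
    accepted before velar and palatal stops (e.g. "ganga", "panca").
    """
    for nasal, stops in PENTADS.items():
        accepted = [nasal, "n"] if nasal in ("#n", "~n") else [nasal]
        for typed in accepted:
            for stop in stops:
                # Only the nasal changes; the stop is kept as typed.
                word = word.replace(typed + stop, "M" + stop)
    return word
```

With this, "ganga" and "ga#nga" both give "gaMga", while "ammA" and "annam" are left alone because the second nasal is not followed by a stop.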

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

drshrikaanth wrote:
1- #nd and #ndh are nonexistent combinations.
Sorry I meant #nk and #nkh (same pentad as g, gh).

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

drshrikaanth wrote:3- You probably meant NT, NTh, ND, NDh
I need coffee. How many freaking mistakes can one make in a lousy post! We are doomed with me as a programmer!

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

Is this pa~nca as in "five"? Can we allow panca, panja etc. to mean the same? Sort of like we allowed sangam? At least panja is possible in tamizh words, and I presume it should become paMja.

So panca, panja => paMca, paMja? No need to specify pa~nca, pa~nja (I think I can accept both).

Arun

ramakriya
Posts: 1877
Joined: 04 Feb 2010, 02:05

Post by ramakriya »

arunk wrote:So panca, panja => paMca, paMja? No need to specify pa~nca, pa~nja (i think i can accept both).

Arun
Yes :)
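One way to "accept both", sketched under assumptions: a plain "n" typed before a velar or palatal stop is rewritten to that pentad's nasal, so "panca" and "pa~nca" (or "ganga" and "ga#nga") reach the rest of the program as the same text. The function name is hypothetical and this is not Arun's code.

```python
VELARS = ("kh", "k", "gh", "g")
PALATALS = ("ch", "c", "jh", "j")


def canonical_nasals(word: str) -> str:
    """Rewrite a bare "n" before a velar/palatal stop as "#n" / "~n"."""
    out = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        rest = word[i + 1:]
        # Skip the "n" that is already part of an explicit "#n" or "~n".
        if ch == "n" and prev not in ("#", "~"):
            if rest.startswith(VELARS):
                out.append("#n")
                continue
            if rest.startswith(PALATALS):
                out.append("~n")
                continue
        out.append(ch)
    return "".join(out)
```

Dental clusters such as "nandi" are untouched, since a plain "n" already is the dental pentad's nasal.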

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

I have put in the changes but not yet uploaded. The changes are:
1. Added support for n2 for tamizh ந besides the existing ^n
2. Incorporated kannada rules for cases where we know for sure whether a nasal consonant becomes a bindu or not.
3. Removed #ng as <=> #n in cases like vAngmaya
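Change 1 amounts to an alias. A minimal sketch, assuming n2 is purely another spelling of ^n (both giving the Tamil dental na ந, as in the tirun2Amam / tiru^nAmam output); the table and function names are hypothetical. A plain string replace is only safe while no other token contains "n2"; a real tokenizer would be more robust.

```python
# Hypothetical alias table: alternative spelling -> canonical scheme token.
ALIASES = {"n2": "^n"}
TAMIL_DENTAL_NA = "\u0ba8"  # ந, the letter both spellings stand for


def normalise_aliases(text: str) -> str:
    """Rewrite every alias to its canonical scheme token."""
    for alias, canonical in ALIASES.items():
        text = text.replace(alias, canonical)
    return text
```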

Here is what I get. Please check:

tirun2Amam திருநாமம் ತಿರುಮಾಮಂ
tiru^nAmam திருநாமம் ತಿರುಮಾಮಂ
vA#nmaya வாங்மய ವಾಙ್ಮಯ
a#n#nanam அங்ஙனம் ಅಙ್ಙಮಂ
vAngAdE வாங்காதே ವಾಂಗಾದೇ
gangA கங்கா ಗಂಗಾ
ga#ngA கங்கா ಗಂಗಾ
panca பஞ்ச ಪಂಚ
pa~nca பஞ்ச ಪಂಚ
panjam பஞ்சம் ಪಂಜಂ
pa~nja பஞ்ஜ ಪಂಜ (this i know is wrong for tamizh, i will fix it)
pankajam பங்கஜம் ಪಂಕಜಂ
samyukta, saMyukta சம்யுக்த, சம்யுக்த ಸಮ್ಯುಕ್ತ, ಸಂಯುಕ್ತ
imsai இம்சை ಇಂಸೈ
amba அம்ப ಅಂಬ
ammA அம்மா ಅಮ್ಮಾ
annam அன்னம் ಅಮ್ಮಂ

These are not all the use-cases but should give you an idea whether i am already off or not.

ramakriya: k + sha => ksha. For tamizh the system does it (like sri). I haven't looked at fixing this for kannada yet.

Thanks again for all of your patience and valuable input

Arun
Last edited by arunk on 30 Nov 2006, 01:34, edited 1 time in total.

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

Post by drshrikaanth »

Watch out. All "n" coming out as m in kannaDa (tirumAmam, ammam etc). Fix it.
arunk wrote:samyukta, saMyukta சம்யுக்த, சம்யுக்த ಸಮ್ಯುಕ್ತ, ಸಂಯುಕ್ತ
ಸಮ್ಯುಕ್ತ - wrong, ಸಂಯುಕ್ತ- correct

imsai இம்சை ಇಂಸೈ - sounds horrible but is correct :D
ramakriya: k + sha => ksha. For tamizh the system does it (like sri). I havent looked at this yet to fix for kannada.
I think Sh and sh are both being treated as the letter in SyAmA while S is being treated as dIkShitar. Please reverse it. But kSa is coming out as that in dIkShitar. right.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

drshrikaanth wrote:Watch out. All "n" coming out as m in kannaDa (tirumAmam, ammam etc). Fix it.
That is a bug caused by a typo. I was incorrectly generating the code for "ma" (a remnant of old logic) :)
drshrikaanth wrote:
arunk wrote:samyukta, saMyukta சம்யுக்த, சம்யுக்த ಸಮ್ಯುಕ್ತ, ಸಂಯುಕ್ತ
ಸಮ್ಯುಕ್ತ - wrong, ಸಂಯುಕ್ತ- correct
First one is wrong because the transliterated text is wrong, as in samyukta is wrong and it should only be saMyukta, or is it wrong for other reasons?
drshrikaanth wrote:I think Sh and sh are both being treated as the letter in SyAmA while S is being treated as dIkShitar. Please reverse it. But kSa is coming out as that in dIkShitar. right.
Thanks. I will check this.

Arun
Last edited by arunk on 30 Nov 2006, 01:53, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

how about this:

sa, Sa, Sha, SyAmA, ShAmaLA, sAma:
ಸ, ಶ, ಷ, ಶ್ಯಾಮಾ, ಷಾಮಳಾ, ಸಾಮ

dIkshitar vs dIkSitar (which is correct as i think my change affected this?)
ದೀಕ್ಷಿತರ್, ದೀಕ್ಶಿತರ್

Also with my change:

SrI:
ஸ்ரீ, ಶ್ರೀ : right?

Arun
Last edited by arunk on 30 Nov 2006, 02:04, edited 1 time in total.

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

Post by drshrikaanth »

arunk wrote:
drshrikaanth wrote:
arunk wrote:samyukta, saMyukta சம்யுக்த, சம்யுக்த ಸಮ್ಯುಕ್ತ, ಸಂಯುಕ್ತ
ಸಮ್ಯುಕ್ತ - wrong, ಸಂಯುಕ್ತ- correct
First one is wrong because the transliterated text is wrong, as in samyukta is wrong and it should only be saMyukta, or is it wrong for other reasons?
It is not spelt that way, not even in sanskrit. The second way is the correct spelling, with a bindu.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

ok. I think that was a bad example.

How about this one:

saumyA: சௌம்யா, ಸೌಮ್ಯಾ

The tamizh seems odd for some reason!

Arun

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

Post by drshrikaanth »

ShAmaLA, ಷಾಮಳಾ- wrong. It should be the other Sa ("Sankhada Sa" as it is called in kannaDa as it is shaped like a conch). Actually it should be SyAmaLa
dIkshitar vs dIkSitar (which is correct as i think my change affected this?)
ದೀಕ್ಷಿತರ್, ದೀಕ್ಶಿತರ್
first one is correct both in english and kannaDa.
Also with my change:

SrI:
ஸ்ரீ, ಶ್ರೀ : right?

Arun
correct.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

ok. Thanks. I think I got the codes for sa, Sa, Sha correct. Messed up on the examples used for illustration.
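For reference, the sibilant mapping the thread settles on (sa -> ಸ, Sa -> ಶ as in SyAmA, Sha -> ಷ as in dIkShitar) fits in a small lookup table. This is a sketch, not Arun's code; the names are hypothetical, and the ksha conjunct is built with the standard Unicode sequence ka + virama + ssa.

```python
# Scheme token -> Kannada consonant (with inherent "a").
SIBILANTS_KN = {
    "s": "\u0cb8",   # ಸ  (sAma)
    "S": "\u0cb6",   # ಶ  (SyAmA, SrI)
    "Sh": "\u0cb7",  # ಷ  (dIkShitar)
}
KA_KN = "\u0c95"      # ಕ
VIRAMA_KN = "\u0ccd"  # halant, suppresses the inherent vowel

# The k + Sh conjunct: ka + virama + ssa.
KSHA_KN = KA_KN + VIRAMA_KN + SIBILANTS_KN["Sh"]


def sibilant(token: str) -> str:
    """Return the Kannada letter for a sibilant token."""
    return SIBILANTS_KN[token]
```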

I will upload it now

Arun

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

Post by drshrikaanth »

arunk wrote:How about this one:

saumyA: சௌம்யா, ಸೌಮ್ಯಾ

The tamizh seems odd for some reason!

Arun
This is correct both in kannaDa and in tamizh.

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

Post by drshrikaanth »

FYI In kannaDa words are never spelt with a dIrgha at the end. Not even names. so you have only akka, amma, saumya, ramya not akkA, ammA, saumyA, ramyA. In tamizh they are spelt in the latter fashion.
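If this is applied when the target is kannaDa, a rough sketch could look like the function below. It is hypothetical, and it assumes that only a word-final long "A" is shortened, since all of the examples given (akkA, ammA, saumyA, ramyA) end in A; whether the rule should apply to every word of a krithi is a separate question.

```python
def kannada_final_vowel(word: str) -> str:
    """Shorten a word-final long A to a (assumed kannaDa spelling rule)."""
    if word.endswith("A"):
        return word[:-1] + "a"
    return word
```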

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

updated. Thanks for your help drs and ramakriya!

Now on to the next set of problems ;);)

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

drshrikaanth wrote:FYI In kannaDa words are never spelt with a dIrgha at the end. Not even names. so you have only akka, amma, saumya, ramya not akkA, ammA, saumyA, ramyA. In tamizh they are spelt in the latter fashion.
Ok Thanks. I was thinking about tamizh words only - the other ones i was just taking (bad) guesses.

Arun

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

Post by drshrikaanth »

Checked. It is working fine. One observation candra and caMdra are both showing correctly in kannaDa with a bindu. But when you write camdra, it comes with the letter m and not bindu. Is that how it is meant to be?

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

Post by drshrikaanth »

Arun, you have made sure that what you write in tamizh comes out correctly in kannaDa, even with some flexibility in english spellings. But the reverse is a problem: saMgama comes out all awkward in tamizh as சம்கம. What are you going to do about it?

BTW I sent you an email earlier today. Did you see it?

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

Yes, intentional. With m, I am morphing to bindu only if followed by p, ph, b, bh and s (what about S, Sh? I am not doing it for those now). The p, ph, b, bh are because of the home pentad. The "sa" is for tamizh (and we know it doesn't morph in kannada).

I can do for others. Please tell me which ones are safe and i can easily add them.

Arun

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

Post by drshrikaanth »

arunk wrote:yes intentional. With m, i am morphing to bindu only if followed by p, ph, b, bh and s (what about S, Sh? i am not now). The p, ph, b, bh is because of the home pentad. The "sa" is for tamizh (and we know it doesnt morph in kannada).

I can do for others. Please tell me which ones are safe and i can easily add them.

Arun
I think you can safely make "m" as bindu for v, S, Sh, s and h in kannaDa. (Will let you know if I recall exceptions)
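A sketch of the "m" rule with drs' additions folded in: "m" becomes the bindu before p, ph, b, bh, s (Arun's current set) and v, S, Sh, h (suggested above). The function name is hypothetical. Note that "camdra" keeps its m, matching the behaviour Arun describes as intentional.

```python
# Consonants before which a typed "m" is turned into the bindu (M).
BINDU_BEFORE = ("ph", "p", "bh", "b", "s",   # Arun's current set
                "v", "Sh", "S", "h")         # drs' suggested additions


def m_to_bindu(word: str) -> str:
    """Replace "m" with "M" when the next consonant is in BINDU_BEFORE."""
    out = []
    for i, ch in enumerate(word):
        if ch == "m" and word[i + 1:].startswith(BINDU_BEFORE):
            out.append("M")
        else:
            out.append(ch)
    return "".join(out)
```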

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

drshrikaanth wrote:Arun. You have made sure that what you write in tamizh comes out correctly in kannaDa even with some flexibility in english spellings. But the reverse is a problem. saMgama comes out all awkward in tamizh சம்கம. What are you going to do about it?
Oh yes! I forgot. I have to add some smarts to the tamizh logic. I think this is possible as follows:

Md(h), Mt(h), Mc(h), Mj(h), Mk(h), Mg(h) => Nd(h), Nt(h) and so on. That would work right?
BTW I sent you an email earlier today. Did you see it?
Sorry, haven't checked email.

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

Post by drshrikaanth »

arunk wrote:Oh yes! I forgot. I have to add some smarts to the tamizh logic. I think this is possible as follows:

Md(h), Mt(h), Mc(h), Mj(h), Mk(h), Mg(h) => Nd(h), Nt(h) and so on. That would work right?
Mt(h), Md(h) => nt(h), nd(h), not Nt(h), Nd(h). Likewise for the rest. I guess you made a typo, but I want to make sure you don't make an error in the script.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

yes typo as usual :)

But that reminds me: MD(h), MT(h) => ND(h), NT(h)
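Putting the corrected rules together, a minimal sketch of the tamizh side: each M takes the nasal of the following stop's pentad (Mk/Mg -> ங், Mc/Mj -> ஞ், MT/MD -> ண், Mt/Md -> ந், Mp/Mb -> ம்), and any other M falls back to plain m (as in saMyukta -> சம்யுக்த). The names are hypothetical; aspirates are covered because kh, gh, etc. start with the unaspirated letter.

```python
# First letter of the following stop -> scheme nasal of its pentad.
NASAL_FOR_STOP = {
    "k": "#n", "g": "#n",
    "c": "~n", "j": "~n",
    "T": "N", "D": "N",
    "t": "n", "d": "n",
    "p": "m", "b": "m",
}
# Scheme nasal -> Tamil nasal consonant + pulli (virama).
TAMIL_NASAL = {
    "#n": "\u0b99\u0bcd",  # ங்
    "~n": "\u0b9e\u0bcd",  # ஞ்
    "N": "\u0ba3\u0bcd",   # ண்
    "n": "\u0ba8\u0bcd",   # ந்
    "m": "\u0bae\u0bcd",   # ம்
}


def resolve_bindu(word: str) -> str:
    """Replace each M with the nasal of the next stop's pentad, else m."""
    out = []
    for i, ch in enumerate(word):
        if ch == "M":
            nxt = word[i + 1] if i + 1 < len(word) else ""
            out.append(NASAL_FOR_STOP.get(nxt, "m"))
        else:
            out.append(ch)
    return "".join(out)
```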

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

drs - i sent you some emails
Last edited by arunk on 30 Nov 2006, 02:42, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

have fixed (not uploaded) tamizh logic to handle the bindu M correctly:

nandi, naMdi, paMca, panca, pa~nca, saMgama, ghaMTa, kaMDE, amba, aMba

நந்தி, நந்தி, பஞ்ச, பஞ்ச, பஞ்ச, சங்கம, கண்ட, கண்டே, அம்ப, அம்ப

ನಂದಿ, ನಂದಿ, ಪಂಚ, ಪಂಚ, ಪಂಚ, ಸಂಗಮ, ಘಂಟ, ಕಂಡೇ, ಅಂಬ, ಅಂಬ

Arun

jayaram
Posts: 1317
Joined: 30 Jun 2006, 03:08

Post by jayaram »

drshrikaanth wrote:I think you are safer writing it as j~nAna in kannaDa, as that is how kannaDigas will pronounce it even in a tamizh kRti. The problem with writing it as ~nAnam is that they will probably have problems recognising the sound, as words never begin with ~n in kannaDa.
That's an interesting statement. The ~n sound is common in Malayalam (as in '~nAn' for 'I'). Based on the above, this word would get pronounced as 'j~nAn' by a Kannadiga! Shouldn't we try and capture the original sound to the extent possible, so that at least a multi-lingual person can pronounce it closest to the original - rather than cater just for the unilingual person? (Also, several musicians tend to be multi-lingual any way.)

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

Post by drshrikaanth »

jayaram wrote:That's an interesting statement. The ~n sound is common in Malayalam (as in '~nAn' for 'I'). Based on the above, this word would get pronounced as 'j~nAn' by a Kannadiga! Shouldn't we try and capture the original sound to the extent possible, so that at least a multi-lingual person can pronounce it closest to the original - rather than cater just for the unilingual person? (Also, several musicians tend to be multi-lingual any way.)
2 things to consider here, Jayaram. One, ~nAnam is immediately recognisable as j~nAna and the familiar pronunciation takes over. Two, how many malayalam songs do you get to hear from musicians? If more were in circulation, then these sounds might be produced more accurately by non-native speakers. Just as an example, when I spoke malayalam, native speakers always used to chide me for saying pare~n~nu although I was only saying para~n~nu. These things do creep in without one's knowledge and despite the best conscious efforts.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

i have a proposal:

How about "ei", "Ai" etc. to mean "e" + y and "A" + y. That way மெய்/தாய்மொழி can also be mei/tAimozhi besides mey/tAymozhi. In this case the mey etc. seems too far off phonetically.

I am thinking of this rule: if an "i" occurs in the middle of a word but is not part of a consonant modifier, then it is like a "y". For example, not as in mi or mai (since i and ai are consonant modifiers), but only as in mei, mAi, moi etc. I would recommend judicious use of it, as this would allow something like seiiuL :). We could prevent this by saying it must be at the end of a word or must be followed by a consonant?
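A sketch of the proposed rule including the restriction: an "i" is read as "y" only when it follows a vowel other than "a" (so the ai modifier is untouched) and is at the end of the word or followed by a consonant. The function name is hypothetical and this is one possible reading of the proposal.

```python
VOWELS = set("aAiIuUeEoO")


def i_as_y(word: str) -> str:
    """Read a standalone "i" after a vowel (not "a") as "y"."""
    out = []
    for k, ch in enumerate(word):
        prev = word[k - 1] if k > 0 else ""
        nxt = word[k + 1] if k + 1 < len(word) else ""
        standalone = prev in VOWELS and prev != "a"   # not mi, not ai
        allowed = nxt == "" or nxt not in VOWELS      # end or consonant
        out.append("y" if ch == "i" and standalone and allowed else ch)
    return "".join(out)
```

With the restriction, "seiiuL" is left as typed, because each "i" there is followed by a vowel.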

Thoughts?

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

drs,

If a ~nAn from a malayAlam krithi is rendered in kannada as just ~nAn, would it be bad (it may look as horrible as imsai, but wouldn't it help with better pronunciation?).

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

The other possibility is to render it as j~nAn but add qualifiers such as a subscript. This is not possible now, but eventually it will be: each language-specific logic will know the "original language" and will be able to adjust its logic if necessary.

This was my thinking all along.

Arun
Last edited by arunk on 30 Nov 2006, 04:24, edited 1 time in total.

jayaram
Posts: 1317
Joined: 30 Jun 2006, 03:08

Post by jayaram »

drshrikaanth wrote:
jayaram wrote:That's an interesting statement. The ~n sound is common in Malayalam (as in '~nAn' for 'I'). Based on the above, this word would get pronounced as 'j~nAn' by a Kannadiga! Shouldn't we try and capture the original sound to the extent possible, so that at least a multi-lingual person can pronounce it closest to the original - rather than cater just for the unilingual person? (Also, several musicians tend to be multi-lingual any way.)
2 things to consider here, Jayaram. One, ~nAnam is immediately recognisable as j~nAna and the familiar pronunciation takes over. Two, how many malayalam songs do you get to hear from musicians? If more were in circulation, then these sounds might be produced more accurately by non-native speakers. Just as an example, when I spoke malayalam, native speakers always used to chide me for saying pare~n~nu although I was only saying para~n~nu. These things do creep in without one's knowledge and despite the best conscious efforts.
I don't quite agree. To give an analogy, English has the 'fricative' sound, e.g. the word 'the' or 'this'. The 'th' sound is not like any of the Indic sounds, but a sensitive listener can pick it up when spoken by a native English speaker and learn to reproduce it after a few tries. The key here, of course, is sensitive listener.
(Another example: some people can't pronounce the 'zh' in 'pazhaM' - instead they say 'paLaM'. Should we then do away with the 'zh' sound altogether, to cater for these folks?)

Your point about not many Malayalam songs being rendered by musicians is a bit tangential to this discussion. It is well known that fewer CM kritis exist in the language and Malayalam is a bit of a loner amongst the 4 south indian languages & gets the stepsisterly treatment anyway.

Having said all this, I thought Arun's attempt was to come up with a robust transliterator for the 5 languages, or are we not to cover Malayalam fully?
Last edited by jayaram on 30 Nov 2006, 05:07, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

jayaram,

The goal is to come up with a transliterator for all 5 languages, incl. malayALam. But what is in question here is whether we can get the proper pronunciation of any language preserved/communicated in ALL 4 languages. Doing that without ambiguity or additional interpretation is not practically possible (so we do the best we can, as explained below).

It is not fully possible because, e.g., we cannot represent the sound of that first "na" in nanni in tamizh or kannada, as no letter in those languages carries that sound. Now the case of ~na here may be a better match (but I will wait for drs's response as to whether a ~nA without a j preceding it would have different contextual pronunciation implications). Where there are no good matches, you can only hope for an approximation. In some cases you get reasonable ones, and in some cases, e.g. when trying to represent the "fa" sound (occurs in hindi (corrected)) in tamizh, you pretty much fall flat (people use aHk + p; that is a very very gross approximation).

Anyway, in all such cases you either accept the approximation as good enough to need no other qualifiers, or you qualify with superscripts. But all of these require additional interpretation, so the user must be sensitive to get it right. If you want the pronunciation right, you must be sensitive :). That is a given. If one cares about pronunciation and knows it is a malayALam krithi, then one would try to get that ~nAn right no matter how it appears in one's favorite language's script. But to at least clue the reader in on these "non-native" pronunciations, we put qualifiers like, say, k1 (ka), k2 (kha), etc. (say, in tamizh script for a non-tamizh krithi).
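A tiny sketch of the qualifier idea for the velar series in tamizh script, where one letter (க) covers ka, kha, ga and gha. Only k1 (ka) and k2 (kha) come from the post; continuing with 3 for ga and 4 for gha is an assumption, the function name is hypothetical, and plain digits stand in for the superscripts a real renderer would use.

```python
TAMIL_KA = "\u0b95"  # க, the single tamizh letter for the velar series

# Assumed numbering: 1 = ka, 2 = kha, 3 = ga, 4 = gha.
VELAR_QUALIFIER = {"k": "1", "kh": "2", "g": "3", "gh": "4"}


def tamil_velar(token: str) -> str:
    """Render a velar stop as க followed by its qualifier digit."""
    return TAMIL_KA + VELAR_QUALIFIER[token]
```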

A rough example: we all see pArvati in transl. english and know what it stands for, even if we don't have it written in our language (because we know the scheme). But tell me, is the english word "pArvati" phonetically close to the actual Indic word?

Regarding pazham and paLam - that is not a correct example. The "zha" alphabet very much exists (and besides there are many people who get it right ;)).

Arun
Last edited by arunk on 30 Nov 2006, 21:14, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

drshrikaanth wrote:Arun. You have made sure that what you write in tamizh comes out correctly in kannaDa even with some flexibility in english spellings. But the reverse is a problem. saMgama comes out all awkward in tamizh சம்கம. What are you going to do about it?
Drs. I was thinking about this. Some points
1. The anuswara does not affect pronunciation and hence is more a characteristic of the script.
2. The use of an explicit M in some cases (where it represents a non-ma sound) makes the transl. text less phonetically equivalent to the Indic word being referred to.
3. Because of #2, it can mislead some people who don't know the bindu rules (I guess mostly tamilians :)) who read the transliterated text. I still think the transl. text itself is quite useful, and that is why it should be a fair phonetic representation whenever possible.

Based on this, wouldn't it be better to recommend against using 'M', except in places where it is absolutely necessary (where, of course, it is replacing 'm', so the phonetic nature of the transliterated text doesn't change)?

The program will still accept M in all applicable cases (and tamizh will come out right) but i thought just like #n, you want to use it only where it is necessary. What do you think?

Arun
Last edited by arunk on 30 Nov 2006, 06:19, edited 1 time in total.

ramakriya
Posts: 1877
Joined: 04 Feb 2010, 02:05

Post by ramakriya »

arunk wrote:drs,

If a ~nAn from a malayAlam krithi is rendered in kannada as just ~nAn, would it be bad (it may look as horrible as imsai, but wouldn't it help with better pronunciation?).

Arun
If it was a malayALam kriti with the word ~nAn, it should be written in kannada as ~nAn as well; readers should be able to figure it out. It certainly should not be written as j~nAn.


-Ramakriya
Last edited by ramakriya on 30 Nov 2006, 11:50, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

I have some points to mention on this, but I would like to hear drs' views on this first.

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

Also, my point about "fa" is wrong. I was thinking of Hindi and said sanskrit. I don't know if sanskrit uses it (only pha?). Of course hindi doesn't figure here right now, but the point I was making was that there may be certain sounds in one language which are impossible to represent in any adequate fashion in another language.

jayaram
Posts: 1317
Joined: 30 Jun 2006, 03:08

Post by jayaram »

but the point i was making was there may be certain sounds in one language, which are impossible to represent in any adequate fashion in another language.
Which is why we need a robust transliteration scheme in the first place! At the risk of sounding like a broken record, lemme give an example: the Hindi word 'nahin' (for 'no') has the peculiar sound 'hin', which is not found in, say, the 4 south indian languages. How then do you transliterate a Swati kriti in Hindi that uses this word? Do you not require a scheme to represent this? The alternative is to agree that we would only use the common sounds, but that straightaway eliminates the kha, gha etc. that don't occur in Tamil!
Tricky, I know, but that's why we need brains like yours!

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

jayaram,

you are missing the point. The sounds themselves can never be communicated without additional interpretation unless:
(a) you extend the script to introduce unique symbols for different sounds.
(b) you add extra qualifiers (like k1, k2).

Now you would think that (a) is needed only for "foreign sounds" but NO. A language's script is NOT ALWAYS fully representative of the language's OWN sounds - forget other ones. Not all scripts are phonetic (tamizh and english being prime examples) - and even when they are, they are in varying degrees.

We should not be and are not trying to extend scripts here. That is over-extending our job and it needs wider support etc. The most we will do is (b) i.e. add qualifiers.

Arun

jayaram
Posts: 1317
Joined: 30 Jun 2006, 03:08

Post by jayaram »

Agreed we can't achieve 100% perfection, but at least we can aim for the max possible.
By the way, will your scheme also provide audio representations of the written characters? I mean, can we click a button and hear the sounds themselves?

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

the transl. scheme itself on the other hand can use as many symbols as needed for varying unique sounds in all 5 languages e.g. #n, ~n etc..

But just as even here we are not extending english alphabet, but are merely adding special qualifiers to "n", when the transliteration scheme gets rendered into the 5 indic languages, they will add (different kinds) of qualifiers.

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

jayaram wrote:Agreed we can't achieve 100% perfection, but at least we can aim for the max possible.
By the way, will your scheme also provide audio representations of the written characters? I mean, can we click a button and hear the sounds themselves?
It's possible as some sort of a "legend". But for the program to take a transl. text, do "text to speech" and get the sounds right... now wouldn't that be awesome :):)? But that is one tough job (must admit it tickles my curiosity ;))

Arun
Last edited by arunk on 30 Nov 2006, 21:54, edited 1 time in total.

vasanthakokilam
Posts: 10958
Joined: 03 Feb 2010, 00:01

Post by vasanthakokilam »

But for the program to take a transl. text and do "text to speech" and get the sounds right
There are some commercial packages that do a reasonably good job, but they still sound mechanical and robotic. There are some R&D projects I am aware of where they attempt to faithfully reproduce the sounds. Those do not work from a transliterated text but from a much more complicated and non-human-understandable (but still textual) representation. The complexity of that representation increases a lot because intonation and stress depend on the syntactic structure of the sentence and not just on individual words or phoneme relationships.

Just a related quiz.. How many phonemes exist in English? How about our 5 languages we are talking about. I know the answer to the first one, I will reveal a bit later!! ( I heard there is an african language with 100 phonemes!! )

vgvindan
Posts: 1430
Joined: 13 Aug 2006, 10:51

Post by vgvindan »

arunk,
I dont know if sanskrit uses it (only pha?)
From what I have understood by perusing the unicode allotment scheme, Devanagari is the superset of all Indian languages, catering for all sounds used in Indian languages and Urdu (Urdu is also an Indian language, of course). In my opinion, it caters even for foreign language sounds like the 'zha' in Russian, similar to Tamil 'ழ'.
In any case, IMHO Sanskrit is the only language in the World which caters for WYSIWYG concept.
