Single transliteration scheme for all CM languages?
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Folks,
I have added support for telugu. It basically follows the same logic as kannada. Looking at the unicode map, I see there are some differences in terms of a couple of vowels and the "fa" consonant. I am ignoring "fa" for telugu, but those other vowels are not yet supported on kannada itself.
Can people knowledgeable in telugu please check it?
Also, this update has "smart logic" for tamizh to morph an explicit bindu into the appropriate consonant.
Thanks
Arun
Last edited by arunk on 01 Dec 2006, 22:07, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
For people who haven't seen this yet, the URL again is http://arunk.freepgs.com/cmtranslit/cmt ... _test.html
Arun
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
arunk wrote: Looking at the unicode map, i see there are some differences in terms of a couple of vowels
What difference, Arun? If you are talking about the way they look, yes, there are differences in a lot of letters, although the blueprint is that of kannaDa only.
arunk wrote: but those other vowels are not yet supported on kannada itself.
Can you please explain what you mean? There are no more vowels in telugu than in kannaDa.
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
If these are the vowels you are talking about -
Row 1, column 0C0: that is the candrabindu (half anuswAra), which is almost nonexistent in present-day telugu writing.
Row C, column 0C0: that is lR, again never used even in the past (it is a sanskrit letter that is extremely rare even in sanskrit).
Row 0, column 0C6: that is RR (dIrgha of the vowel r, never used).
Row 1, column 0C6: that is LRR (dIrgha of the vowel lR, never used).
The first one, you may try and represent if it is supported in the font. The other 3, you may forget.
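As a quick cross-check of the chart positions mentioned in this post, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `chart_position` is made up for illustration; the character names are the ones the Unicode Telugu block uses for these four characters.

```python
import unicodedata

# Official Unicode names of the four rarely used Telugu characters.
NAMES = [
    "TELUGU SIGN CANDRABINDU",    # half anuswAra
    "TELUGU LETTER VOCALIC L",    # lR
    "TELUGU LETTER VOCALIC RR",   # RR
    "TELUGU LETTER VOCALIC LL",   # LRR
]


def chart_position(name):
    """Return (column, row) as printed on the code chart, e.g. ('0C0', '1')."""
    code = ord(unicodedata.lookup(name))
    hex4 = f"{code:04X}"
    # The chart column is the first three hex digits, the row is the last one.
    return hex4[:3], hex4[3]


for name in NAMES:
    column, row = chart_position(name)
    print(f"row {row}, column {column}: {name}")
```

The same lookup works for the kannada block by swapping TELUGU for KANNADA in the names, which is one way to compare what each block defines.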
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
drs -
I meant I saw more vowels in kannada. See vocalic RR in http://www.unicode.org/charts/PDF/U0C80.pdf. IIRC this isn't needed for us.
Even the fa, I think, is NOT there in the official chart. It is there in
http://tlt.psu.edu/suggestions/international/by language/kannadachart.html but I see now that this was probably an early proposal. So I will remove it.
I was only pointing these out to say that I didn't know if there were differences which would be significant in our context.
If there are none (besides of course the script looking different), we are of course fine.
Arun
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Our posts crossed again. I did mean those. I think we can ignore all these for our purposes.
Now on to the (I think) much more challenging devanagari for sanskrit. Can you briefly point out where kannada/telugu logic can be used for sanskrit (bindu in the middle of words), and where it cannot? Or you can wait for me to put in basic support and we can correct things like we did for kannada.
Finally, we will tackle malayalam. It worries me. It doesn't use bindu, like tamizh, which is ok, but being much closer to sanskrit, it could have more "m" + consonant combinations which may not unambiguously require bindu (or not) in kannada, i.e. more frequent "m" + "ya" combinations, which would then warrant explicit M usage to get it right for other languages. We thought that since they would be rare in tamizh, it was ok. But maybe they aren't rare in malayalam. I don't know what else we can do.
Thanks
Arun
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Folks,
I have added initial support for Sanskrit and Malayalam. I am quite sure it is wrong on many fronts - apologies in advance. But we need your help in fixing the problems!
For malayalam, I first want to see if it gets the sounds that are common with other languages right before handling the ones specific to it. Currently both sanskrit and malayalam are treated sort of like kannada and telugu, but with some variations.
Please give me feedback!
BTW, about the zha sound in malayalam and tamizh: I know devanagari has a symbol for it, but what to do in kannada and telugu? The unicode map for these languages doesn't have a symbol for it. We need to show it somehow, e.g. use some other symbol.
Thanks
Arun
-
- Posts: 1317
- Joined: 30 Jun 2006, 03:08
Arun - quick feedback on Malayalam. I loaded the bhajarE rE example in your testbed and looked at the Malayalam fonts. The words ending in the 'm' sound are represented by 'm' + <crescent moon> for Malayalam. This is wrong, instead it should be the 'o' symbol (OD66 code)
Will give more detailed feedback later, but this one jumped out straightaway.
Last edited by jayaram on 02 Dec 2006, 02:24, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
jayaram,
From at least one example on baraha, it looks like malayalam uses bindu/anuswara rules similar to kannada/telugu. Their example is not great as it is chandigarh, but they spell it like Mdigarh, where instead of n you get the "o". I presume this is right?
Kannada uses this for "n" in many other cases: nd, ndh, nt, nth, ng (like sangam), ngh, nk (like sankalpam), nkh, nc, nch, nj, njh, as discussed earlier. It also uses it instead of "ma" in other combinations.
Now from your question in another thread, it looked like malayalam didn't use it in many cases. Can you tell me for which combinations malayalam uses this bindu/anuswara?
Thanks
Arun
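To make the kannada "n" cases listed in this post concrete, here is a small sketch that works on the roman input and marks the bindu as M (the explicit-bindu letter used in this thread). The function name `mark_bindu` is made up for illustration, and the sketch covers only the "n" combinations listed above, not the "ma" cases.

```python
import re

# Consonants after which "n" is written as bindu in kannada, per the list in
# the post: nd, ndh, nt, nth, ng, ngh, nk, nkh, nc, nch, nj, njh.
# The aspirated forms start with the same letters, so one letter each is enough.
BINDU_AFTER_N = "dtgkcj"

# Match an "n" only when one of those consonants follows it.
_N_BEFORE_STOP = re.compile(rf"n(?=[{BINDU_AFTER_N}])")


def mark_bindu(roman):
    """Replace "n" with "M" (bindu) in the listed combinations."""
    return _N_BEFORE_STOP.sub("M", roman)


for word in ("sangam", "sankalpam", "mukunda", "enta", "nIm"):
    print(word, "->", mark_bindu(word))
```

A script-specific renderer would then map M to the bindu/anuswara sign of the target script; that step is left out here.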
-
- Posts: 1317
- Joined: 30 Jun 2006, 03:08
Malayalam is a bit of a tricky customer in this respect. We have both 'n' and 'o', and there are complex rules on when to use which one. Let's take an example:
ambu (meaning arrow) is written as a+n+p+<bindu>
while
ambujam (the name) is written as a+o+bu+ja+o
(This example also addresses your question about bindu, by the way.)
Now to the 'rule' on when to use what: I was going to say 'o' is used with the soft 'ba' but in all other cases it's the 'n' - but I just realized 'o' is also used with 'bha', e.g. 'rambha' is written as 'ra+o+bha'. Need to check if it's ever used with 'pa'...
Give me some more time and I can come up with a more complete answer.
Last edited by jayaram on 02 Dec 2006, 04:09, edited 1 time in total.
-
- Posts: 1529
- Joined: 09 Feb 2006, 00:04
Arun,
I looked at the telugu script.
The nIm in first line appears different from the nIm in second line.
The S in Sakti in telugu line 9 should be different -like the one for Sri
I am not an expert on telugu but I looked up Sakti elsewhere in a telugu site and it is different than the one you have given.
Last edited by Suji Ram on 02 Dec 2006, 05:06, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Is a + n + p + <bindu> pronounced "amb" or "anp"? So bindu is sometimes used at the end? There is no bindu here for "n", and the bindu usage for "m" in ambujam and rambha seems consistent with kannada/telugu. Please look back at drs's post on bindu rules for kannada and pentats (the same apply to telugu); can you check if they apply for malayalam as well?
BTW, I read up some on the use of cillu forms and the problems they create(d) in unicode. Please see http://std.dkuug.dk/jtc1/sc2/wg2/docs/N3126.pdf. This was a proposal to represent them as separate chars and they are accepted, and they also seem to be implemented (albeit not cleanly) in the couple of fonts I tried. That document talks about van_yavanika (meaning big curtain) and vanyavanika (meaning wild forests). The first one uses the cillu form for "n" and the second doesn't. I presume that these are pronounced differently? I know it is hard to communicate without audio cues, but if different, how so? I know you said some form of "n" was missing - is that the cillu form?
Thanks
Arun
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Suji Ram wrote: Arun, I looked at the telugu script. The nIm in first line appears different from the nIm in second line.
Thanks. I see this is because there is no space after the M in the second case (it is m|| rather than m ||). So the engine doesn't think it is at the end of a word. It does know to recognize punctuation, but I should add | as well for the CM context.
Suji Ram wrote: The S in Sakti in telugu line 9 should be different -like the one for Sri. I am not an expert on telugu but I looked up Sakti elsewhere in a telugu site and it is different than the one you have given.
Hmm.. Let me check my CM books which have telugu in them.
Thanks!
Arun
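The end-of-word fix described in this post can be sketched as below. This is not the engine's actual code; `WORD_BREAKS` and `is_word_final` are hypothetical names, and the punctuation set is an assumption. The only point is that "|" has to count as a word break so that "m||" and "m ||" are treated the same way.

```python
# Characters that end a word: whitespace, common punctuation (assumed set),
# and "|" so that the CM section marker "||" also counts.
WORD_BREAKS = set(" \t\n.,;:!?") | {"|"}


def is_word_final(text, index):
    """True if the character at `index` is the last character of a word."""
    nxt = index + 1
    return nxt >= len(text) or text[nxt] in WORD_BREAKS


# Both spellings now agree that the "m" of nIm ends the word.
for line in ("nIm ||", "nIm||", "nImi"):
    print(repr(line), is_word_final(line, 2))
```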
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
sujiram,
I have the following:
sa, Sa, Sha:
స, శ, ష
Which one figures here in Sakthi? I believe it should have been Sa (i.e. Sakthi) looking at a book I have on Syama Sastry krithis.
Sakthi:
शक्थि (sanskrit)
శక్థి (telugu)
ಶಕ್ಥಿ (kannada)
So I think this is just a transliteration error. I pretty much took what ramakriya had entered, and he was using sh to mean this character - we have since then settled on sa, Sa, Sha.
I will fix the example when i get a chance.
Thanks
Arun
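For reference, the sa / Sa / Sha convention settled on in this post can be written down as a small lookup table. This is only a sketch: the table name `SIBILANTS` is made up, and `unicodedata.name` is used just to show which Unicode letter each roman spelling points at.

```python
import unicodedata

# Roman spelling -> letter in each script, following the sa, Sa, Sha convention.
SIBILANTS = {
    "sa":  {"telugu": "స", "kannada": "ಸ", "sanskrit": "स"},
    "Sa":  {"telugu": "శ", "kannada": "ಶ", "sanskrit": "श"},
    "Sha": {"telugu": "ష", "kannada": "ಷ", "sanskrit": "ष"},
}

for roman, by_script in SIBILANTS.items():
    for script, letter in by_script.items():
        # Print the official Unicode name next to each letter.
        print(f"{roman:4} {script:9} {letter}  {unicodedata.name(letter)}")
```

Note that the Unicode names call the Sa letter "SHA" and the Sha letter "SSA", which is easy to trip over when reading the charts.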
-
- Posts: 1529
- Joined: 09 Feb 2006, 00:04
arunk wrote: sorry - i should have entered Sakti and not Sakthi (my tamizh influence having a bad effect). Arun
No problem, I am waiting to see a tamizh kriti in telugu script using this scheme. I was once mediating between a tamilian and a telugu in writing a tamizh kriti in telugu script - and that was a nightmare.

-
- Posts: 1317
- Joined: 30 Jun 2006, 03:08
Arun, you must keep in mind that I grew up in Kerala before the 'new' Malayalam script was introduced, so I still feel uncomfortable reading the new script. (I still hunt for the old editions of books so that I don't have to torture myself with the new script!) Essentially, the changes have been made to make it easier to type and print stuff, so they have changed most of the 'joint' letters, e.g. a word like 'patRi' used to be written as: pa + tR + i. Today I think it's written as pa + t + bindu + R + i - in the first case the 'tR' was a single unit, written as a single letter.
In the 'van_yavanika' vs 'vanyavanika' example: the former is really the two words 'van' and 'yavanika', while the second is 'vanya' and 'vanika'. One can easily tell which is which. Not sure why they have shown the first one as a single word - they should really be two distinct words. (Of course, one can occasionally have the odd confusion between the two when spoken fast, somewhat similar to the classic: 'brAhmanarkaL SAppidumidam' being misinterpreted as 'brAhmanar kaL SAppidumidam'.) Also, the former uses the chillaksharam, not the bindu. By the way, the document seems to suggest that the unicode representation struggles to distinguish between these two - is that right?
To go back to your question about 'ambu'. What I meant to say was, this is written as: a + np + bindu - in old Malayalam! We used to have a joint letter 'np' before (half 'n' and half 'p' joined together). I am not completely sure how it is represented in the new style. On the other hand, if we peek at the other related 'rendering kriti' thread (I call it the 'vgvindan' topic - and this one the 'arunk' topic!), he had asked me to select between 2 options, and I had chosen the first one - I think that may have included the 'np' version.
You sure are a brave soul to have embarked on this nearly impossible task!
Last edited by jayaram on 02 Dec 2006, 14:30, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
hmm.. I think it (the unicode representation) runs into trouble because the 2 words are delivered as a combined word (like vENNila in tamizh is delivered like a single word but is really a combination of two words). It has a way to distinguish them, only it uses some special codes which were established as "ignorable", and here they absolutely cannot be ignored as the meaning changes.
BTW, I have no clue which is new and which is old, and I really don't control much! The font rendering system does some things on its own (based on some specifications in the font itself).
Arun
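A minimal sketch of the "ignorable" problem, under one assumption: that the special code in question is the zero width joiner (U+200D) placed after the virama to request the cillu form of "n". The variable and function names are made up for illustration.

```python
ZWJ = "\u200d"      # ZERO WIDTH JOINER, a default-ignorable code point
VIRAMA = "\u0d4d"   # MALAYALAM SIGN VIRAMA
VA, NA, YA, KA = "\u0d35", "\u0d28", "\u0d2f", "\u0d15"
SIGN_I = "\u0d3f"   # MALAYALAM VOWEL SIGN I

# vanyavanika: "n" + virama + "ya" forms an ordinary conjunct.
vanyavanika = VA + NA + VIRAMA + YA + VA + NA + SIGN_I + KA

# van_yavanika: same letters, but a joiner after the virama asks for the
# cillu form of "n" instead of the conjunct.
van_yavanika = VA + NA + VIRAMA + ZWJ + YA + VA + NA + SIGN_I + KA


def strip_ignorable(text):
    """What a process that ignores the joiner effectively sees."""
    return text.replace(ZWJ, "")


print(vanyavanika == van_yavanika)                   # False: distinct as stored
print(vanyavanika == strip_ignorable(van_yavanika))  # True: distinction is lost
```

Any search, sort or comparison step that drops the joiner would therefore treat the two words as the same, which is the meaning change described in the post.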
-
- Posts: 1317
- Joined: 30 Jun 2006, 03:08
Ah, just as I thought. I just checked on the 'vgvindan' thread:
जिञ्चु जिङ्कु जिण्टु जिन्तु जिम्पु
जिंचु जिंकु जिंटु जिंतु जिंपु
ജിഞ്ചു ജിങ്കു ജിണ്ടു ജിന്തു ജിമ്പു
ജിംചു ജിംകു ജിംടു ജിംതു ജിംപു
The last word above, i.e. 'jimpu', is written in Malayalam as j + i + np + u (1st line) - not as j + i + o + p + u (2nd line)
This is just as I have learnt it. Thank god they haven't dropped the joint letter 'np'!
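The relation between the two spellings in each pair of lines above (nasal conjunct vs. bindu) can be sketched as a simple rewrite. This is only an illustration covering the five consonants that appear in the jimpu examples; the names `SCRIPTS` and `anusvara_to_nasal` are hypothetical and this is not the testbed's actual logic.

```python
# Per script: the anusvara (bindu) sign, the virama sign, and the nasal that
# belongs to the same class as each following consonant.
SCRIPTS = {
    "devanagari": {
        "anusvara": "\u0902",
        "virama": "\u094d",
        "nasal_for": {"क": "ङ", "च": "ञ", "ट": "ण", "त": "न", "प": "म"},
    },
    "malayalam": {
        "anusvara": "\u0d02",
        "virama": "\u0d4d",
        "nasal_for": {"ക": "ങ", "ച": "ഞ", "ട": "ണ", "ത": "ന", "പ": "മ"},
    },
}


def anusvara_to_nasal(text, script):
    """Rewrite bindu + consonant as class nasal + virama + consonant."""
    table = SCRIPTS[script]
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == table["anusvara"] and nxt in table["nasal_for"]:
            # Replace the bindu by the matching nasal as a dead consonant.
            out.append(table["nasal_for"][nxt] + table["virama"])
        else:
            out.append(ch)
    return "".join(out)


print(anusvara_to_nasal("जिंचु जिंकु जिंटु जिंतु जिंपु", "devanagari"))
print(anusvara_to_nasal("ജിംചു ജിംകു ജിംടു ജിംതു ജിംപു", "malayalam"))
```

Feeding the second line of each pair through this function gives the first line of that pair, i.e. the 'np' style spelling preferred here for malayalam.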
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
Arun
Here is another pic showing in detail the derivations of Indian scripts.

Link
http://www.engr.mun.ca/~adluri/telugu/l ... ipt1a.html
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
jayaram,
I guess we need to establish the consonant combinations in malayalam for which (a) bindu is ALWAYS used, (b) bindu is NEVER used, (c) others. Then the logic can incorporate (a) and (b) such that you do not have to explicitly specify bindu. For combinations that fall under (c), we leave it for the user to explicitly specify as needed.
We have established such combinations in (a), (b), (c) above for telugu, kannada and sanskrit as well.
This can still lead to trouble if the distribution of the combinations among (a), (b), and (c) for malayalam is quite different from kannada/telugu/sanskrit.
For example, consider a combination XY (X some consonant, Y another). In kannada/telugu/sanskrit, let us say XY falls under (a) and so X always morphs to bindu. But XY in malayalam falls under (c): it does not always require bindu, depending on the word in which "XY" figures. If we are trying to render a kannada krithi in malayalam, then you will have some anomalies. You may end up having bindu when it doesn't apply, or not having bindu when it should have applied, etc. The same problems can arise when rendering a malayalam krithi in kannada/telugu etc.
Arun
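The (a)/(b)/(c) idea and the cross-language mismatch can be sketched as per-language tables. The entries below are illustrative placeholders only, not the agreed rules; `Bindu`, `RULES`, `classify` and `mismatches` are all hypothetical names.

```python
from enum import Enum


class Bindu(Enum):
    ALWAYS = "a"     # (a) bindu is always used
    NEVER = "b"      # (b) bindu is never used
    EXPLICIT = "c"   # (c) left to the user to mark with M


# HYPOTHETICAL sample data, only to show the shape of the tables.
RULES = {
    "kannada": {
        ("m", "b"): Bindu.ALWAYS,
        ("m", "y"): Bindu.EXPLICIT,
        ("n", "n"): Bindu.NEVER,
    },
    "malayalam": {
        ("m", "b"): Bindu.EXPLICIT,
        ("m", "y"): Bindu.EXPLICIT,
        ("n", "n"): Bindu.NEVER,
    },
}


def classify(language, first, second):
    """Category of the combination first+second; unknown pairs fall under (c)."""
    return RULES[language].get((first, second), Bindu.EXPLICIT)


def mismatches(source_lang, target_lang):
    """Combinations whose category differs between the two languages."""
    pairs = set(RULES[source_lang]) | set(RULES[target_lang])
    return sorted(
        p for p in pairs
        if classify(source_lang, *p) is not classify(target_lang, *p)
    )


print(mismatches("kannada", "malayalam"))
```

The pairs returned by `mismatches` are exactly the XY cases from the example: places where a krithi entered for one language may render with a wrong or missing bindu in the other.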
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
drs, others,
I have a question about tamizh rendering for certain consonant combinations which do not exist in tamizh but which are common in telugu (and other language) krithis.
Consider the words enta, anta etc. Very common. But the "nt" sound is not part of tamizh - after "na", it would be "da" that follows (and hence ந் + த). The question then is, for krithis that have enta, anta etc., what is the best way of showing it in tamizh?
I was looking at some CM books, and as perhaps expected they all show it differently. The Adi tyAgarAja hrdayam book by K.V. Srinivasa Iyengar simply introduces bindu and renders it as எம்த. The TSP book uses எந்த (similarly Smt. Vidya Sankar's book on SS krithis uses ங்க for nka combinations). Both books use qualifiers for கரி places to indicate ta vs da but here they don't. The scheme is: if there are no qualifiers it is "ta"; qualifiers are used for the other variants (i.e. tha, dha etc.). However I find both can easily mislead readers:
1. The KVS use of ம் would be lost on most tamizh readers unless they know sanskrit - they will tend to read it as "ma". Btw, this may have been the underlying reason why I initially thought anuswara changed pronunciation. Besides this, although I can't remember well, I think I have run across one or two people (non-tamilians) using the "ma" sound for anuswara in places where it really takes the "na" sound. Is this an uncommon misunderstanding?
2. The TSP book's use of ந் can also mislead unless the reader is extra careful, as this in itself implies the "nd" sound. This is compounded by #3 below.
3. The absence of any qualifiers here for த misleads more IMO, as it would more naturally lead tamizh readers to apply tamizh pronunciation rules. Since there is no preceding த், they will attach the "da" sound to த (unless they already know the song or the telugu word). This is worse for #2 above because the "nd" already implies the "da" sound.
While I do not know the best solution, here are all the possibilities:
1. என்த
2. என்த' (where I have arbitrarily used ' as a qualifier - it could be anything, say a small raised "1")
3. எம்த
4. எம்த'
5. எந்த
6. எந்த'
My order of preference is
2, 6, 1, 5
Arun
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Jayaram,
Yes you could - the only reason to qualify is that nta is not natural to tamizh, and without qualifiers people may take the த to mean "da".
Actually, while this looks nice, pankaja as பன்கஜ can introduce extraneous artifacts, as many tamilians may look at it and tend to emphasize the ன் too much. For some reason here பங்க'ஜ seems to fit better - but not sure though. These are obviously cases where you cannot convey it accurately right away and people just have to be more aware.
Arun
Last edited by arunk on 02 Dec 2006, 23:00, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
There is no nta sound - so SAnti as is doesn't exist in tamizh. It morphs to sAndi (losing S also) and the sound is identical to that in mukunda. But many tamilians (perhaps those with more familiarity with Sanskrit and the word's origins) know and would say it as SAnti (they may have trouble between Sa and Sha though).
Arun

Last edited by arunk on 03 Dec 2006, 00:35, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
jayaram,
from what I could find, the bindu is used in malayalam when the "m" sound occurs without any vowel extension, i.e. as a dead consonant (as in sampath, and not as in rAmA). It seems to be considered like a cillakshara of ma.
I haven't seen where it is used at the end of words that end with "u" (like what you said about anpu), where in malayalam it is pronounced as a "half-vowel" (i.e. a barely perceptible u and not quite a regular u - what I could make out from reading). At least more than one reference on the web that I could find says the candrakala is employed there.
So it looks like malayalam does not use the bindu like kannada. It is used only when the "m" sound occurs at the end or in the middle (i.e. m is followed by a consonant: rambha, sampath etc.?).
There seem to be other issues, like nR becoming nd if n is not rendered as cillakshara etc.
Arun
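The tentative rule in this post (bindu wherever "m" is a dead consonant) can be written as a one-line check on the roman input. This is a sketch of the hypothesis only, not a confirmed malayalam rule; the vowel list and the name `mark_dead_m` are assumptions made for illustration.

```python
import re

# Roman vowel letters assumed for this sketch.
VOWELS = "aAiIuUeEoO"

# "m" not followed by a vowel: either before another consonant or at the end.
DEAD_M = re.compile(rf"m(?![{VOWELS}])")


def mark_dead_m(roman):
    """Replace every dead-consonant "m" with "M" (bindu)."""
    return DEAD_M.sub("M", roman)


for word in ("sampath", "rambha", "vastRam", "rAmA"):
    print(word, "->", mark_dead_m(word))
```

Words like ambu, where the old script uses the joint letter 'np', would be exceptions to such a blanket rule and would need separate handling.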
-
- Posts: 1317
- Joined: 30 Jun 2006, 03:08
arunk wrote: There is no nta sound - so Santi as is doesnt exist in tamizh. It morphs to sAndi (loosing S also) and the sound is identical as in mukunda. But many tamilians (perhaps those with more familiarity to Sanskrit and the word's origins) know and would say it as SAnti (they may have trouble between Sa and Sha though)
Then why worry about the distinction between endaro and entara? Your project is directed towards CM-savvy people anyway, who have some knowledge of Sanskrit (as you say), so perhaps you don't need to invent something new?
Last edited by jayaram on 03 Dec 2006, 00:49, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
People may know sanskrit words - they sure won't know telugu words.
Besides, even people who know the sanskrit words will not know all the subtleties. It is safe to say more will get it wrong than right.
Arun
Last edited by arunk on 03 Dec 2006, 00:48, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Besides, the trouble is that the same words (or words that sound similar and mean the same thing) will occur in tamizh krithis as well.
When singing "pAl vaDiyum mugam", whether you are tamil/telugu/kannada/malayalam, you should say mugam. But when singing "gajamukha" in a telugu/sanskrit krithi, all of us should say mukha. If both were written the same way, what's the use? This I think could be one of the reasons why so many tamilians get pronunciation wrong - we all involuntarily default to our native language pronunciation rules. I have heard some telugu people pronounce "sara" as sort of like "sera". While that may be acceptable in a telugu krithi, that would be wrong in a sanskrit krithi.
Arun
Last edited by arunk on 03 Dec 2006, 00:56, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
jayaram wrote: what that means is that your scheme will come up with distinct representations in Tamil for the nta/nda sounds in SAnti and mukunda - yes?
Yes, if we are writing a non-tamizh krithi in tamizh. This of course is nothing new. Most CM books in tamizh do it - but they are not all consistent, and maybe not always unambiguous.
Arun
-
- Posts: 1317
- Joined: 30 Jun 2006, 03:08
Quote: from what i could find the bindu is used in malayalam when the "m" sound occurs without any vowel extension, i.e. as a dead consonant (as in sampath, and not as in rAmA). It seems to be considered like a cillakshara of ma.
The bindu appears mostly at the end of words, e.g. vastRam (meaning dress). It also appears occasionally within a word, e.g. rambha.
Quote: I haven't seen where it is used at the end of words that end with "u" (like what you said about anpu), where in malayalam it is pronounced as a "half-vowel" (i.e. a barely perceptible u and not quite a regular u, from what i could make out from reading). At least more than one reference i could find on the web says the candrakala is employed there.
Not completely sure what you are saying here. The chandrakala appears at the end of words like 'anpu' (pronounced as 'ambu', meaning arrow), but the word can morph into 'anpum' as in 'anpum villum' meaning 'bow and arrow'. The 'u' sound here changes to the regular 'u' sound (as in 'put'). Hope that's clear!
Quote: So looks like malayalam does not use the bindu like kannada. It is used only when the "m" sound occurs at the end or in the middle (i.e. m is followed by a consonant: rambha, sampath etc.?).
Yes, but with the caveats I mention above!
(btw, all this discussion has helped sharpen my mind, so thank you!)
-
- Posts: 1317
- Joined: 30 Jun 2006, 03:08
Perhaps we shouldn't get so anal about all this, as DRS says. With all this complicated scheme, the native speaker may end up pronouncing based on their native style any way. A bit like DKJ saying 'muruha' for 'muruga' etc., and we have Smt Swaminathan's first name being written (and pronounced as well) as 'karpakam'/'karpagam'/'kalpakam'/'kalpagam'/'karpaham'... any way!! 

Last edited by jayaram on 03 Dec 2006, 01:13, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
There are no caveats yet. In both vastRam and rambha, it occurs where the "m" sound happens. I want to know if these are specific cases of a more generic rule, which is why I ask: does it occur wherever the "m" sound occurs? I mean not the ma sound, but just m as in amba, sampat, amsa etc.?
How about ramya?
Regarding anpu you mentioned before:
Quote: ambu (meaning arrow) is written as a+n+p+<bindu>
Did you mean chandrakala here or bindu?
Arun
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
jayaram wrote: Perhaps we shouldn't get so anal about all this, as DRS says. With all this complicated scheme, the native speaker may end up pronouncing based on their native style any way. A bit like DKJ saying 'muruha' for 'muruga' etc., and we have Smt Swaminathan's first name being written (and pronounced as well) as 'karpakam'/'karpagam'/'kalpakam'/'kalpagam'/'karpaham'... any way!!
That would be a defeatist attitude.
Arun
Last edited by arunk on 03 Dec 2006, 01:21, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Besides, the scheme at this point is not that complicated (the nuts and bolts of it may be, but that won't be exposed to users). If anything, it is simpler, as it doesn't require script-specific artifacts. I still think it is honest to the individual phonemes in the word. For example, the word for wind in tamizh is pronounced "kATru" but written "kARRu". The "RR" combination has the "Tr" sound in tamizh. It need not be so for all languages. But the scheme should allow you to enter it as kATru based on the sound of the word, not how it is written in any particular language.
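To make that concrete, here is a rough sketch in Python of what "enter the sound, let each script spell it" could look like. The rule table and the function name are hypothetical and are not the code behind the test page; only the kATru/kARRu case comes from the example above.

```python
# Hypothetical per-script spelling rules: phonemic input -> that script's spelling.
# Only the tamizh "Tr" -> "RR" rule comes from the post; the rest is illustrative.
SPELLING_RULES = {
    "tamil": [("Tr", "RR")],
    "kannada": [],  # assumed: no respelling needed for this sound
}

def to_script_spelling(phonemic: str, script: str) -> str:
    """Rewrite a phonemic transliteration into the spelling a given script uses."""
    spelled = phonemic
    for sound, spelling in SPELLING_RULES.get(script, []):
        spelled = spelled.replace(sound, spelling)
    return spelled

print(to_script_spelling("kATru", "tamil"))    # kARRu
print(to_script_spelling("kATru", "kannada"))  # kATru
```

The point of keeping the rules in a per-script table is that the user's input never changes; only the rendering step knows about script-specific artifacts.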
Arun
-
- Posts: 1317
- Joined: 30 Jun 2006, 03:08
Quote: there are no caveats yet. In both vastRam and rambha, it occurs where the "m" sound happens. I want to know if these are specific cases of a more generic rule, which is why I ask: does it occur wherever the "m" sound occurs? I mean not the ma sound, but just m as in amba, sampat, amsa etc.?
As I said before:
amba >> uses the bindu
sampathu >> uses the 'np' letter
amsa >> uses the bindu
Quote: How about ramya?
this is written as: ra + ma + the 'ya' suffix (can't remember the unicode, but hope you understand!)
Quote: Regarding anpu you mentioned before: "ambu (meaning arrow) is written as a+n+p+<bindu>". Did you mean chandrakala here or bindu?
I should have said: a+npa+<chandrakala>
Here 'npa' is one letter, the one used in the last word in the vgvndan list mentioned earlier.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
jayaram,
Maybe you mistook my question again. I was not asking about malayalam sampathu, but sampat as in, say, a sanskrit krithi. How about ramya, again from a sanskrit krithi?
Also, can you think of more examples where the "m" sound occurs in the middle of a malayalam word and the bindu is NOT used?
Arun
Last edited by arunk on 03 Dec 2006, 01:29, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
OK. So "mp" in malayalam is actually written as "n" + "p". Can you think of other "m" based sounds written without ma? Also other examples where the "m" sound occurs and no bindu figures?
ramya: That could be fine too.
I ask this so that the logic can automatically figure out when to use the bindu and when not (again, not just for malayalam words, but non-malayalam ones too). If we cannot, malayalam rendering could become an unsolvable problem.
Arun
Last edited by arunk on 03 Dec 2006, 01:38, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Jayaram,
I am sorry to bug you again. But you said ambu (pronounced with a ba sound right?) is written as a + npa + chandrakala. But sampat (pronounced with a pa sound) is written as sa + npa + t (??). How do malayalam readers distinguish that first "np" carries the "nbu" sound, and second "np" carries the "npa" sound?
Arun
-
- Posts: 1317
- Joined: 30 Jun 2006, 03:08
Quote: Also, can you think of more examples where "m" sound occurs in the middle of a malayalam word where bindu is NOT used.
quick answer: only when it is used with the 'pa' sound, must be because we have a combined letter for this, 'npa'. Also, the word 'amma' (mother) is written as: a+mma - where 'mma' is again a single combined letter. Having said that, this one may have been changed to a+ma+chandrakala+ma in the modern script. I will need to check it.
Thinking thru ya, ra, la, va, Sa, Sha, sa, ha: Sa and sa use the bindu. And mya uses ma+<ya suffix> as mentioned earlier. Are the rest relevant?
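To pull the cases reported so far into one place, here is a rough Python sketch of the decision the rendering logic would have to make for each "m". It is only a summary of what has been said in this thread, not a confirmed rule set: the function name and the labels are made up for illustration, ra/la/va/Sha/ha are deliberately left as "unconfirmed", and how sampat from a sanskrit krithi should be handled is still open.

```python
VOWELS = set("aAiIuUeEoO")

def m_treatment(word: str, i: int) -> str:
    """Classify how the 'm' at index i of a transliterated word would be
    written in malayalam, using only the cases reported in this thread."""
    if word[i] != "m":
        raise ValueError("index does not point at an 'm'")
    rest = word[i + 1:]
    if not rest:
        return "bindu"             # word-final m, as in vastRam
    if rest[0] in VOWELS:
        return "plain ma"          # m carries a vowel, as in rAmA
    if rest[0] == "p":
        return "npa conjunct"      # sampathu, ampu
    if rest[0] == "m":
        return "mma conjunct"      # amma (modern script may differ)
    if rest[0] == "y":
        return "ma + ya suffix"    # ramya
    if rest.startswith("Sh"):
        return "unconfirmed"       # Sha has not been checked yet
    if rest[0] in "bsS":
        return "bindu"             # amba, rambha, amsa; Sa and sa
    return "unconfirmed"           # ra, la, va, ha and anything else

for word, i in [("amba", 1), ("sampat", 2), ("amsa", 1), ("ramya", 2), ("vastRam", 6)]:
    print(word, "->", m_treatment(word, i))
```

Returning "unconfirmed" instead of guessing keeps the open cases visible, so they can be filled in as each one is checked.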
-
- Posts: 1317
- Joined: 30 Jun 2006, 03:08
Quote: But you said ambu (pronounced with a ba sound right?) is written as a + npa + chandrakala. But sampat (pronounced with a pa sound) is written as sa + npa + t (??). How do malayalam readers distinguish that first "np" carries the "nbu" sound, and second "np" carries the "npa" sound?
I must admit I was being a bit lax in my writing here! It should strictly be written as 'ampu' but we tend to pronounce it more as 'ambu'. So now that should tally with sampat, yes?
Sorry for confusing you, but this is what happens with being a native speaker who hasn't thought thru these things that carefully!
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
jayaram wrote: It should strictly be written as 'ampu' but we tend to pronounce it more as 'ambu'. So now that should tally with sampat, yes?
I was sort of guessing this. But note that if you are talking about writing a sanskrit krithi in malayalam, we have a problem. It cannot be pronounced "mb", it should be pronounced "mp". Should that be written with "np" (and qualified to indicate it is mp) or "m" + "pa" (if that even makes sense).
Also, if we are talking about a malayalam krithi, in the transliteration text input I would rather we enter ambu (i.e. how it is pronounced and not how it is spelt in the Indic language). This will help it being rendered correctly in all languages. The scheme should refer to phonemes. Agreed?
On a side note, it seems perplexing to me as an outsider why ambu is not written as it sounds (i.e. sort of like a + m + bu), as the malayalam alphabet does distinguish pa from ba.
The cases of bindu with y, r, l etc. seem consistent with kannada/telugu. The other consonants are important too, so if you don't mind, please think of them as well.
Arun
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
i did find a reference which mentions 3 forms including "npa" where there is a "misfit between visual sign and pronounciation". The others seem "ngka" (i.e. ng + ka), and "ksha" (i.e. k + sha). How are they pronounced?
The reference also later mentions two more:
1. RR (two hard ra) is pronounced as Ta. I am not sure if i am interpreting this right? Can you give an example. This seems sort of similar to tamizh where RR => Tr (or TR).
2. Also mentions "n" + "Ra" becomes "nda" unless "n" is cillaksara in which case it is/remains "nRa" (gives the example of henry).
Pl. confirm these. I hope these are comprehensive.
So far I think we are ok. How to render "mp"/ksha etc. in the input when they come from, say, non-malayalam krithis (and hence must remain the same phonetically) is an open question, but the others i think can be handled.
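For what it is worth, the "misfit" cases above could be kept as a small lookup from written letters to spoken sound, roughly like this Python sketch. Everything in it is still awaiting confirmation: the function and parameter names are hypothetical, the RR reading is the one i am unsure about, and ngka/ksha are left out because their pronunciation is exactly what is being asked.

```python
def pronounce(letters, n_is_cillu=False):
    """Map a written malayalam letter sequence to its reported pronunciation.

    `letters` is a list of transliterated letters, e.g. ["n", "Ra"].
    The rules are as read from the reference, pending confirmation.
    """
    key = tuple(letters)
    if key == ("n", "Ra"):
        # Reported: n + Ra sounds like "nda", unless the n is a cillakshara.
        return "nRa" if n_is_cillu else "nda"
    if key == ("n", "pa"):
        return "mpa"   # written npa, pronounced mpa (sampathu)
    if key == ("R", "Ra"):
        return "Ta"    # two hard ra; this reading is unconfirmed
    return "".join(letters)  # no known misfit: read as written

print(pronounce(["n", "Ra"]))                   # nda
print(pronounce(["n", "Ra"], n_is_cillu=True))  # nRa
print(pronounce(["n", "pa"]))                   # mpa
```

The input side of the scheme would use the right-hand values (the sounds); a table like this would only be needed when going from malayalam spelling back to phonemes.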
Thanks
Arun
Last edited by arunk on 03 Dec 2006, 02:27, edited 1 time in total.