Single transliteration scheme for all CM languages?

arunk · Post by **arunk** » 28 Nov 2006, 03:39

drshrikaanth wrote:
arunk wrote:I am not sure this could be a result of flawed logic for detecting the "ru" vowel (hrdaya etc.)? When can that vowel occur? (i.e the one in krpa, hrdaya, mrdangam etc.)
The vowel is part of the kAguNita series of each consonant. It occurs in combination with any and every consonant. kRpa, gRdhra, ghRNa, tRpti, dhRti and so on.

Hmm. I was at least going for that - but i was being lax on "r" vs "R" and that is the problem? As in kra vs kRa is different, right? (i know i am slow - but as i mentioned i really dont know these scripts).

No. gnya wont do. For e.g agnystra. The consonant will have to have a unique representation.

I agreed. If we have to go numeric which would it be? How about gny~Ana (or gny2Ana)? As long as y2 or y~ doesnt mean anything else, we would be ok. The other possibility is n~ or ny~. I know numeric is easier to remember but in this case for some reason (maybe my preference) the "~" is less intrusive, and so gny~Ana/ny~Ana is almost gnyAna/nyAna which are closest phonetically (?)

drshrikaanth wrote:
This could be trouble . So something like sangam would be saMgam but vAngmaya would not be? The rules for anuswara are not deterministic or am i mistaken?
No you are getting it wrong. The pronunciation is different.

I guess i dont know how vAngamaya is pronounced (since you said saMgam is same as sangam and the anuswara doesnt change pronunciation)? Or at least tell me how it is transliterated in baraha and I can perhaps figure it out.

drshrikaanth wrote:You just have to know the spellings thats all. As long as you write the script right for your software, the headche of getting the spelling right lies with the user.

Agreed, but remember that the challenge is not to require too many language-specific issues; otherwise we might as well stick to 5 separate schemes. Now I am not saying we are getting there, but the less that is required from the user and the more the logic can figure things out contextually, the easier it is. At least that is my secret hope. But the logic can "carry things" only in places where rules are deterministic.

drshrikaanth wrote:
Yes but we have to be sure that two languages dont have different consonants (so n2 means one thing in tamizh, but say something else in malayalam). But then if we can enumerate all cases, we can arrive at a numbering scheme that avoids collisions.
I dont think there will be a problem between tamizh and malayALam. Both these languages have 2 nakAras and their pattern of occurrences also should be the same.
If someone knows otherwise, please quote examples.

That was just an arbitrary hypothetical example. What I actually meant was that the numberings (for any consonants - not just for na) must be specific to languages to avoid clashes (so n2 always means that other na in tamizh, and nothing else for all languages). Otherwise they would be ambiguous. But if we have only a few cases and they are well known we can easily apply the scheme. As we test with all languages, we will soon find out how many of these trouble spots there are.

Arun
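The clash worry above can be checked mechanically once per-language tables exist. Here is a minimal sketch, assuming each language's qualified symbols are kept in a dictionary. The function name `find_collisions`, the symbol `z2`, and the letter labels are all made up for illustration; only `n2` comes from the thread.

```python
def find_collisions(schemes):
    """Return symbols that stand for different letters in different languages.

    schemes maps language name -> {symbol: letter label}.
    """
    by_symbol = {}
    for language, table in schemes.items():
        for symbol, letter in table.items():
            by_symbol.setdefault(symbol, {})[language] = letter
    # A symbol collides when its letters are not all the same.
    return {
        symbol: langs
        for symbol, langs in by_symbol.items()
        if len(set(langs.values())) > 1
    }


# Hypothetical tables: "n2" agrees everywhere, "z2" is an invented clash.
schemes = {
    "tamizh": {"n2": "second-na"},
    "malayalam": {"n2": "second-na", "z2": "letter-X"},
    "kannada": {"z2": "letter-Y"},
}
collisions = find_collisions(schemes)
```

With tables like these, only the invented `z2` would be reported, so a numbering scheme could be tested as each language is added.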

arunk · Post by **arunk** » 28 Nov 2006, 03:47

ramakriya wrote:BTW, I saw that krSNa transliterates correctly as ಕೃಷ್ಣ in KannaDa but incorrectly as க்ர்ஸ்ண in tamizh.

Should it have been kRShNa (i.e. is it kR or kr, and also is it Sh or S)? As regards tamizh, this was the "controversial" issue i alluded to. In a tamizh krithi it would be கிருஷ்ணா. It is not pronounced exactly the same way as ಕೃಷ್ಣ - right? The 2 languages have different pronunciations for this word and they are both correct within their contexts. But if the word appears in a kannada krithi and the transl. text is kRShNa, then should tamizh still render it as கிருஷ்ணா, giving way to incorrect pronunciation for that krithi? I say no, but i am not that sure. The same applies to a tamizh krithi.

Maybe I am still missing the point though.

Note however that if I get the "detection" of that vowel right, the tamizh logic can introduce "ru" (i dont have that logic in place yet) and so it is possible to make it come out as கிருஷ்ணா - the question is would that be right for our purposes where sahitya suddham is important and ideally i would like it maintained as much as possible?

Arun

arunk · Post by **arunk** » 28 Nov 2006, 04:03

ramakriya wrote:There are few words which begin with ಋ - It can also occur in a cluster preceded and followed by almost any consonent.

Since it can begin words (can you pl. give an example with transl. text and meaning pl) I think we have to assign a unique pattern for it (otherwise we simply look for r/R in the middle of 2 consonants).

Baraha uses "Ru". The trouble is kRupa would become க்றுப in tamizh. We could use R2 or maybe R~ (so that kR~pa looks almost like kRpa), or should we go with r2/r~? Note that in any case the tamizh interpreter needs to treat it like krupa (and not kRupa, and definitely not kirubai).

I presume based on your post that we can ignore ೠ?

Arun

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 04:03

arunk wrote:Hmm. I was atleast going for that - but i was being lax on "r" vs "R" andthat is the problem? As in kra vs kRa is different right (i know i am slow - but as i mentioned i really dont know these scripts ).

I lost you here. We are talking about the vowel right? Why did you bring in kra and kRa here? In any case, they are different although offhand I cannot recall R(vallinam) occurring in combination with any consonant except itself as far as kannaDa is concerned. And in kannaDa there are words which originally began with R (in case this question crops up later).

I agreed. If we have to go numeric which would it be? How about gny~Ana (or gny2Ana)? As long as y2 or y~ doesnt mean anything else, we would be ok. The other possibility is n~ or ny~. I know numeric is easier to remember but in this case for some reason (maybe my preference) the "~" is less intrusive , and so gny~Ana/ny~Ana is almost gnyAna/nyAna which are closest phonetically (?)

Did I advocate the use of numerals in this case? I was talking of n and n2, and it was in that context, as they are simply variations of one sound. I do agree that in this case "~" is a better choice (it is so used in Spanish also, IIRC), as in most schemes. I think we should go for "~n" for the nasal consonant of the second pentad. As for the nasal from the first pentad, why not go for "#n"?

I guess i dont know how vAngamaya is pronounced (since you said saMgam is same as sangam and the anuswara doesnt change pronounciation)? Or atleast tell me how it is transliterared in baraha and I can perhaps figure it out.

Hold on. Now you are confusing issues (or I am losing it). I thought I explained clearly that when this nasal consonant occurs as the first part of a conjunct with a consonant from a different pentad, it is not represented by the bindu/anuswAra. Actually if you write vAMgmaya, you are introducing a "g" sound which is not there in the word's pronunciation. If you simply write vAMmaya to get rid of "g", the pronunciation collapses. So you will need a separate symbol here and not the bindu.

drshrikaanth wrote:You just have to know the spellings thats all. As long as you write the script right for your software, the headche of getting the spelling right lies with the user.
Agreed but remember that the challenge is not to require too many language specific issues. We might as well stick to 5 separate schemes . Now I am not saying we are getting there but the less it is required from the user and the more the logic can figure things contextually, the easier it is. Atleast that is my secret hope . But the logic can "carry things" only in places where rules are deterministic.

You missed my point. I said, those who read or write in say kannaDa or tamizh will have to and will know how to spell it in that language. You dont have to worry about it. All you need to do is provide symbols so that there is no ambiguity, without making things too complex. But as you are adding many variables (languages) to the equation, you are bound to encounter increasing complexity. If you think it is getting too complicated, then perhaps you should best leave it as separate schemes.

That was just an arbitrary hypothetical example. What I actually meant was that the numberings (for any consonants - not just for na) must be specific to languages to avoid clashes (so n2 always means that other na in tamizh, and nothing else for all languages). Otherwise they would be ambigous. But if we have only a few cases and they are well known we can easily apply the scheme. As we test with all languages, we will soon find out how many of these trouble spots are there.

Arun

I dont think n and n2 will cause any confusion with the other languages. Lets worry about others when we come to them.

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 04:07

arunk wrote:Since it can begin words (can you pl. give an example with transl. text and meaning pl) I think we have to assign a unique pattern for it (otherwise we simply look for r/R in the middle of 2 consonants).

Rtu, RNa, RShi to give a few.

Baraha uses "Ru". The trouble is kRupa would become க்றுப in tamizh. We could use R2 or maybe R~ (so that kR~pa looks almost like kRpa ), or should we go with r2/r~? Note that in any case the tamizh interpretor needs to treat like krupa (and not kRupa, and definitey not kirubai )

Arun "R." is the simplest. Would that not work here?

I presume based on your post that we can ignore ೠ?

Yes. Definitely.

ramakriya · Post by **ramakriya** » 28 Nov 2006, 04:08

arunk wrote:
ramakriya wrote:BTW, I saw that krSNa transliterates correctly as ಕೃಷ್ಣ in KannaDa but incorrectly as க்ர்ஸ்ண in tamizh.
Should it have been kRShNa (i.e. is it kR or kr, and also is it Sh or S)? As regard to tamizh, this was the "controversial" issue i alluded. In a tamizh krithi it would be கிருஷ்ணா. It is not pronounced exactly the same way as ಕೃಷ್ಣ - right? The 2 languages have different pronounciations for this word and they are both correct within their contexts. But if the word appears in a kannada krithi and the transl. text is kRShNa, then should tamizh still render it as கிருஷ்ணா giving way for incorrect pronounciation for that krithi? I say no but i am not that sure. The same applies to a tamizh krithi.

Maybe I am still missing the point though.

Arun

My point was it should have used ஷ instead of ஸ். I am aware of the problem posed by the vowel R in Tamil, and I think you should follow whatever scheme you think would be useful for tamizh readers.

-Ramakriya

arunk · Post by **arunk** » 28 Nov 2006, 04:12

I used "s" for sa as in samam, "S" for Syama, and Sh for dIkshitar. So shouldnt krSNa be krShNa?

Arun

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 04:13

arunk wrote:I used "s" for sa as in samam, "S" for Syama, and Sh for dIkshitar. So shouldnt krSNa be krShNa?

Arun

Yes. kRShNa.

arunk · Post by **arunk** » 28 Nov 2006, 04:14

ramakriya wrote:I think you should folow whatever scheme you think would be useful for tamizh readers.

Perhaps - as long as we perhaps qualify the "ru" with a super-script of 1 to indicate its not like the tamizh ru. Books use such superscripts for dha, bha etc. - the trouble some books I think freely convert some words into familiar equiv. tamizh words which can carry different pronounciation. I thought we should avoid it.

Arun

arunk · Post by **arunk** » 28 Nov 2006, 04:19

drshrikaanth wrote:Arun "R." is the simplest. Would that not work here?

Ah yes. I missed that. Yes simpler and less intrusive than ~ (btw us programmers are very familiar with ~ and so i am subliminally biased here).

I still am a bit confused between r and R here. If this "ru" is going to carry a ".", then does it necessarily have to be "R." or can it be "r."? I dont think it matters except thamizh would need to convert it to "ru" (i.e. "softer" r).

Also I can make it easy by not requiring the "." when appearing in-between two consonants. Should I do so? In other words, you dont have to explicitly specify it unless you need to.

Arun

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 04:21

Arun
Ignore my crib about kRpe not showing up correct. i think it was the firefox that was making it appear misshapen. One has to write "krpe". Please change it to kRpe in the transliteration scheme.

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 04:24

arunk wrote:Also I can make it easy by not requiring the "." when appearing in-between two consonants. Should I do so? In other words, you dont have to explicily specify it unless you need to.

Now Now! You are not paying attention to my posts are you

r can still appear as a consonant between 2 consonants . tarpaNa, tarjume etc. Which is why just use "R" for the vowel whenever it appears between 2 consonants.

arunk · Post by **arunk** » 28 Nov 2006, 04:31

drs,

(i am confused due to lack of knowledge of other scripts and in the process i am confusing others - sorry!)

Just to be sure: Are you proposing

n for ன (as in manam)
#n for ந (as in #nAmam)
~n for ஞ (as in ~nAnam)

I like #n, but i think ~n doesnt quite convey it phonetically. How about at least ~nyAnam (or maybe ~ny <=> ~n so you can have it both ways).

Arun

arunk · Post by **arunk** » 28 Nov 2006, 04:33

drshrikaanth wrote:
arunk wrote:Also I can make it easy by not requiring the "." when appearing in-between two consonants. Should I do so? In other words, you dont have to explicily specify it unless you need to.
Now Now! You are not paying attention to my posts are you r can still appear as a consonant between 2 consonants . tarpaNa, tarjume etc. Which is why just use "R" for the vowel whenever it appears between 2 consonants.

Sorry I was not being precise. When I mean in-between consonants, the preceding consonant doesnt have a vowel extension (unvoiced? not dead? dont remember the linguistic term). I mean krp where k appears without a vowel after it as in karp etc.

Arun

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 04:34

arunk wrote:drs,

(i am confused due to lack of knowledge of other scripts and in the process i am confusing others - sorry!)

Just to be sure: Are you proposing

n for ன (as in manam)
#n for ந (as in #nAmam)
~n for ஞ (as in ~nAnam)

I like #n, but i think ~n doesnt quite convey it phonetically. How about atleast ~nyAnam (or may ~ny <=> ~n so you can have it both ways).

Arun

No no.

n for ன்

n2 for ந்

~n for ஞ

#n for ங்
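The four nasal symbols listed above overlap ("n" is a prefix of "n2" and a suffix of "~n" and "#n"), so a parser has to try the longer symbols first. Here is a minimal sketch of that idea, assuming a simple longest-match scan. The names `TAMIL_NASALS` and `tokenize_nasals` are hypothetical and this is not the actual engine's code.

```python
# Symbol table copied from the proposal above (illustrative only).
TAMIL_NASALS = {
    "n": "ன்",
    "n2": "ந்",
    "~n": "ஞ",
    "#n": "ங்",
}


def tokenize_nasals(text):
    """Split text into tokens, trying the longest nasal symbols first."""
    symbols = sorted(TAMIL_NASALS, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        for symbol in symbols:
            if text.startswith(symbol, i):
                tokens.append(symbol)
                i += len(symbol)
                break
        else:
            # Not a nasal symbol: pass the character through unchanged.
            tokens.append(text[i])
            i += 1
    return tokens


tokens = tokenize_nasals("a#n#naNam")
```

Because "#n" and "~n" start with a character that is otherwise unused, a prefix qualifier can be recognised before the "n" is consumed, whereas the suffix form "n2" needs the longest-match ordering to avoid being read as "n" followed by "2".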

arunk · Post by **arunk** » 28 Nov 2006, 04:35

btw, i want to thank you all for taking the time and trouble to try this out and make valuable suggestions!!

Arun

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 04:37

arunk wrote:Sorry I was not being precise. When I mean in-between consonants, the preceding consonant doesnt have a vowel extension (unvoiced? not dead? dont remember the linguistic term). I mean krp where k appears without a vowel after it as in karp etc.

Sorry Arun. I missed the import. But no, it still wont work. You have words like svAtantrya where r is the consonant and not the vowel. Hence the choice of R.

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 04:39

arunk wrote:btw, i want to thank you all for taking the time and trouble to try this out and make valuable suggestions!!

You are welcome Arun. Closing shop for the day. Goodnight.

arunk · Post by **arunk** » 28 Nov 2006, 04:39

drshrikaanth wrote:No no.

n for ன்

n2 for ந்

~n for ஞ

#n for ங்

OK. But why n2 and ~n (i.e. qualifier prefix in one-case and suffix in another)? I think we should go for consistency. Maybe if we come up with $ or @ or ^ instead of 2 we can use it also as prefix?

BTW, at least for tamizh the #n can usually be figured out based on context (tangam) but not always (a#n#naNam? - but we could go with angnganam - bogus though!). Do you think there are problems with this assumption? How about other languages? I am sure malayalam if any could pose problems.

I am hoping if it is ok that the engine wont require you to explicitly specify this everytime - and you use it only to force it? That can reduce the number of these extra symbols?

Arun

arunk · Post by **arunk** » 28 Nov 2006, 04:41

drshrikaanth wrote:
Sorry Arun. I missed the import. But no, it wtill wont work You have words like svAtantrya where r is the consonant and not the vowel. Hence the choice of R.

This cant be svAntantriyA or would that be different?

Arun

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 04:42

arunk wrote:This cant be svAntantriyA or would that be different?

It would be very different and incorrect as well.

arunk · Post by **arunk** » 28 Nov 2006, 04:43

never mind. It would be so even in tamizh

Ok. that convinces me. I will try to put the logic for "R."

Arun
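The exchange above comes down to this: a context rule ("a bare r with no vowel on either side is the vowel") handles krpa and tarpaNa but misfires on svAtantrya, while an explicit "R." marker does not. Here is a small sketch of that comparison. The vowel set and both function names are assumptions made for illustration only.

```python
# Assumed set of vowel letters in the transliteration text (illustrative).
VOWELS = set("aAiIuUeEoO")


def naive_vowel_r_positions(word):
    """Indices where a bare 'r' has a non-vowel on both sides."""
    hits = []
    for i in range(1, len(word) - 1):
        if word[i] == "r" and word[i - 1] not in VOWELS and word[i + 1] not in VOWELS:
            hits.append(i)
    return hits


def explicit_vowel_r_positions(word):
    """Indices where the explicit 'R.' marker appears."""
    return [i for i in range(len(word) - 1) if word[i:i + 2] == "R."]


naive_krpa = naive_vowel_r_positions("krpa")              # flags the vowel, as wanted
naive_tarpana = naive_vowel_r_positions("tarpaNa")        # nothing flagged, as wanted
naive_svatantrya = naive_vowel_r_positions("svAtantrya")  # flags a consonant r
explicit_krpa = explicit_vowel_r_positions("kR.pa")
```

The naive rule reports a hit inside svAtantrya even though that r is a consonant, which is the counterexample drs gave; the explicit marker only reports what the writer typed.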

vasanthakokilam · Post by **vasanthakokilam** » 28 Nov 2006, 05:39

arunk wrote:btw, i want to thank you all for taking the time and trouble to try this out and make valuable suggestions!!

Arun

More significantly, thank you for all the effort.

arunk · Post by **arunk** » 28 Nov 2006, 07:06

thanks vk.

arunk · Post by **arunk** » 28 Nov 2006, 07:11

BTW, i updated the logic with
1. support for visarga. You explicitly specify using 'H'. Now 'H' always stands for this (so bHa would be wrong, bha should be used).
2. (hope) better handling of the "krpa" etc. you specify it as kR.pa i.e. R. as proposed by drs.
3. Instead of gnya, now ~n (as ~nAnam -> knowledge). It also accepts the phonetically closer ~ny (i.e. ~ny <=> ~n). If it runs into any conflict, i can remove it.

Please let me know if they work better.

Arun
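For anyone curious how the three rules in this update might fit together, here is a rough sketch under stated assumptions: it uses a single regular expression with the longer alternatives first, folds "~ny" into "~n", and labels tokens. The names `TOKEN`, `tokenize` and `classify` are hypothetical and do not describe the actual implementation.

```python
import re

# Longer alternatives come first so "~ny" wins over "~n" and "R." over "R".
TOKEN = re.compile(r"~ny|~n|R\.|.", re.DOTALL)


def tokenize(text):
    """Split transliteration text, treating ~ny as a synonym of ~n."""
    tokens = []
    for token in TOKEN.findall(text):
        tokens.append("~n" if token == "~ny" else token)
    return tokens


def classify(token):
    """Label the special tokens from the update; everything else is 'other'."""
    if token == "H":
        return "visarga"
    if token == "R.":
        return "vowel-R"
    if token == "~n":
        return "nasal-~n"
    return "other"


krpa = tokenize("kR.pa")
gnana = tokenize("~nyAnam")
```

One consequence of accepting both spellings is that "~n" followed by a genuine "y" can no longer be written without ambiguity, which is the kind of conflict that would justify dropping the "~ny" alias.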

vasanthakokilam · Post by **vasanthakokilam** » 28 Nov 2006, 07:57

Arun: If the # of this type of non-phonetic representation is relatively small, you can put that in a help box to the right of the typing window as a ready reference.

arunk · Post by **arunk** » 28 Nov 2006, 18:39

drs,

regarding vAngmaya:
I think the problem is "ng" was taken like "Mg" due to other examples (like saMgam) - but obviously not all "ng" is "Mg"; in cases like this it would be the "nga" consonant. Would that be correct? In tamizh of course (like with other consonants) there is no difference.

I am planning to introduce "#n" and thus vA#nmaya (and also the phonetically closer vA#ngmaya).

Do you know when a "nga" is used vs. "Mg" is used? Could it be "nga" in vAngmaya because the "ng" is followed by "m"? Would there be a deterministic rule that can be discerned as to when a "ng" becomes "Mg" vs uses the "nga" consonant? If so, i can incorporate that into the logic and at least in cases like vAngmaya, you wont have to explicitly specify #n.

I also worry about transl. a tamizh krithi in kannada. In tamizh the "ng" sound is always the same and hence the same letter. So would all occurrences of "ng" in that tamizh krithi (tangam etc.) become "Mg" or "nga" when going over to kannada?

Arun

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 18:48

arunk wrote:Do you know when a "nga" is used vs. "Mg" is used? Could it be "nga" in vAngmaya because the "ng" is followed by "m"? Would there be a determinstic rule that can be discerned as to when a "ng" becomes "Mg" vs uses the "nga" consonant? If so, i can incorporate that to the logic and atleast in cases like vAngmaya, you wont have to explicitly specify #n.

You really are not paying attention to my posts. I have answered this twice already.

So would all occurences of "ng" in that tamizh krithi (tangam etc.) become "Mg" or "nga" when going over to kannada?

Arun

Yes.

jayaram · Post by **jayaram** » 28 Nov 2006, 18:57

It seems some of the peculiarities of the Malayalam language have not been considered so far.

A bit difficult to explain the sounds thru the written word - for example, in the word 'Nanni' (meaning thanks) the 'nn' is not like the 'Thanni' in Tamil - it is pronounced in a 'flat' way (imagine extending the 'n' in pandal). In fact, the first 'n' in Nanni is also pronounced this way in Malayalam.

And there are a few more like this one...

Perhaps a fellow Malayali can explain this better.

arunk · Post by **arunk** » 28 Nov 2006, 20:35

drshrikaanth wrote:
arunk wrote:Do you know when a "nga" is used vs. "Mg" is used? Could it be "nga" in vAngmaya because the "ng" is followed by "m"? Would there be a determinstic rule that can be discerned as to when a "ng" becomes "Mg" vs uses the "nga" consonant? If so, i can incorporate that to the logic and atleast in cases like vAngmaya, you wont have to explicitly specify #n.
You really are not payingattention to my posts. I have answered thiss twice already.

Perhaps but perhaps you are being very judgemental. It could be that i still didnt get it in spite of reading it (that doesnt mean the fault is at your end either). Your last explanation as far as i could tell was in kannada script "It does not show up right. It should be ವಾಙ್ಮಯ but actually shows up as vAMgmaya ವಾಂಗ್ಮಯ". Without knowing the script much, i wanted to make extra sure. Is that so hard to be sympathetic/compassionate about? Jeez!

Arun

arunk · Post by **arunk** » 28 Nov 2006, 20:42

jayaram wrote:It seems some of the peculiarities of the Malayalam language have not been considered so far.

A bit difficult to explain the sounds thru the written word - for example, in the word 'Nanni' (meaning thanks) the 'nn' is not like the 'Thanni' in Tamil - it is pronounced in a 'flat' way (imagine extending the 'n' in pandal). In fact, the first 'n' in Nanni is also pronounced this way in Malayalam.

And there are a few more like this one...

Perhaps a fellow Malayali can explain this better.

By the way thanni would be thaNNI. But I think this "n" in Nanni you mean in malayAlam is different from both n/N in tamizh i.e. neither like in kannam (cheek), nor as in taNNi (water) right?

We can of course introduce another variant of "na" with a prefix (say %n), but before we decide it, does malayALam also use these 2 other na sounds? In malayalam script how many different consonants are there for na's?

Arun

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 21:00

arunk wrote:
drshrikaanth wrote:
arunk wrote:Do you know when a "nga" is used vs. "Mg" is used? Could it be "nga" in vAngmaya because the "ng" is followed by "m"? Would there be a determinstic rule that can be discerned as to when a "ng" becomes "Mg" vs uses the "nga" consonant? If so, i can incorporate that to the logic and atleast in cases like vAngmaya, you wont have to explicitly specify #n.
You really are not payingattention to my posts. I have answered thiss twice already.
Perhaps but perhaps you are being very judgemental . It could be that i still didnt get it in spite of reading it (that doesnt mean the fault is at your end either). Your last explanation as far as i could tell was in kannada script "It does not show up right. It should be ವಾಙ್ಮಯ but actually shows up as vAMgmaya ವಾಂಗ್ಮಯ". Without knowing the script much, i wanted to make extra sure. Is that so hard to be sympathetic/compassionate about? Jeez!

Arun

Arun. I'm passionate about what I say. Compassionate to a degree in matters of inattention. There is no question of being judgmental here. If I had been, I would not have taken the pains to write so many posts in response.

AFAIK, I have clearly explained where M(bindu/anuswAra) occurs. I can think of 2 words (vA#nmaya, di#nmUDha) where #n is used and both have "m" following it. But I may remember other words later which dont follow this pattern. Is it so important to incorporate every minor thing as a logic than to provide a separate symbol? You are thinking from the perspective of someone who does not know kannaDa. But as I said earlier, those who wish to write in kannaDa will know how to spell it. They will look for a unique symbol rather than an all-encompassing logic. Am I clear at all?

I suppose you could , if you may, provide a unique symbol as well as incorporate a logic for particular situation (#n followed by m).
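The combination suggested here (a unique symbol plus one narrow context rule) could look roughly like the sketch below. It assumes plain "ng" defaults to the anuswAra form "Mg" and switches to "#n" only when "m" follows, based on the two words cited above; an explicit "#n" typed by the user is left untouched. The function name `resolve_ng` is hypothetical, and the rule is not claimed to cover every word.

```python
def resolve_ng(text):
    """Rewrite bare 'ng' as '#n' before 'm', otherwise as 'Mg'.

    An explicit '#n' contains no 'ng', so it passes through unchanged.
    """
    out = []
    i = 0
    while i < len(text):
        if text.startswith("ng", i):
            following = text[i + 2:i + 3]
            out.append("#n" if following == "m" else "Mg")
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


vangmaya = resolve_ng("vAngmaya")
sangam = resolve_ng("sangam")
```

If a word turns up that breaks the "followed by m" pattern, the writer can still type "#n" directly, so the rule only saves keystrokes and never blocks a correct spelling.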

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 21:04

arunk wrote:
jayaram wrote:It seems some of the peculiarities of the Malayalam language have not been considered so far.

A bit difficult to explain the sounds thru the written word - for example, in the word 'Nanni' (meaning thanks) the 'nn' is not like the 'Thanni' in Tamil - it is pronounced in a 'flat' way (imagine extending the 'n' in pandal). In fact, the first 'n' in Nanni is also pronounced this way in Malayalam.

And there are a few more like this one...

Perhaps a fellow Malayali can explain this better.
By the way thanni would be thaNNI. But I think this "n" in Nanni you mean in malayAlam is different from both n/N in tamizh i.e. neither like in kannam (cheek), nor as in taNNi (water) right?

We can of course introduce another variant of "na" with a prefix (say %n), but before we decide it, does malAyaLam also use these 2 other na sounds? In malayalam script how many different consonants are there are for na's

Arun

Jayaram, I think is talking about variations in pronunciation that are not represented by different symbols in writing. We shouldnt complicate matters by introducing more symbols/letters than what is present in the actual script of the language. They will not be understood easily either.

No script is 100% phonetic, not even sanskrit. We have different variations even in kannada that result in a different meaning. tande with a flat "a" (ta) means father while tande with tongue elevated higher and touching the teeth means "I brought". This is not represented differently in writing. And even if you did, people wont understand as they are not used to seeing it in writing and most people would not even consciously recognise it as different although they would pronounce correctly being native speakers.

arunk · Post by **arunk** » 28 Nov 2006, 21:12

drshrikaanth wrote:Is it so important to incorporate every minor thing as a logic than to provide a separate symbol?

No. Only if there we can establish a clear rule that covers all cases. I was trying to see if there is such a rule. If there isnt or we cannot think of one now, then there is no need to add logic and we have to deal with the extra symbol to disambiguate.

drshrikaanth wrote:You are thinking from the perspective of someone who does not know kannaDa. But as I said earlier, those who wish to write in kannaDa will know how to spell it. They will look for a unique symbol rather than an all-encompassing logic. Am I clear at all?

Yes - it is clear. But let me explain my side as to why I am being a pain. As mentioned earlier, i was going for a scheme which was more phonetic based rather than spelling based i.e. where the spelling rules are discerned from the phonetic sound. Now of course I know it cannot be all phonetic based, which is why i said "as much as possible". So every time a "simple english phonetic combination" on its own cannot adequately capture the target language phonetic sound (like #ng here), or cannot unambiguously point to a spelling (like the tamizh na as in nAmam), i at least want to ask the question if it can be done based on the context in which it appears (e.g. in tamizh there is no need to say ta#nam, you can safely say tangam. It is way better than ta#nam, and also ta#ngam). If it cannot be discerned even from the context, then we obviously must introduce a new symbol. I know these extra symbols must be there, but i want to keep them to a minimum if possible.

Arun

arunk · Post by **arunk** » 28 Nov 2006, 21:14

drshrikaanth wrote:
So would all occurences of "ng" in that tamizh krithi (tangam etc.) become "Mg" or "nga" when going over to kannada?
Yes.

Now I may get it from you again, but...

I asked if it is A or B, and you said "yes"! Is it A (Mg) or B (nga i.e. #n)?

Arun

arunk · Post by **arunk** » 28 Nov 2006, 21:23

drshrikaanth wrote:We shouldnt complicate matters by introducing more symbols/letters than what is present in the actual script of the language. They will not be understood easily either.

Agreed. If say there is only one "soft" na in malayalam, then no need for transl. txt to require it always to be %n, because "n" is used for a "na" that appears in other languages. That would be a massive pain when transl. malayalam krithis (of course this example can apply to any language depending on the context). This is why I asked jayaram how many "na" letters there are in malayalam.

But if the phonetic sound is different from any of the other languages, then when rendering the target language, we can add super-scripts (i.e. n followed by a tiny, raised 1) in the non-malayalam languages when they are referring to a malayalam krithi. Isnt this no different from the super-scripts used in tamizh CM books for kha, bha, etc. (and even ka vs ga etc.)?

Arun

jayaram · Post by **jayaram** » 28 Nov 2006, 21:41

arunk wrote:
jayaram wrote:It seems some of the peculiarities of the Malayalam language have not been considered so far.

A bit difficult to explain the sounds thru the written word - for example, in the word 'Nanni' (meaning thanks) the 'nn' is not like the 'Thanni' in Tamil - it is pronounced in a 'flat' way (imagine extending the 'n' in pandal). In fact, the first 'n' in Nanni is also pronounced this way in Malayalam.

And there are a few more like this one...

Perhaps a fellow Malayali can explain this better.
By the way thanni would be thaNNI. But I think this "n" in Nanni you mean in malayAlam is different from both n/N in tamizh i.e. neither like in kannam (cheek), nor as in taNNi (water) right?

We can of course introduce another variant of "na" with a prefix (say %n), but before we decide it, does malAyaLam also use these 2 other na sounds? In malayalam script how many different consonants are there are for na's

Arun

Arun - Malayalam has 'na' as in Tamil 'nAn', 'Na' as in Tamil 'vANi' - plus the one I mentioned (let's show it as 'n¬a' for now).
Most non-Mallus find it hard to pronounce this one, for obvious reasons. (my dad could never get this sound right, even though he lived in Kerala most of his life - he would say niNakku instead of n¬inakku - meaning 'for you')

In the word 'vanam' (meaning forest) the 'na' is just like the Tamil 'na'. But in 'nanni' both the na-s are flat versions, the 2nd one being stressed of course. In fact, whenever the 'na' appears as the first syllable, it behaves this way, i.e. as n¬.

Having said all this, DRS's point about not getting too much hung up on phonetics may be right. When I went thru this thread, I thought we were trying to come up with a system to reproduce the exact sounds in the native language. Perhaps I was wrong.

arunk · Post by **arunk** » 28 Nov 2006, 22:07

jayaram wrote:Arun - Malayalam has 'na' as in Tamil 'nAn', 'Na' as in Tamil 'vANi' - plus the one I mentioned.
Most non-Mallus find it hard to pronounce this one, for obvious reasons.

Can you pl. look at http://unicode.org/charts/PDF/U0D00.pdf page #3 and tell me which letter stands for these (i.e. the unicode constants). I see a na, nga, nna, nya. I am assuming nga, nna, nya are like in tamizh in which case, i cant find a separate letter for this other na (assuming first na is like nAn). If it doesnt really use a separate letter, then we just stick to "n" (but see below)

jayaram wrote:Having said all this, DRS's point about not getting too much hung up on phonetics may be right. When I went thru this thread, I thought we were trying to come up with a system to reproduce the exact sounds in the native language. Perhaps I was wrong?

I at least want to be able to indicate these sounds in the form of super-scripts. We want to transliterate a CM krithi in any of the 5 languages to all 4 other languages. I think it would be better if, when this is done, we dont lose any pronunciation for the 4 other languages either. As I said, in tamizh books on telugu/sanskrit krithis this is handled with super-scripts for bha, dha etc. I was thinking that this is like that, i.e. a nanni in tamizh would show up like nan4ni (imagine 4 is smaller and raised), assuming we decide n4 is the special malayalam n.

Arun

jayaram · Post by **jayaram** » 28 Nov 2006, 23:15

Arun - this is fascinating stuff! I didn't know much about this world of unicode before! I have always had an interest in linguistics (the tyranny thread notwithstanding) and find this attempt to come up with a system to 'transliterate' from one of the 4 south indian languages (plus Sanskrit) to each other fascinating!

To address your question, I had a look at the unicode pdf file - and feel there's definitely a missing entry there. This is for the 'nn' in 'nanni'. As I said before, it's not the regular 'na' or the 'Na'. This is part of the alphabet that we all grew up with (and is used quite regularly, e.g. 'innale' for 'yesterday' and so on).

The 'na' gets pronounced as the Tamil 'na' when it's not the beginning syllable, but is pronounced as the 'n¬' I mentioned before when it's at the start of a word. So a Mallu would pronounce the word 'nilAvu' (really 'nilAv<half-u>' using the 0D4D sign in the unicode file) as 'n¬ilAvu' while a Tamilian would say 'nilAvu'. But then that gets into the phonetics aspect of it.
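If one did want to record the word-initial rule described here, it could be sketched as below. This is only an illustration: the function name `mark_initial_n` is hypothetical, and the "n4" marker is the tentative notation floated earlier in the thread, not an agreed symbol. The sketch covers the word-initial case only and does not handle the doubled 'nn' of 'nanni'.

```python
def mark_initial_n(word, marker="n4"):
    """Tag a word-initial lowercase 'n' with a marker; leave the rest alone.

    `marker` is a placeholder (assumption), not a settled symbol.
    Capital 'N' and prefixed forms such as '#n' or '~n' are not touched,
    because they do not start with a bare lowercase 'n'.
    """
    if word.startswith("n"):
        return marker + word[1:]
    return word


for w in ["nilAvu", "vanam", "nanni"]:
    print(w, "->", mark_initial_n(w))
```

Since the script itself uses a single letter for both sounds, a tag like this would matter only for pronunciation hints, not for spelling.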

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 23:21

arunk wrote:
drshrikaanth wrote:
So would all occurences of "ng" in that tamizh krithi (tangam etc.) become "Mg" or "nga" when going over to kannada?
Yes.
Now I may get it from you again but...

I asked if it is A or B, and you said "yes"! Is it A (Mg) or B (nga i.e. #n)?

Arun

You call me judgmental and you crib about getting it from me

It is A(Mg). Unless of course a#n#nanam occurs, then you will have to use the letter as such and not the bindu/anuswAra. And although the tamizh dictionary has words beginning or ending with #n I think we can say with confidence that they will never occur in kRtis.

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 23:30

arunk wrote:Yes - it is clear. But let me explain my side as to why I am being a pain. As mentioned earlier, i was going for a scheme which was more phonetic based rather than spelling based, i.e. where the spelling rules are discerned from the phonetic sound. Now of course I know it cannot be all phonetic based, which is why i said "as much as possible". So every time a "simple english phonetic combination" on its own cannot adequately capture the target language phonetic sound (like #ng here), or cannot unambiguously point to a spelling (like the tamizh na as in nAmam), i at least want to ask the question if it can be done based on the context in which it appears (e.g. in tamizh there is no need to say ta#nam, you can safely say tangam. It is way better than ta#nam, and also ta#ngam). If it cannot be discerned even from the context, then we obviously must introduce a new symbol. I know these extra symbols must be there, but i want to keep them to a minimum if possible.

Arun

Arun. Are we talking at cross purposes here? If your aim is only to go for a phonetic based spelling (as much as possible) in English, then you might give up on the other languages. I said vA#nmaya is not being shown correctly in kannaDa. Are you trying to simplify kannaDa spellings! :rolleyes: There is no way you can replace #n in vA#nmaya with bindu in kannaDa because that will fail miserably to represent the correct pronunciation. Now I dont care whether you write ta#ngam or tangam in English as long as I have the accurate kannaDa script to hand.

ramakriya · Post by **ramakriya** » 28 Nov 2006, 23:32

jayaram wrote:The 'na' gets pronounced as the Tamil 'na' when it's not the beginning syllable, but is pronounced as the 'n¬' I mentioned before when it's at the start of a word. So a Mallu would pronounce the word 'nilAvu' (really 'nilAv<half-u>' using the 0D4D sign in the unicode file) as 'n¬ilAvu' while a Tamilian would say 'nilAvu'. But then that gets into the phonetics aspect of it.

My question is: if there is a single representation for these two sounds in the existing malayALam script, wouldn't it be correct to use the same character here as well?

-Ramakriya

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 23:33

And #n and ~n sounds will assume even more importance when you incorporate malayalam as well. Words like i#n#nane, ~nAn, ~na#n#naL will appear everywhere. You cannot use bindu/anuswAra there. You cannot do without a separate symbol as far as I can see.

arunk · Post by **arunk** » 28 Nov 2006, 23:36

drs,

i know the vAngmaya problem - it is still a problem and I have not forgotten it. In fact, i am currently trying to fix it. In my "new not yet posted version" here is what I get:

English:
vA#ngmaya, ta#ngam, a#n#naNam, tangam

Tamil:
வாங்க்மய, தங்கம், அங்ஙணம், தங்கம

Kannada:
ವಾಙ್ಮಯ, ತಙಮ್, ಅಙ್ಙಣಮ್, ತಂಗಂ

Is this ok? Note that #n is used to force that letter in ta#ngam (as in tamizh).

Arun

ramakriya · Post by **ramakriya** » 28 Nov 2006, 23:38

Arun

Kannada:
ವಾಙ್ಮಯ-> OK
ತಙಂ-> incorrect
அங்ஙணம், ಅಙ್ಙಣಮ್ -> My tamil vocabulary is not good enough to recognize this word !
ತಂಗಂ -> OK

-Ramakriya

arunk · Post by **arunk** » 28 Nov 2006, 23:40

drshrikaanth wrote:And #n and ~n sounds will assume even more importance when you incorporate malayalam as well. Words like i#n#nane, ~nAn, ~na#n#naL will appear everywhere. You cannot use bindu/anuswAra there. You cannot do without a separate symbol as far as I can see.

Yes we have to use separate symbols for these. About phonetic, as I said, if the english combination cannot even capture the sound then we must use separate symbols unless they appear in well-defined limited contexts. For ~n, and #n it is not so (and of course i didnt know that and was slow in getting convinced) and hence we have to use them.

Arun

arunk · Post by **arunk** » 28 Nov 2006, 23:43

ramakriya wrote:Arun

Kannada:
ವಾಙ್ಮಯ-> OK
ತಙಂ-> incorrect
அங்ஙணம், ಅಙ್ಙಣಮ್ -> My tamil vocabulary is not good enough to recognize this word !
ತಂಗಂ -> OK

-Ramakriya

Ok. I think the 2nd is indeed wrong but that is because i dont have "source language" knowledge built-in. In other words, if we establish tangam is a tamizh word (meaning gold of course), then (per my desire) tangam and ta#ngam are equivalent and both should become taMgam in kannada (but only because the original language is tamizh). But there is no concept of "original language" yet. So it is coming out wrong.
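The "original language" idea could be sketched roughly as follows. This is an assumption-laden illustration, not the actual code behind the tool: the function name `normalize_nga` and the `source_lang` parameter are hypothetical, and only the single equivalence mentioned above (ta#ngam and tangam in a tamizh source) is handled.

```python
def normalize_nga(text, source_lang):
    """Treat '#ng' and 'ng' as the same spelling, but only for tamizh sources.

    `source_lang` is a hypothetical parameter naming the language the text
    was composed in. For any other value the text is returned unchanged,
    so an explicit '#n' keeps its meaning.
    """
    if source_lang == "tamizh":
        return text.replace("#ng", "ng")
    return text


print(normalize_nga("ta#ngam", "tamizh"))      # tangam
print(normalize_nga("vA#ngmaya", "sanskrit"))  # vA#ngmaya
```

After this step a later kannada rule would see the same input for both tamizh spellings, while '#n' not followed by g (as in a#n#nanam) passes through untouched.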

Arun

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 23:43

Arun- The correct tamizh spelling is a#n#nanam அங்ஙனம் not a#n#naNam அங்ஙணம். vA#nmaya in tamizh is incorrect. "k" should not be there. It should be வாங்மய

arunk · Post by **arunk** » 29 Nov 2006, 01:30

drshrikaanth wrote:Arun- The correct tamizh spelling is a#n#nanam அங்ஙனம் not a#n#naNam அங்ஙணம். vA#nmaya in tamizh is incorrect. "k" should not be there. It should be வாங்மய

doh! Ok. I am indeed lacking in paying attention :rolleyes:! You are/were perfectly justified in chiding me about this.

What I am currently going for (which may not be possible) is to not require #n in tamizh except in the rare cases like அங்ஙனம், and that using "#n" vs "n" in the other cases is equivalent (i.e. tangam <=> ta#ngam). What i am trying now is the tamizh specific interpreter:
1. If ng/#ng is followed by vowel extension => ங் + க் (where க் gets modified based on vowel extension). This helps cases like ta#ngam/tangam and sa#ngu/sangu. Note that here # is not needed.
2. Else if #n (not followed by g) then use ங் (again gets modified by vowel after it). This would be in a#n#nanam.

So if I see #n (or just n), i see if g follows it, and if so I see if a vowel follows it. If so, rule #1 applies. Else rule 2.
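The two tamizh rules above can be sketched as a small classifier. This is a hedged illustration only: the function name `classify_nasals`, the `VOWELS` set, and the labels are assumptions, and the sketch reports which rule fires rather than emitting tamizh letters. What happens to a g that follows #n without a vowel is left open here.

```python
import re

VOWELS = "aAiIuUeEoO"  # assumed vowel letters of the romanization


def classify_nasals(word):
    """Return (matched_text, label) for each lowercase n / #n in `word`."""
    hits = []
    for m in re.finditer(r"(#?)n(g?)", word):
        forced = m.group(1) == "#"
        has_g = m.group(2) == "g"
        nxt = word[m.end():m.end() + 1]  # character after the match, or ""
        if has_g and nxt != "" and nxt in VOWELS:
            hits.append((m.group(0), "rule 1"))  # ng/#ng followed by a vowel
        elif forced:
            hits.append(("#n", "rule 2"))        # explicit #n
        else:
            hits.append(("n", "not covered"))    # ordinary n, outside both rules
    return hits


for w in ["tangam", "ta#ngam", "sangu", "a#n#nanam"]:
    print(w, classify_nasals(w))
```

With this reading, tangam and ta#ngam both land on rule 1, while both #n in a#n#nanam land on rule 2.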

Now the trouble again is that i would like both ta#ngam and tangam to be like taMgam in kannada (ತಂಗಂ). I currently have these rules in kannada.
1. If #n (or #ng) then use ಙ್. This explicitly forces this consonant.
2. If n is not preceded by #, then if it is followed by g (as in tangam), then do the anuswara thing (i.e. it becomes M)

The problem is if we want #n to force ங in tamizh, ಙ್ in kannada, then an explicit usage in tamizh along with g as in #ng (ta#ngam) would come out wrong in kannada (it becomes ತಙಂ, rather than ತಂಗಂ).
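The conflict can be reproduced with a small sketch of the two kannada rules as stated. The function name `kannada_nasals` and the constants are hypothetical, and only the nasal cluster is converted, not the whole word.

```python
import re

NGA = "\u0C99\u0CCD"               # ಙ್ : KANNADA LETTER NGA + virama
ANUSVARA_G = "\u0C82\u0C97\u0CCD"  # ಂಗ್ : anusvara, then GA + virama


def kannada_nasals(word):
    """Return (matched_text, kannada_output) for each #n, #ng or ng in `word`."""
    hits = []
    for m in re.finditer(r"#ng?|ng", word):
        if m.group(0).startswith("#"):
            hits.append((m.group(0), NGA))         # rule 1: explicit #n / #ng
        else:
            hits.append((m.group(0), ANUSVARA_G))  # rule 2: plain n before g
    return hits


for w in ["tangam", "ta#ngam", "vA#ngmaya"]:
    print(w, kannada_nasals(w))
```

Here tangam takes the anuswara path, but ta#ngam is forced onto ಙ್, which is the mismatch described above.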

I think a possible solution is to use #n in tamizh only in cases where it is NOT followed by g. So it mostly is not required in tamizh, except the rare cases like அங்ஙனம். (If I am not mistaken) So besides the rare cases, in tamizh krithis it will always occur with g as in "ng", which becomes ங்க் and also it can be transcoded to Mg in kannada.

For non-tamizh words, it will happen in cases like vA#ngmaya and #ng itself together becomes ங்.

To summarize, for tamizh #n/#ng is ங், and "ng" is ங்க்
For kannada (and similar), #n/#ng is ಙ್, and "ng" is Mg (i.e. anuswara)

Is this ok? Or is there perhaps a simpler solution?

Arun

jayaram · Post by **jayaram** » 29 Nov 2006, 01:41

ramakriya wrote:
jayaram wrote:The 'na' gets pronounced as the Tamil 'na' when it's not the beginning syllable, but is pronounced as the 'n¬' I mentioned before when it's at the start of a word. So a Mallu would pronounce the word 'nilAvu' (really 'nilAv<half-u>' using the 0D4D sign in the unicode file) as 'n¬ilAvu' while a Tamilian would say 'nilAvu'. But then that gets into the phonetics aspect of it.
My question is: if there is a single representation for these two sounds in the existing malayALam script, wouldn't it be correct to use the same character here as well?

I agree - which is why I ended with my comment about phonetics. Having said that, if the intention of this whole exercise is to represent how a kriti 'sounds' in its original form, perhaps this rule should be captured too?

Also, the point I mention in the previous para of my posting is valid even for the canonical case. The 'nn' in 'nanni' is a regular letter in the Malayalam alphabet, but hasn't been captured in the unicode list as far as I can see.