Single transliteration scheme for all CM languages?

arunk · Post by **arunk** » 23 Nov 2006, 03:05

In this topic, let us discuss whether we can come up with a single transliteration scheme that a program can use to render text in all 5 CM languages: kannada, malayalam, sanskrit, telugu and tamil.

I don't know if this is possible, but I at least want to find out.

Note that we definitely want to base this on existing schemes, which are already pretty close to each other. My current expectation is that language-specific rules may force us to introduce special symbols into the scheme. My hope is that this won't become a can of worms, but if it does, so be it (and we will drop this idea).

Arun

ramakriya · Post by **ramakriya** » 23 Nov 2006, 03:17

Are you proposing a new engine or want to do a wrapper around an existing application like baraha?

-Ramakriya

arunk · Post by **arunk** » 23 Nov 2006, 03:17

To start, we can list some language specific rules which are not readily apparent from the transliterated text.

For tamil, a couple of ones i can readily think of.

1. There are two possible tamizh letters for "na"'s. Now one of them cannot appear at the beginning of the word. But the other one can appear anywhere. For example - the transliterated text mAnam, nAmam and tirunAmam. The first one takes one type of na, and the last 2 take the other kind. From the transl. text this is not obvious, nor is it "easily deducible" from any language specific info (correct?). How to indicate this in the transliterated text in a way that translators of other language "can ignore"?

2. Requires mei-ezhuttu in places. For example, pArthasArati should be written as pA r t ta sA ra ti. The mei-ezhuttu t before ta is required; otherwise the ta that follows would take the soft da sound, which is wrong in this case. Obviously this is not needed in languages such as sanskrit.

It may be possible for a program to be smart enough to figure #2 out - for example from the transl. scheme we know (a) thA is the hard sound (b) it is the middle of a word. Per written tamizh rules, a hard sound for "ta" in the middle requires a mei-ezhuttu before it. But I am not sure if this can be deduced reliably in all cases. But maybe if we have all the use-cases, perhaps we can find out if we can indeed deduce that reliably.

Arun

arunk · Post by **arunk** » 23 Nov 2006, 03:18

ramakriya wrote:Are you proposing a new engine or want to do a wrapper around an existing application like baraha?

-Ramakriya

I don't know. But if we can indeed come up with a scheme, any application can use it, I think: existing ones or new ones.

arunk · Post by **arunk** » 23 Nov 2006, 03:37

i was testing baraha. Nice but again its scheme is specific to languages. The problem would be that schemes would diverge.

What I am looking for is a transl. scheme that is (mostly) phonetically correct. So in tamizh, I should be able to enter tangamalai (or thangamalai) and it should render ta ng ka ma lai. Baraha renders it as ta n ka ma lai. Of course that is not its fault, as I should have entered it as ta~gkamalai for it to render correctly - but then the transl. text is no longer phonetically meaningful.

Similarly, I should be able to get away with tanjAvUr, but in baraha it has to be t~jcAvUr

Arun

arunk · Post by **arunk** » 25 Nov 2006, 01:28

ramakriya, others

i am in the process of putting up a test page which "supports" a (mostly) phonetic transl. scheme. You type in text in english per that scheme, and it will render it in tamizh and kannada (2 languages at this point). I should be able to make this available in a day or so.

It is of course work in progress and pl. be forewarned it will have bugs. Since the only script (besides english) that I know is tamizh, the bugs would probably be worse for other languages. But the goal is to see if we can come up with a way to render compositions in all cm languages easily from a single source. I am hoping this "test bed" will allow us to find out how close we can get to this goal.

I need some information regarding kannada:
1. Is there a comprehensive list of combinations which use the anuswara? I added the ones you gave to the "intelligence" behind the page and it works (so you can enter candra - and it will come out right in tamizh and kannada), but i want to know what the full list is
2. What are the other special cases? The visarga - will it figure in our scenario (i.e. cm context)? Same for candrabindu, anudatta, udatta, swarita
3. What is the transl. text in baraha for gnya (or nya) as in gnyAna?

I want to add support for telugu next (and then sanskrit). Are the rules for anuswara in telugu same as in kannada? How about chandrabindu? It looks like it may figure more in telugu? Does anyone know in which contexts this would figure?

Thanks
Arun

ramakriya · Post by **ramakriya** » 25 Nov 2006, 01:58

arunk wrote:ramakriya, others

i am in the process of putting up a test page which "supports" a (mostly) phonetic transl. scheme. You type in text in english per that scheme, and it will render it in tamizh and kannada (2 languages at this point). I should be able to make this available in a day or so.

That is great news indeed!

arunk wrote:I need some information regarding kannada:
1. Is there a comprehensive list of combinations which use the anuswara? I added the ones you gave to the "intelligence" behind the page and it works (so you can enter candra - and it will come out right in tamizh and kannada), but i want to know what the full list is

Simply put, all anunAsika sounds occurring in the middle of a word are represented by anuswAra. In normal English transliteration, you would see it represented by 'n' if the letter following the anunAsika belongs to the k, c, T or t vargas (as in ganga, candra, pance, banTa, ganDa, henDati etc.), or by 'm' if the letter following the anunAsika sound belongs to the p varga (gumpu, impu, kamba, kadamba etc.).

You can think of any word in tamizh that uses the half-vargAntya letters (there should be a better term for this! I do not know it) in the middle of a word (not at the beginning): that would be an anuswAra, represented by a '0'.

arunk wrote:2. What are the other special cases? The visarga - will it figure in our scenario (i.e. cm context)? Same for candrabindu, anudatta, udatta, swarita

I do not quite follow the question about anudatta, udatta and swarita . The visarga will figure when the sAhitya is in samskrita or in words like duHkha (which are sama samskrita words used in kannaDa).

arunk wrote:3. What is the transl. text in baraha for gnya (or nya) as in gnyAna?
Arun

in baraha you would write it as j~JAna, to match how it is written in kannada
the cha varga representation is

c/ch C/Ch j J ~J

-Ramakriya

drshrikaanth · Post by **drshrikaanth** » 25 Nov 2006, 02:13

arunk wrote:I need some information regarding kannada:
1. Is there a comprehensive list of combinations which use the anuswara? I added the ones you gave to the "intelligence" behind the page and it works (so you can enter candra - and it will come out right in tamizh and kannada), but i want to know what the full list is

In all sajAtIya combinations/conjuncts i.e when the last consonant( nasal) of a pentet is combined with any of the other four consonants that follow it, the anuswAra is Always used. No exceptions.

However, when these nasal consonants are combined with any consonant from elsewhere (not within the pentet), the anuswAra is Never used, e.g. vAngmaya, j~nAna etc. Just for your info: a few words have traditionally been simplified in spelling to sajAtIya conjuncts when they involve "N" (the nasal consonant of the third pentet): kaNgaLu to kangaLu, heNgaLeyaru to hengaLeyaru. This will however not cause any confusion, as it is clear from the song what is used.

2. What are the other special cases? The visarga - will it figure in our scenario (i.e. cm context)? Same for candrabindu, anudatta, udatta, swarita

visarga is important- you will have words like duHkha and antaHkaraNa occurring often in songs in kannaDa. This has to be represented uniquely. Capital "H" is what is used and can be retained as such.

candrabindu does not occur at all in kannaDa. In telugu, it does occur occasionally, although it has fallen almost completely into disuse in the present day.

I want to add support for telugu next (and then sanskrit). Are the rules for anuswara in telugu same as in kannada?

Yes.
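For what it's worth, the sajAtIya rule above can be sketched in a few lines of Python. This is only a rough illustration, not the logic behind the test page: the function name `apply_anusvara` and the varga table are assumptions made for the example, and it expects Kannada Unicode text in which every nasal has already been written out as nasal + virama.

```python
# Hypothetical sketch: turn "nasal + virama + stop of the same pentet"
# into "anusvara + stop" in Kannada Unicode text.
VIRAMA = "\u0CCD"    # Kannada sign virama (halant)
ANUSVARA = "\u0C82"  # Kannada sign anusvara (the bindu)

# Each string is one pentet; the fifth letter is its nasal.
VARGAS = [
    "\u0C95\u0C96\u0C97\u0C98\u0C99",  # ka kha ga gha nga
    "\u0C9A\u0C9B\u0C9C\u0C9D\u0C9E",  # ca cha ja jha nya
    "\u0C9F\u0CA0\u0CA1\u0CA2\u0CA3",  # Ta Tha Da Dha Na
    "\u0CA4\u0CA5\u0CA6\u0CA7\u0CA8",  # ta tha da dha na
    "\u0CAA\u0CAB\u0CAC\u0CAD\u0CAE",  # pa pha ba bha ma
]


def apply_anusvara(text):
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 2] if i + 2 < len(text) else ""
        if nxt and text[i + 1] == VIRAMA:
            for varga in VARGAS:
                # Same-pentet conjunct: the nasal collapses to the bindu.
                if ch == varga[4] and nxt in varga[:4]:
                    out.append(ANUSVARA)
                    i += 2  # skip the nasal and its virama
                    break
            else:
                out.append(ch)
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With this, candra comes out with the bindu, while vAngmaya keeps its ಙ because ma lies outside the ka pentet.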

drshrikaanth · Post by **drshrikaanth** » 25 Nov 2006, 02:13

arunk wrote:I need some information regarding kannada:
1. Is there a comprehensive list of combinations which use the anuswara? I added the ones you gave to the "intelligence" behind the page and it works (so you can enter candra - and it will come out right in tamizh and kannada), but i want to know what the full list is

In all sajAtIya combinations/conjuncts i.e when the last consonant( nasal) of a pentet is combined with any of the other four consonants that follow it, the bindu is Always used. No exceptions.

However, when these nasal consonants are combined with any consonant from elsewhere (not within the pentet), the bindu is Never used, e.g. vAngmaya, j~nAna etc. Just for your info: a few words have traditionally been simplified in spelling to sajAtIya conjuncts when they involve "N" (the nasal consonant of the third pentet): kaNgaLu to kangaLu, heNgaLeyaru to hengaLeyaru. This will however not cause any confusion, as it is clear from the song what is used.

2. What are the other special cases? The visarga - will it figure in our scenario (i.e. cm context)? Same for candrabindu, anudatta, udatta, swarita

visarga is important- you will have words like duHkha and antaHkaraNa occurring often in songs in kannaDa. This has to be represented uniquely. Capital "H" is what is used and can be retained as such.

candrabindu does not occur at all in kannaDa. In telugu, it does occur occasionally, although it has fallen almost completely into disuse in the present day.

I want to add support for telugu next (and then sanskrit). Are the rules for anuswara in telugu same as in kannada?

Yes.

arunk · Post by **arunk** » 25 Nov 2006, 02:13

ok. thanks. W.r.t anuswarA, your page on bhajarE had them for all words that end with "m". That is correct right?

Regarding anudatta etc.: I ask only because baraha had them listed in the kannada transl. rules help page. For visarga, then, is it possible to come up with rule(s) like those for anuswara so that I can write the logic for it?

Can the ~j (which in some unicode maps are referred to as the "nya" consonant) occur in other contexts or is it special as in occurs after "ja" (and may be others)?

Also is "Shri" a separate symbol like in tamil or is it a combinatory thing like the above one?

Thanks!
Arun

drshrikaanth · Post by **drshrikaanth** » 25 Nov 2006, 02:17

The nasal consonants of the first two pentets never occur at the end of words in kannaDa. The other three can occur at the end of words in classical literature: "N" will be represented as such, as will "n", while "m" is represented as bindu. But in hosagannaDa (modern kannaDa) no consonant can occur at the end of words. So words ending in consonants are not very frequent in kannaDa kRtis/songs. You will get them in antaHpura gItes, gItagOpala etc.

arunk · Post by **arunk** » 25 Nov 2006, 02:18

thanks. drs.

Our posts crossed. So visarga occurs in "Hk" combinations (tamizh in this case can treat it like kk). Are there other combinations?

Arun

drshrikaanth · Post by **drshrikaanth** » 25 Nov 2006, 02:21

arunk wrote:thanks. drs.

Our posts crossed. So visarga occurs in "Hk" combinations (tamizh in this case can treat it like to kk). Are there other combinations?

Arun

Yes. manaHpUrva. It occurs with k and p. In combination with c or T, it gets converted to "Sh".

SrI is stylishly represented in writing and print, but not always. It does not have to be. So just stick to the standard.

arunk · Post by **arunk** » 25 Nov 2006, 02:23

drshrikaanth wrote:The nasal consonants of the first two pentets never occur at the end of words in kannaDa. The other three can occur at the end of words in classical literature: "N" will be represented as such, as will "n", while "m" is represented as bindu. But in hosagannaDa (modern kannaDa) no consonant can occur at the end of words. So words ending in consonants are not very frequent in kannaDa kRtis/songs. You will get them in antaHpura gItes, gItagOpala etc.

Ok. But when we are doing kannada notations for sanskrit krithis (which of course is part of the intended goal), do we need to put the anuswara at the end of such words or not?

Actually this raises an interesting and perhaps controversial issue.

If the original krithi is in a different language, should (or shouldn't?) pronunciation follow that language? So a "nandhi" in a kannada or telugu krithi should perhaps be rendered as "namdi" in tamizh (and with anuswara in kannada, telugu). If the same occurs in a GB tamizh composition, should it be rendered as nandi in kannada, telugu?

BTW, I need to step out now.

Arun

arunk · Post by **arunk** » 25 Nov 2006, 02:24

drshrikaanth wrote:SrI is stylishly represented in writing and print but not always. It does not have to be. SO just stick to the standard.

Pardon my ignorance, but what is the standard? I am not able to find the unicode character for it in tamizh (and kannada), which is why I asked. But perhaps I need to look harder.

Arun

ramakriya · Post by **ramakriya** » 25 Nov 2006, 02:29

arunk wrote:ok. thanks. W.r.t anuswarA, your page on bhajarE had them for all words that end with "m". That is correct right?

Arun

That kriti is in dvitIya vibhakti, hence all the words end in M, and it is correct.

-Ramakriya

ramakriya · Post by **ramakriya** » 25 Nov 2006, 02:31

arunk wrote:
drshrikaanth wrote:SrI is stylishly represented in writing and print but not always. It does not have to be. SO just stick to the standard.
Pardon my ignorance but what is the standard. I am not able to find the unicode character for it in tamizh (and kannada) which is why I asked. But perhaps I need to look harder

Arun

Yes, I noticed that in the tamizh version, the shrI is missing ;

In kannada, there is no separate character for shrI; it is to be treated exactly like any other samyuktAkshara.

-Ramakriya

drshrikaanth · Post by **drshrikaanth** » 25 Nov 2006, 02:31

arunk wrote:Ok. But when we are doing kannada notations for sanskrit krithis (which of course is part of the intended goal), do we need to put the anuswara at the end of such words or not?

kannaDa spelling is simplistic. It does NOT compromise on correct pronunciation. sanskrit, when written in kannaDa script, follows kannaDa spelling. This does not in any way affect pronunciation.

Actually this raises an interesting and perhaps controversial issue. If the original krithi is in a different language, should (or shouldn't?) pronunciation follow that language? So a "nandhi" in a kannada or telugu krithi should perhaps be rendered as "namdi" in tamizh (and with anuswara in kannada, telugu). If the same occurs in a GB tamizh composition, should it be rendered as nandi in kannada, telugu?

I have answered this question above. As discussed earlier, we are looking at representing pronunciation/sounds correctly, not the spelling in the original language.

drshrikaanth · Post by **drshrikaanth** » 25 Nov 2006, 02:34

ramakriya wrote:In kannada, there is no separate character for shrI; it is to be treated exactly like any other samyuktAkshara.

There is Ramakriya. The guNisu (i), dIrgha and the vatru suLi (ra ottu) are combined in writing and circled around "S" all in one stroke.

ramakriya · Post by **ramakriya** » 25 Nov 2006, 02:42

arunk wrote:Actually this raises an interesting and perhaps controversial issue. If the original krithi is in a different language, should (or shouldn't?) pronunciation follow that language? So a "nandhi" in a kannada or telugu krithi should perhaps be rendered as "namdi" in tamizh (and with anuswara in kannada, telugu). If the same occurs in a GB tamizh composition, should it be rendered as nandi in kannada, telugu?

Arun

arun,

I think you missed a point there - The anuswara in kannaDa (or telugu, AFAIK) is an overloaded operator, and takes the sound of different vargAnta letter based on what letter follows it.

So, nandi sounds as nandi, even though you see it written as naMdi;

-Ramakriya
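The "overloaded operator" idea can be sketched roughly like this in Python. It is just an illustration under stated assumptions: the function name `anusvara_to_roman` is made up, capital "M" stands for the anuswara in the input, and only the plain (unaspirated) first letters of each varga are checked. Anything it cannot decide is left as "M".

```python
def anusvara_to_roman(word):
    """Replace the anuswara marker 'M' by the nasal it sounds like."""
    out = []
    for i, ch in enumerate(word):
        if ch != "M":
            out.append(ch)
            continue
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if nxt == "" or nxt in "pb":
            # Word-final, or before the p varga: sounds as 'm'.
            out.append("m")
        elif nxt in "kgcjTDtd":
            # Before the k, c, T or t vargas: written 'n' in plain English.
            out.append("n")
        else:
            # Not covered by this sketch; leave the marker untouched.
            out.append("M")
    return "".join(out)
```

So naMdi reads back as nandi and kaMba as kamba, even though both are written with the same bindu.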

ramakriya · Post by **ramakriya** » 25 Nov 2006, 04:02

drshrikaanth wrote:
ramakriya wrote:In kannada, there is no separate character for shrI; it is to be treated exactly like any other samyuktAkshara.
There is Ramakriya. The guNisu (i), dIrgha and the vatru suLi (ra ottu) are combined in writing and circled around "S" all in one stroke.

That is one way ; But it is not a must to combine the guNisu, dIrgha and the ra ottu though; That's what I meant.

-Ramakriya

arunk · Post by **arunk** » 25 Nov 2006, 08:30

ok. I thought the anuswara changed the pronunciation. Since it doesn't, my questions are moot.

Arun

vgvindan · Post by **vgvindan** » 25 Nov 2006, 09:59

It is generally felt that Sanskrit cannot represent short 'e' and short 'o'; this is not correct. As per unicode protocol implemented in the Mangal Script, there are two characters

ऎ - this is (short) 'e' (note the curl)
ए - this is (Long) 'E'
ऐ - this is 'ai'
ऒ - this is (short) 'o' (note the curl)
ओ - this is (Long) 'O'
औ - this is 'au'

Similarly it is believed that only one 'consonant' 'k' etc can be represented in Tamil. Unicode caters for 4 slots for each such consonant. The present implementation reproduces only same character for all the four key sequences

क - க - key board character 'k'
ख - க - key board character Shift + 'k'
ग - க - key board character 'i'
घ - க - key board character Shift + 'i'

....similarly for other consonants

A version prepared in Devanagari can be transliterated with proper additional notation in Tamil to indicate these characters - like க2 க3 க4. Alternatively, if a proper case is taken up, the vacant slots could be filled by character modification to represent devanagari characters.

This needs deliberation at the highest level if there is to be any meaningful transliteration of devanagari texts into Tamil.
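As a rough illustration of the numbered notation suggested above (க2 க3 க4), here is a small Python sketch. The table and the function name `consonants_to_tamil` are assumptions made for the example. It covers only the bare consonants of the ka, Ta, ta and pa vargas, and does not handle vowel signs or the virama, where the position of the number would still have to be decided.

```python
# Hypothetical table: one Tamil letter per varga, and the four
# Devanagari consonants that collapse onto it.
ROWS = {
    "\u0B95": "\u0915\u0916\u0917\u0918",  # Tamil ka <- ka kha ga gha
    "\u0B9F": "\u091F\u0920\u0921\u0922",  # Tamil Ta <- Ta Tha Da Dha
    "\u0BA4": "\u0924\u0925\u0926\u0927",  # Tamil ta <- ta tha da dha
    "\u0BAA": "\u092A\u092B\u092C\u092D",  # Tamil pa <- pa pha ba bha
}

TABLE = {}
for tamil, deva_row in ROWS.items():
    for idx, deva in enumerate(deva_row, start=1):
        # The first consonant maps to the plain letter; the rest get 2, 3, 4.
        TABLE[deva] = tamil if idx == 1 else f"{tamil}{idx}"


def consonants_to_tamil(text):
    """Map Devanagari consonants to Tamil letters with numeric qualifiers."""
    return "".join(TABLE.get(ch, ch) for ch in text)
```

Characters outside the table are passed through unchanged, so the sketch can be tried on mixed input.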

drshrikaanth · Post by **drshrikaanth** » 25 Nov 2006, 15:15

ramakriya wrote:That is one way ; But it is not a must to combine the guNisu, dIrgha and the ra ottu though; That's what I meant.

-Ramakriya

I have indicated this clearly in my post.

arunk · Post by **arunk** » 28 Nov 2006, 01:19

Folks,

I have created a web-page that is a "test bed" for helping us come up with 1 transliterated text source from which we can render in all 5 CM languages. Please try it out at: http://arunk.freepgs.com/cmtranslit/cmt ... _test.html

Of course this is very preliminary and I am not sure if it even gets the "example" right! I know there are things missing in tamizh. For kannada, I am sure it misses more. Please let me know the shortcomings and I will fix them.

It is important to note that the transliteration scheme I have in mind can remain simple and user friendly only if it is interpreted by some programming logic which applies language-specific rules. Some of these are actually part of the system itself (e.g. in tamizh you generate unicode for "S" and "ri" and the font support in the system/OS automatically combines them into the "sri" - at least this is so on Windows XP). But the system doesn't handle all rules - I don't know why.

In its absence, I have built some logic behind the web-page to apply additional interpretation. The logic isn't very complicated from a programming standpoint, so any editor (e.g. baraha) can accomplish the same. But these additional rules need to be enumerated and clearly established.

Thanks
Arun

vasanthakokilam · Post by **vasanthakokilam** » 28 Nov 2006, 01:30

Arun: I tried it.. Wow, it works great for the few things I tried. Great job. Thanks. It was a pleasure to work this thing. The immediate feedback in the target scripts makes it very attractive to use.

By the way, how do I transliterate the 'ou' sound so it shows the right Tamil letter 'OU' ( the one that follows o and O as in mouse ).

arunk · Post by **arunk** » 28 Nov 2006, 01:34

it is "au". I think that is what most schemes used (but i can probably also accept "ou")

arunk · Post by **arunk** » 28 Nov 2006, 01:35

actually i wanted to accept "aa" for "A", "ee" for "I", etc. (to go with the "more phonetic" thing) also but i was in two minds. Thanks for the compliment!

ramakriya · Post by **ramakriya** » 28 Nov 2006, 01:40

Great job Arun!

Found some things missing;

how to show visarga ? I can't get it right.

SamyuktAksharas with ra are not treated correctly in kannaDa (eg: prasidda) -> This is coming with a 'Ru' kAra.

I will post other findings as and when I find them

-Ramakriya

arunk · Post by **arunk** » 28 Nov 2006, 01:42

ramakriya wrote:how to show visarga ? I can't get it right.

not yet implemented. Sorry - i should have noted that.

ramakriya wrote:SamyuktAksharas with ra are not treated correctly in kannaDa (eg: prasidda) -> This is coming with a 'Ru' kAra.

Can you pl. explain it in simpler terms (with examples)?

Arun

arunk · Post by **arunk** » 28 Nov 2006, 01:51

drs/ramakriya,

regarding visarga - does its presence indicate a specific pronunciation, i.e. would manapUrva and manaHpUrva be pronounced differently? I am asking this based on info on http://en.wikipedia.org/wiki/Visargaa - I am not sure if this is the right interpretation (it's a bit heavy for me in terms of usage of linguistic terms).

Thanks
Arun

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 01:56

Arunk- good work. Some points

1- pra, kra etc are not shown correctly, as already pointed out

2- The nasal consonants of the first 2 pentets ங், ஞ, how are they written? It is important to have this right for words like vAngmaya

3- Visarga not represented

4- The bindu is a little problematic. Both capital "M" and small "m" are recognised as bindu but the moment it is followed by any vowel or consonant except "t, th, d, dh" it becomes m. This is problematic for words such as sambhaca, samlApa etc where the bindu should occur.

5- The vowel "R" is not being recognised, as it also represents the vallinam "R" in tamizh, which is how it is being interpreted by default. So there does not appear to be a way to represent the vowel either by itself or in combination with consonants, as in words like kRpe etc.

6- In tamizh, the combination of "nn" is automatically being recognised as Rannagaram(last consonant). This is not always the case as in words like nanneRi, sennAppOdAr etc. Use N2 to represent one of them (say ந்) to avoid confusion. This is how many conventions differentiate between the two.

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 01:58

arunk wrote:drs/ramakriya,

regarding visarga - does the presence does indicate a specific pronounciation as in manapUrva vs manaHpUrva would be pronounced differently? I am asking this based on info on http://en.wikipedia.org/wiki/Visargaa - i am not sure if this is the interpretation(its a bit heavy for me in terms of usage of linguistic terms).

Thanks
Arun

manaHpUrva is pronounced much like manahpUrva, never as manappUrva as in tamizh. My observation is that the visarga is deeper from the throat than "h".

vasanthakokilam · Post by **vasanthakokilam** » 28 Nov 2006, 01:59

arunk wrote:it is "au". I think that is what most schemes used (but i can probably also accept "ou")

Thanks Arun.

One immediate benefit I found is, I pasted the tamil text from your tool to google and it found the articles with that tamil word ( and displayed it in tamil script, though not unexpected, it is a nice bonus ). I think this will bring out many pages that are not otherwise easy to search for.

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 02:05

I figured out how to write the nasal consonant of the first pentet. It is gn. That again throws up a problem: bhagna is not being shown correctly in kannada or in tamizh.

arunk · Post by **arunk** » 28 Nov 2006, 02:11

Thanks drs.

drshrikaanth wrote:1- pra, kra etc are not shown correctly as already pointed

I am sorry. Since i dont know these other languages well - i need a very basic explanation.

2- The nasal consonants of the first 2 pentets ங், ஞ, how are they written? It is important to have this right for words like vAngmaya

ங is "ng" (i.e. phonetic).

ஞ is "gn" (or "gny") - again I went with phonetic although it is weaker here.

I know for tamizh, you just enter vAngmaya. Whether it does correctly for kannada now, i dont know. If it doesnt, let me know how it should render and i will fix the kannada specific logic.

4- The bindu is a little problematic. Both capital "M" and small "m" are recognised as bindu but the moment it is followed by any vowel or consonant except "t, th, d, dh" it becomes m. This is problematic for words such as sambhaca, samlApa etc where the bindu should occur.

Pardon my gross ignorance but what is the bindu? The logic for M/m was for anuswara - obviously needs adjustment. I am in many cases being lenient on upper/lower case - perhaps i shouldnt.

5- The vowel "R" is not being recognised as it also represents the vallinam "R" in tamizh which is how it is being intrpreted as default. SO there does not appear to represent the vowel either by itself or in combination with consonants as in words like kRpe etc.

Yes. I put support for 'hr' (as in hrdaya) but didnt do all. By the way would it matter between krpa vs kRpa?

6- In tamizh, the combination of "nn" is automatically being recognised as Rannagaram(last consonant). This is not always the case as in words like nanneRi, sennAppOdAr etc. Use N2 to represent one of them (say ந்) to avoid confusion. This is how many conventions differentiate between the two.

Yes, tamizh "na" is still a problem. I don't know how to deal with it yet. This may be one case where a language-specific artifact is unavoidable.

I was thinking of some thing like (a weak proposal):
tiruT~nAmam, where ... is language specific, and first letter inside identifies language. Language specific interpreters will skip all of ... if the first letter doesnt match them. So this can translate to tiru~nAmam in tamizh and tirunAmam in kannada.

The trouble is the extra characters are pretty intrusive. I am open to other suggestions. If there are precise rules in tamizh for the two "na"s then we can incorporate that into the "behind the scenes" logic. But if there aren't, we are stuck with something like this.

Arun

arunk · Post by **arunk** » 28 Nov 2006, 02:12

drshrikaanth wrote:I figured out how write the nasal consonant of the first pentet. Its is gn. That again throws a problem. bhagna is not being shown correctly in kannada or in tamizh.

Yep. How about just gnya? Of course we could go with ~gna (or ~ga)

Arun

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 02:14

In tamizh, "jha" is represented as ஜ்ஹ "j" followed by "ha" while in all other cases, there is no difference in alpaprANa and mahAprANa (Which is as things should be). Is this how it is meant to be? I cannot see why.

In your example (bhajarE), Subha is wrong in the kannaDa text. It is Subha in my scheme, but in your scheme (at least on that page) it should be shubha.

arunk · Post by **arunk** » 28 Nov 2006, 02:18

drshrikaanth wrote:
arunk wrote:drs/ramakriya,

regarding visarga - does the presence does indicate a specific pronounciation as in manapUrva vs manaHpUrva would be pronounced differently? I am asking this based on info on http://en.wikipedia.org/wiki/Visargaa - i am not sure if this is the interpretation(its a bit heavy for me in terms of usage of linguistic terms).

Thanks
Arun
manaHpUrva is pronounced much like manahpUrva, never as manappUrva as in tamizh. My observation is that the visarga is deeper from the throat than "h".

Ok. Then we will allow for this to be specified explicitly with "H" (i.e. as opposed to trying to figure it out based on context). The tamizh one for now in this case would make it as manappUrva but I am going to (eventually) add support for qualifiers in the form of super-script numbers to generated tamizh (such as k1 , k2) for sounds in other languages which the script cannot add. When that comes about, it will add some qualifier for the second "p".

The qualifiers would be done if the language of the "original" text is not tamizh (so when transliterating you give the text and tell it what original language it is).

Does this make sense?

Arun

arunk · Post by **arunk** » 28 Nov 2006, 02:21

drshrikaanth wrote:In tamizh, "jha" is represented as ஜ்ஹ "j" followed by "ha" while in all other cases, there is no difference in alpaprANa and mahAprANa (Which is as things should be). Is this how it is meant to be? I cannot see why.

Just an oversight. I am sure there are others like this.

In your example(bhajarE), Subha is wrong in the kannaDa text. It is Subha in my scheme but in your scheme(at leats on that page, it should be shubha).

I had a doubt on this. There are 3 sa's in kannada, sanskrit. I know the softest one is usually "s" and the one on the other end (like shAntam) is "sh" (or Sh if we want phonetic). I was using "S" for the middle.

Is the word here Shuba (sha as in shAntam) or Subha (the "S" being the middle one).

Arun

arunk · Post by **arunk** » 28 Nov 2006, 02:28

Which of the following?

for visarga, should i treat 'H' as a visarga
1. in the middle of a word and ONLY if it is followed by 'k' and 'p'
2. in the middle of a word and ONLY if followed by any consonant
3. ALWAYS (no matter wherever it is)

I am leaning towards #2 unless #1 would suffice. I prefer #3 least because a mistaken "maHAlakshmi" would come out wrong in all languages.

Arun

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 02:34

arunk wrote:
drshrikaanth wrote:1- pra, kra etc are not shown correctly as already pointed
I am sorry. Since i dont know these other languages well - i need a very basic explanation.

There is no better explanation. It shows up completely screwed up like this: ಕೃಅ (kra), whereas it should be like this: ಕ್ರ

ங is "ng" (i.e. phonetic).

ஞ is "gn" (or "gny") - again I went with phonetic although it is weaker here.

This seriously needs rectification for words like bhagna, as I have already pointed out.

I know for tamizh, you just enter vAngmaya. Whether it does correctly for kannada now, i dont know. If it doesnt, let me know how it should render and i will fix the kannada specific logic.

It does not show up right. It should be ವಾಙ್ಮಯ but actually shows up as vAMgmaya ವಾಂಗ್ಮಯ

Pardon my gross ignorance but what is the bindu? The logic for M/m was for anuswara - obviously needs adjustment. I am in many cases being lenient on upper/lower case - perhaps i shouldnt.

bindu is a circle which is the technical name for anuswAra in writing.

Yes. I put support for 'hr' (as in hrdaya) but didnt do all. By the way would it matter between krpa vs kRpa?

krpa and kRpa differentiation is important, as classical kannada does have the vallinam R. That apart, krpa is also screwed up. Also, words like arpaNe/arpisu will be screwed up if you use "r" to represent the vowel. Perhaps you could go for "R.", as in kR.pa (R followed by a dot), for the vowel. You will have to find a way for tamizh to ignore or transliterate it correctly.

Yes tamizh "na" is still a problem. I dont know how to deal with it yet. This may be one case where language specific artifiact is unavoidable.

I was thinking of some thing like (a weak proposal):
tiruT~nAmam, where ... is language specific, and first letter inside identifies language. Language specific interpreters will skip all of ... if the first letter doesnt match them. So this can translate to tiru~nAmam in tamizh and tirunAmam in kannada.

The trouble is the extra characters are pretty intrusive . I am open to other suggestions. If there are precise rules in tamizh for the two "na"s then we can incorporate that into the "behind the scenes" logic. But if there arent we are stuck with something like this.

Arun, i think numerical symbols are less intrusive and easier to remember than ~ etc. I think n2 for one of the nagarams (nakAra) will conveniently solve the problem. You could write the script for kannaDa and other languages such that n and n2 are not differentiated but recognised as just one nakAra.
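A rough sketch of the n2 idea, assuming only tamizh needs to keep the two apart and every other language simply folds n2 back into n. The function name and the language strings are hypothetical, and which tamizh letter n2 stands for is left open here.

```python
# Assumption: only languages listed here distinguish n from n2.
TWO_NAKARA_LANGS = {"tamil"}

def normalize_nakara(text, lang):
    """Keep 'n2' for languages with two nakAras; collapse it to 'n' elsewhere."""
    if lang in TWO_NAKARA_LANGS:
        return text                    # the tamizh logic picks the letter later
    return text.replace("n2", "n")     # other scripts see a single nakAra
```

So tirun2Amam would reach the kannaDa logic as tirunAmam, while the tamizh logic still sees the n2.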

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 02:39

arunk wrote: I had a doubt on this. There are 3 sa's in kannada, sanskrit. I know the softest one is usually "S" and the one on the other end (like shAntam) is "sh" (or Sh if we want phonetic). I was using "S" for the middle.

Is the word here Shuba (sha as in shAntam) or Subha (the "S" being the middle one).

Arun

I use S, Sh and s for the 3 letters chronologically as they appear in the alphabet. "s" being the softest, using the lower case (sound as in the english word, survey, serve)

S and Sh are closely related, with the latter being a more stressed version of the former involving more effort. So S and Sh for these. For example SyAmA SAstri and dIkShitar will be the phonetic way of writing these.

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 02:43

arunk wrote:Which of the following?

for visarga, should i treat 'H' as a visarga
1. in the middle of a word and ONLY if it is followed by 'k' and 'p'
2. in the middle of a word and ONLY if followed by any consonant
3. ALWAYS (no matter wherever it is)

I am leaning towards #2 unless #1 would suffice. I prefer #3 least because a mistaken "maHAlakshmi" would come out wrong in all languages.

Arun

Go for 3. Yes, you often have the visarga at the end of words without being followed by any consonant. Take dikShitar's gauLa kRti SrImahAgaNapatiravatu as an example. The anupallavi ends in SivAtmajaH before taking the pallavi.
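To see what the three options do in practice, here is a small sketch. It assumes the vowels of the scheme are a, A, i, I, u, U, e, E, o, O and that any other letter counts as a consonant; the function names are made up, and antaHkaraNa is an extra test word not taken from the posts above.

```python
VOWELS = set("aAiIuUeEoO")   # assumed vowel symbols of the scheme

def is_consonant(ch):
    return ch.isalpha() and ch not in VOWELS

def visarga_positions(word, policy):
    """Return the indexes where 'H' is treated as a visarga under option 1, 2 or 3."""
    hits = []
    for i, ch in enumerate(word):
        if ch != "H":
            continue
        nxt = word[i + 1] if i + 1 < len(word) else ""
        in_middle = 0 < i < len(word) - 1
        if policy == 1:
            ok = in_middle and nxt in ("k", "p")       # only before k or p
        elif policy == 2:
            ok = in_middle and is_consonant(nxt)       # before any consonant
        else:
            ok = True                                  # option 3: always
        if ok:
            hits.append(i)
    return hits
```

Under option 2 the final H of SivAtmajaH is missed, which is the case made above for option 3. The price of option 3 is that a mistyped maHAlakshmi gets a visarga too, so the user has to be careful with the case of h.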

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 02:49

After having systematically tried various combinations (to the best of my knowledge), I am now trying random combinations. "strI" is not showing up correctly in kannaDa. On further testing, it appears that all conjuncts involving more than two consonants are a problem. The vowel is showing up separately. The only exception to this is those involving kSh as in lakShya. They are showing alright.
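For what it is worth, here is how a cluster of any length can be put together in Unicode kannaDa: the consonants are joined with the virama and the vowel sign is attached once, after the last consonant. This is only a sketch with a handful of letters; the table and function names are hypothetical and say nothing about how the actual program is organised.

```python
VIRAMA = "\u0ccd"  # KANNADA SIGN VIRAMA

# A few consonants and vowel signs only, enough for the examples.
CONSONANTS = {"k": "\u0c95", "t": "\u0ca4", "r": "\u0cb0", "s": "\u0cb8"}
VOWEL_SIGNS = {"a": "", "A": "\u0cbe", "i": "\u0cbf", "I": "\u0cc0"}

def render_cluster(consonants, vowel):
    """Join all consonants with the virama, then add a single vowel sign at the end."""
    base = VIRAMA.join(CONSONANTS[c] for c in consonants)
    return base + VOWEL_SIGNS[vowel]
```

`render_cluster(["s", "t", "r"], "I")` gives the sequence sa, virama, ta, virama, ra, vowel sign I, which fonts display as ಸ್ತ್ರೀ. If the vowel is emitted as an independent letter instead of a sign, it shows up separately, much like the symptom described above.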

arunk · Post by **arunk** » 28 Nov 2006, 02:53

drshrikaanth wrote: There is no better explanation. It shows up completely screwed up like this: ಕೃಅ (kra), whereas it should be like this: ಕ್ರ

I am not sure, but could this be a result of flawed logic for detecting the "ru" vowel (hrdaya etc.)? When can that vowel occur? (i.e. the one in krpa, hrdaya, mrdangam etc.)

drshrikaanth wrote:
ஞ is "gn" (or "gny") - again I went with phonetic although it is weaker here.
This seriously needs rectification for words like bhgana like I have already pointed out.

Agreed. I think i will make it gnya or is that not unambiguous always?

drshrikaanth wrote:
I know for tamizh, you just enter vAngmaya. Whether it does correctly for kannada now, i dont know. If it doesnt, let me know how it should render and i will fix the kannada specific logic.
It does not show up right. It should be ವಾಙ್ಮಯ but actually shows up as vAMgmaya ವಾಂಗ್ಮಯ

This could be trouble. So something like sangam would be saMgam but vAngmaya would not be? The rules for anuswara are not deterministic or am i mistaken?

drshrikaanth wrote: krpa and kRpa differentiation is important as classical kannada does have the vallinam R. That apart, krpa is also screwed up. Also words like arpaNe/arpisu will be screwed up if you use "r" to represent the vowel. Perhaps you could go for "R." as in kR.pa (R followed by a dot) for the vowel. You will have to find a way for tamizh to ignore or transliterate it correctly.

This is a possibility. But for now it is 'r' following a consonant that is taken as a vowel. But maybe my initial assumptions about it are wrong, if it is getting krpa wrong. I will check with baraha and see where my logic went off course.

Arun, i think numerical symbols are less intrusive and easier to remember than ~ etc. I think n2 for one of the nagarams (nakAra) will conveniently solve the problem. You could write the script for kannaDa and other languages such that n and n2 are not differentiated but recognised as just one nakAra.

Yes but we have to be sure that two languages dont have different consonants (so n2 means one thing in tamizh, but say something else in malayalam). But then if we can enumerate all cases, we can arrive at a numbering scheme that avoids collisions.

Arun
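Once the per-language extras are enumerated, the collision check itself is mechanical. A minimal sketch, assuming each language lists its extra symbols in a table; the function name, the table layout and the meanings are all placeholders, since no such list has been agreed on yet.

```python
def find_collisions(tables):
    """tables: {language: {symbol: meaning}}.
    Return the symbols that carry different meanings in different languages."""
    seen = {}
    for lang, table in tables.items():
        for symbol, meaning in table.items():
            seen.setdefault(symbol, {})[lang] = meaning
    return {s: langs for s, langs in seen.items()
            if len(set(langs.values())) > 1}
```

A symbol used by two languages for the same thing is fine and is not reported; only a symbol like n2 meaning one letter here and another there would be.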

arunk · Post by **arunk** » 28 Nov 2006, 03:13

drs,

what are the contexts which will use ಋ and also ೠ? I am sure my assumptions (they were half-assed to begin with) were wrong.

I am adding visarga support. I think it makes sense even in tamizh (for non-tamizh krithis) as the visarga character is ஃ. It does play the same role as in அஃது (right? man! even my tamizh is rusty. I am not sure I am the right person for this)

Arun

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 03:21

arunk wrote:I am not sure this could be a result of flawed logic for detecting the "ru" vowel (hrdaya etc.)? When can that vowel occur? (i.e the one in krpa, hrdaya, mrdangam etc.)

The vowel is part of the kAguNita series of each consonant. It occurs in combination with any and every consonant. kRpa, gRdhra, ghRNa, tRpti, dhRti and so on.
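In Unicode terms this means the vowel is written as a dependent sign on the consonant, exactly like the other members of the kAguNita, with a separate independent letter when no consonant precedes it. A small sketch with a few consonants only; the names are hypothetical.

```python
VOCALIC_R_SIGN = "\u0cc3"   # KANNADA VOWEL SIGN VOCALIC R (dependent form)
INDEPENDENT_R = "\u0c8b"    # KANNADA LETTER VOCALIC R (stand-alone form)

# Only a few consonants, enough for the examples.
CONSONANTS = {"k": "\u0c95", "g": "\u0c97", "t": "\u0ca4",
              "m": "\u0cae", "h": "\u0cb9"}

def with_vocalic_r(consonant=None):
    """Consonant + vowel sign, or the independent letter if there is no consonant."""
    if consonant is None:
        return INDEPENDENT_R
    return CONSONANTS[consonant] + VOCALIC_R_SIGN
```

`with_vocalic_r("k")` gives ಕೃ (the start of kRpa), which is a different sequence from ka + virama + ra (ಕ್ರ), so the two should never be confused once the input spelling tells them apart.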

drshrikaanth wrote:
ஞ is "gn" (or "gny") - again I went with phonetic although it is weaker here.
This seriously needs rectification for words like bhgana like I have already pointed out.
Agreed. I think i will make it gnya or is that not unambiguous always?

No. gnya wont do. For e.g agnystra. The consonant will have to have a unique representation.

This could be trouble. So something like sangam would be saMgam but vAngmaya would not be? The rules for anuswara are not deterministic or am i mistaken?

No, you are getting it wrong. The pronunciation is different. No ambiguity here. You just have to know the spellings, thats all. As long as you write the script right for your software, the headache of getting the spelling right lies with the user.

Yes but we have to be sure that two languages dont have different consonants (so n2 means one thing in tamizh, but say something else in malayalam). But then if we can enumerate all cases, we can arrive at a numbering scheme that avoids collisions.

Arun

I dont think there will be a problem between tamizh and malayALam. Both these languages have 2 nakAras and their pattern of occurrences also should be the same.
If someone knows otherwise, please quote examples.

ramakriya · Post by **ramakriya** » 28 Nov 2006, 03:22

arunk wrote:drs,

what are the contexts which will use ಋ and also ೠ? I am sure my assumptions (they were half-assed to begin with) were wrong.

I am adding visarga support. I think they make sense even in tamizh (for non-tamizh krithis) as the visarga character is ஃ. It does play the same role as in அஃது (right? man! even my tamizh is rusty. I am not sure I am the right person for this)

Arun

ಋ and also ೠ can theoretically occur anywhere, although ೠ is extremely rare and has actually been out of the official kannaDa alphabet for some years!

There are a few words which begin with ಋ. It can also occur in a cluster preceded and followed by almost any consonant.

BTW, I saw that krSNa transliterates correctly as ಕೃಷ್ಣ in KannaDa but incorrectly as க்ர்ஸ்ண in tamizh.

-Ramakriya

drshrikaanth · Post by **drshrikaanth** » 28 Nov 2006, 03:31

drshrikaanth wrote:I dont think there will be a problem between tamizh and malayALam. Both these languages have 2 nakAras and their pattern of occurrences also should be the same. If someone knows otherwise, please quote examples.

Hey. Hang on a minute! Now does malayalam actually have 2 nakAras at all? Its alphabet follows the sanskrit pattern which has no place for 2 nakAras. In any case, this would not affect your script for the programme.