Single transliteration scheme for all CM languages?

drshrikaanth · Post by **drshrikaanth** » 21 Dec 2006, 03:20

In both telugu and kannaDa, it is tirupati only, not tiruppati. Dont just assume that it is the same in other indic languages. In these two cases it certainly is Not.

jayaram · Post by **jayaram** » 21 Dec 2006, 03:30

Thanks for the correction. So it's just Sanskrit, Malayalam & Tamil so far. Hindi uses the Devanagari script and may be similar to Sanskrit on this. And possibly the related languages like Gujarati, Marathi etc.

Well, this is all very interesting, but we have steered away from the original question raised by Shri Govindan. The answer to that remains the same: only Tamil uses the anticipatory consonant in the split-words scenario.

ramakriya · Post by **ramakriya** » 21 Dec 2006, 03:35

AFAIK, tirupati would be written as tirupati in dEvanAgari too (covers both hindi and samskrta)

arunk · Post by **arunk** » 21 Dec 2006, 03:42

Is tirupati a combined word? If not, your example is not correct. We are looking for a case where an extra consonant stop is generated when combining two separate words.

IMHO, the use of the anticipatory consonant in tamizh is possibly overused/misused/abused :).

In the case of our example, எப்படிப் பாடினரோ, one could probably get away without the ப் when the words are written separately. But in practice the extra "p" plays more than an "idiosyncrasy of the script" role: people will emphasize it a tad more.

But may I dare suggest that it be used in our case (i.e. cm transliteration) only within a single word (which could be a combined word, as in say பட்டப்பகல்). This probably won't always work.

Arun

vgvindan · Post by **vgvindan** » 21 Dec 2006, 10:54

jayaram,

Apparently, Unicode hasn't captured all the Malayalam letters.

I have gone through the document. The implementation is through ZWJ (zero width joiner). You can get ര്‍ and ന്‍ with the combination j+d+(Shift+Ctrl+1) for 'r' and v+d+(Shift+Ctrl+1) for 'n'. Let me try to incorporate the same in my program.
ന്‍ ണ്‍ ല്‍ ര്‍ ള്‍ ക്‍ - this has not been implemented for 'k'
n N l r L k
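
A minimal sketch of the sequences described above, assuming the usual legacy encoding of a chillu as consonant + chandrakkala (U+0D4D) + ZWJ (U+200D). The dictionary and function names are hypothetical and are not taken from vgvindan's program:

```python
# Code points from the Unicode Malayalam block.
VIRAMA = "\u0d4d"  # chandrakkala
ZWJ = "\u200d"     # zero width joiner

# Base consonants for the six chillu forms, keyed by the roman labels above.
CHILLU_BASE = {
    "n": "\u0d28",  # NA
    "N": "\u0d23",  # NNA
    "l": "\u0d32",  # LA
    "r": "\u0d30",  # RA
    "L": "\u0d33",  # LLA
    "k": "\u0d15",  # KA
}

def zwj_chillu(label: str) -> str:
    """Return consonant + chandrakkala + ZWJ for the given roman label."""
    return CHILLU_BASE[label] + VIRAMA + ZWJ

if __name__ == "__main__":
    for label in "nNlrLk":
        seq = zwj_chillu(label)
        # Print the label, the rendered sequence and its code points.
        print(label, seq, " ".join(f"U+{ord(c):04X}" for c in seq))
```

Whether the 'k' sequence shows up as a chillu glyph depends on the font, which is the gap noted above.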

jayaram · Post by **jayaram** » 21 Dec 2006, 15:45

ന്‍ ണ്‍ ല്‍ ര്‍ ള്‍ ക്‍ - this has not been implemented for 'k'
n N l r L k

This 'k' is rarely used in Malayalam, so we should be ok on this one.

jayaram · Post by **jayaram** » 21 Dec 2006, 15:52

We are looking for a case where an extra consonant stop is generated when combining two separate words

This example may be a close one: jagat + janani = jagajjanani. The 't' in the first word is not really an anticipatory consonant like in Tamil, but this one is close to what we are looking for...

arunk · Post by **arunk** » 21 Dec 2006, 20:29

vgvindan,

the use of ZWJ is a "hack", as that character is "optional" for the rendering engine (per the Unicode standard) and is not supposed to convey any mandatory feature. I think they screwed up in the case of malayalam, and used this for cillAkshara as an "interim" hack. But it should still work for most fonts as of now.

But with most (?) fonts, you can generate the cillakshara without ZWJ by using the proposed cillakshara slots (0x0D7A-0x0D7E). Also, what I have observed is that certain fonts, even without ZWJ or the explicit cillakshara slot, will treat a pure consonant at the end of a word as a cillAkshara (e.g. regular_na + 0x0D4D => cillakshara glyph for na).
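
As a rough sketch, converting the ZWJ form to the explicit slots could look like the following. The pairing of each slot with a consonant is an assumption here (it follows the assignments the Unicode standard adopted for these code points), and the function name is hypothetical. Whether a given font has glyphs in these slots has to be checked separately.

```python
VIRAMA = "\u0d4d"  # chandrakkala
ZWJ = "\u200d"     # zero width joiner

# Legacy sequence (consonant + chandrakkala + ZWJ) -> explicit slot.
# Assumed pairing of slots 0x0D7A-0x0D7E with their base consonants.
EXPLICIT_CHILLU = {
    "\u0d23" + VIRAMA + ZWJ: "\u0d7a",  # NNA -> chillu NN
    "\u0d28" + VIRAMA + ZWJ: "\u0d7b",  # NA  -> chillu N
    "\u0d30" + VIRAMA + ZWJ: "\u0d7c",  # RA  -> chillu RR
    "\u0d32" + VIRAMA + ZWJ: "\u0d7d",  # LA  -> chillu L
    "\u0d33" + VIRAMA + ZWJ: "\u0d7e",  # LLA -> chillu LL
}

def to_explicit_chillu(text: str) -> str:
    """Replace every ZWJ-based chillu sequence with its single code point."""
    for sequence, slot in EXPLICIT_CHILLU.items():
        text = text.replace(sequence, slot)
    return text
```

A plain consonant + chandrakkala without ZWJ is left untouched, so the font-dependent behaviour mentioned above is not affected.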

Arun

vgvindan · Post by **vgvindan** » 21 Dec 2006, 20:52

പല്‍ലവി -
എപ്പഡി പാഡിനരോ അഡിയാര്‍
അപ്പഡി പാഡ നാന്‍ ആസൈ കൊണ്‍ഡേന്‍ ശിവനേ (എപ്പഡി)
അനു-പല്‍ലവി -
അപ്പരും സുന്‍ദരരും ആരുഡൈ പിള്‍ളൈയും
അരുള്‍ മണിവാസഗരും പൊരുളുണര്‍ന്‍ദു ഉന്‍നൈയേ (എപ്പഡി)
ചരണം -
ഗുരുമണി ശങകരരും അരുമൈ തായുമാനാരും
അരുണഗിരി നാദരും അരുട്ചോദി വള്‍ളലും
കരുണൈ കഡല്‍ പെരുഗി കാദലിനാലുരുഗി
കനിത്തമിഴ് സൊല്‍ലിനാല്‍ ഇനിദുനൈ അനുദിനം (എപ്പഡി)

jayaram,
Is the implementation of zwj for k, N, n, r, l, L correct, or is there a problem elsewhere?

arun,
I checked Arial Unicode MS - the slots 0D7A - 0D7F are empty. Therefore, we cannot send these characters directly and can use only the zwj route.
In any case, please check the implementation above.

jayaram · Post by **jayaram** » 22 Dec 2006, 06:35

This is better, but now it has a different kind of problem...

Words that end in a chillaksaram are rectified now, so 'kondEn', 'adiyAR' etc. are correct. However, where the same sounds appear within a word, they are incorrect. For example, 'piLLaiyum', 'unnaiyE' etc. now have the chillaksaram inserted (before the 'La' and 'na' respectively in these examples).
പിള്‍ളൈയും - piLLaiyum
ഉന്‍നൈയേ - unnaiyE

Which is not the way it is represented in Malayalam. Perhaps you need to add the rule that the chillaksaram is used only at the end of the word, but when the sound appears within a word, leave it as before with chandrakkalai.
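
The rule suggested above could be sketched roughly as follows. This is only an illustration with hypothetical names, not vgvindan's implementation. It assumes the ZWJ encoding of the chillu, treats "end of word" as "not followed by another Malayalam character", and ignores the exceptions mentioned in the cited document:

```python
import re

VIRAMA = "\u0d4d"  # chandrakkala
ZWJ = "\u200d"     # zero width joiner
# Consonants with common chillu forms: NNA, NA, RA, LA, LLA
CHILLU_BASES = "\u0d23\u0d28\u0d30\u0d32\u0d33"

# A chillu-capable consonant + chandrakkala that is NOT followed by another
# character from the Malayalam block, i.e. it sits at the end of a word.
WORD_FINAL = re.compile(
    "([" + CHILLU_BASES + "])" + VIRAMA + "(?![\u0d00-\u0d7f])"
)

def chillu_only_at_word_end(text: str) -> str:
    """Keep chandrakkala inside words; use the ZWJ chillu only word-finally."""
    # First drop every ZWJ that follows a chandrakkala ...
    plain = text.replace(VIRAMA + ZWJ, VIRAMA)
    # ... then add it back only where the consonant ends a word.
    return WORD_FINAL.sub(lambda m: m.group(1) + VIRAMA + ZWJ, plain)
```

With this, 'piLLaiyum' keeps the chandrakkala between the two 'L's, while 'nAn' and 'aruL' still end in a chillaksaram.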

(However, that's not the end of the story, as explained in the document I cited in article #291 in this thread. But we will leave these as exceptions and ignore them for now.)

arunk · Post by **arunk** » 22 Dec 2006, 21:08

jayaram,

i forget, but is it that the cillakshara MAY occur in the middle but ALWAYS occurs at the end? If so, i think your solution should handle the most common cases, and even for the exceptions one could force it if necessary (not in vgvindan's system maybe, but in mine, where the input is from an engl. transl. scheme).

Is there a case where the pure-consonant is at the end of a word but does not appear as a cillakshara? If so, that would add more cases to the exceptions which are perhaps not as easily treatable.

Arun

vgvindan · Post by **vgvindan** » 23 Dec 2006, 11:41

jayaram,
പല്ലവി -
എപ്പഡി പാഡിനരോ അഡിയാര്‍
അപ്പഡി പാഡ നാന്‍ ആസൈ കൊണ്ഡേന്‍ ശിവനേ (എപ്പഡി)
അനു-പല്ലവി -
അപ്പരും സുന്ദരരും ആരുഡൈ പിള്ളൈയും
അരുള്‍ മണിവാസഗരും പൊരുളുണര്ന്ദു ഉന്നൈയേ (എപ്പഡി)
ചരണം -
ഗുരുമണി ശങ്കരരും അരുമൈ തായുമാനാരും
അരുണഗിരി നാദരും അരുട്ചോദി വള്ളലും
കരുണൈ കഡല്‍ പെരുഗി കാദലിനാലുരുഗി
കനിത്തമിഴ് സൊല്ലിനാല്‍ ഇനിദുനൈ അനുദിനം (എപ്പഡി)

The chillakshara are now appearing only at the end of the word.

jayaram · Post by **jayaram** » 23 Dec 2006, 23:59

vgvindan - this time it's mostly on target! That is, for this bit of transliterated text. There is just one word which looks a bit strange: അരുട്ചോദി - this reads as 'arutchodi'. I thought it was 'aruljoti' in Tamil? If yes, then this word has not been mapped correctly.

jayaram · Post by **jayaram** » 24 Dec 2006, 00:10

arunk wrote:jayaram,

i forget, but is it that the cillakshara MAY occur in the middle but ALWAYS occurs at the end? If so, i think your solution should handle the most common cases, and even for the exceptions one could force it if necessary (not in vgvindan's system maybe, but in mine, where the input is from an engl. transl. scheme).

Is there a case where the pure-consonant is at the end of a word but does not appear as a cillakshara? If so, that would add more cases to the exceptions which are perhaps not as easily treatable.

Well, the chandrakalai can occur at the end of a word, e.g. in 'vaNdu' (for bee), where the last sound is somewhere between 'u' and 'a' - the chandrakalai would be used here. But I guess this is not what you were referring to by 'pure consonant' above? In the majority of cases, one can say that the chillaksaram occurs at the end.

And the chillaksara can occur within a word also, e.g. if 'nanri' (thanks) is mapped 'as-is' from Tamil to Malayalam, it may have to be written with a chillaksaram for the 'n' sound in the middle. (This is similar to the 'Henry' example given in the cited article I referred to earlier.) But I suppose such instances would be rare for CM songs?

Hope that's clear.

vgvindan · Post by **vgvindan** » 24 Dec 2006, 00:41

jayaram,
அருட்சோதி (अरुट्चोदि) is correct - not அருள்சோதி or அருள் ஜோதி. Though the constituent words are 'arul' and 'jyoti', the Tamilised form is as given. Sri Ramalinga Adigal, a great Saiva Saint is called 'அருட்சோதி வள்ளல்' or popularly 'வள்ளலார்'

arunk · Post by **arunk** » 27 Dec 2006, 00:59

hi,

i am sort of back on this and testing/fixing the sanskrit support. As expected the currently posted version is riddled with bugs.

But I have a question:

Take the word Sankara as in a sanskrit kriti, or perhaps more appropriately NOT in a tamizh kriti. I see that the "nka" is really "ng" (or #n in our scheme) + ka. Is it okay if we require it to be specified as Sa#nkara, so that specifying it as Sankara would come out wrong? If we take it that the person transliterating a sanskrit krithi knows sanskrit, then specifying it as Sa#nkara should be ok.

Or should we take the "nk" combination to always imply "#nk", so that in the transl. input you can specify Sankara (and hence perhaps more readable)? Is a n (न्) followed by ka (क) also valid in sanskrit? Or is it always ङ् + क?

Note that for kannada and telugu, the nka AND the #nka automatically makes the "n" to a bindu/anuswara. So both Sankara and Sa#nkara come out the same.

Also note that coming from a tamizh krithi, phonetically it is sangara, as tamizh does not have the "nka" (or #nka) combination. The tamizh logic currently morphs nka/#nka to nga. Again Sankara, Sa#nkara, sankara, sa#nkara, sangara all come out the same (although later, when we add qualifiers, it would be different).
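
To make the two options concrete, here is a small sketch of the input normalisation being discussed. The function name and the language labels are hypothetical, and reading a bare "nk" as "#nk" for non-tamizh targets is only one of the choices under debate here, not a settled rule:

```python
import re

def normalize_nk(text: str, language: str) -> str:
    """Normalise the nk cluster of a romanised input for one target language."""
    if language == "tamil":
        # The tamizh logic morphs both nka and #nka to nga.
        return re.sub(r"#?nk", "ng", text)
    # Assumption for other targets: a bare "nk" always means "#nk".
    # An "nk" already written with "#" is left untouched.
    return re.sub(r"(?<!#)nk", "#nk", text)

if __name__ == "__main__":
    for word in ("Sankara", "Sa#nkara"):
        print(word, normalize_nk(word, "sanskrit"), normalize_nk(word, "tamil"))
```

With this choice Sankara and Sa#nkara both reach the sanskrit renderer as Sa#nkara, and both reach the tamizh renderer as Sangara.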

If one were to go by phonetics, for a tamizh krithi one should specify it as sangara. It would of course look "very odd" in other languages, particularly for this very well-known word. You can also specify it as sankara and get the same output in tamizh, and also better output in kannada and telugu. But then the pronunciation implied by the kannada/telugu script would be different from that of the tamizh script (albeit arguably more correct for the word in question).

If we want the transl. scheme to represent the phonetic sounds of the actual language of the krithi as much as possible, then it must be "sangara" in a tamizh krithi. And i presume still Sa#nkara in sanskrit? Here, I am presuming there must be subtle differences in the pronunciation of Sa#nkara vs. Sankara in sanskrit?

Thoughts?

Arun

jayaram · Post by **jayaram** » 27 Dec 2006, 06:01

'Sankara' in Sanskrit is normally written using ङ् + क - it's pronounced with a 'ka' sound, not 'ga'.
Malayalam and Hindi use similar representation. I suspect Tamil is also same, but lazy usage must have led to the 'ga' sound.

arunk · Post by **arunk** » 27 Dec 2006, 07:26

jayaram wrote:'Sankara' in Sanskrit is normally written using ङ् + क - it's pronounced with a 'ka' sound, not 'ga'.

Yes. I wasnt implying otherwise.

jayaram wrote:I suspect Tamil is also same, but lazy usage must have led to the 'ga' sound.

Nope. How it got to be so is a different issue, but tamizh does not use the nka combination. Examples: ilangai, pangajam, sangu etc. This is actually not unlike mugam for mukha. The hard "ka" sound occurs in the middle of a word only in certain contexts (it will be preceded by a mei then). For some reason, tamizh refused to accommodate "new combinations" found mainly in imported words, forcing them to morph to the closest existing combination. It is perhaps a bit unfortunate, as these words aren't that rare, at least in later times.

My point again is that common words have different pronunciations in different languages, the disparity being widest between tamizh and the other languages, and the pronunciation of the source language of a krithi should be preserved as much as possible. Looking at it linguistically and arguing that it shouldn't have been that way in a language is pointless. The language has evolved in a certain way and it is the way it is.

The cases of Sankara, pankajam etc. are perhaps harder to reconcile, but singing "pAl vaDiyum mukham" would be considered (ironically) bad pronunciation in tamizh. This would be true for many words which are sanskrit imports, and which have morphed significantly on inclusion into tamizh.

Arun

vgvindan · Post by **vgvindan** » 27 Dec 2006, 10:58

The purpose of transliteration is to reproduce the sound patterns as existing in the language of the Kriti so that the musicians would have an understanding how the kriti is to be sung - most approximately - notwithstanding the accent peculiar to the musician's language. Within the given constraints of the languages, one should attempt to do justice to the original kriti.
For example, the Tamil and Malayalam sound ழ does not exist in Telugu and Kannada, and the most approximate sound is L. The famous debate of ख ग घ between Tamil and other languages is well documented.
Further, as far as Tamil words are concerned, the tatsamam and tatbhavam words are a category by themselves. In particular, tatsamam words like 'guru', 'Sankara' etc., though they may be written in Tamil letters, do carry the sounds of Sanskrit - that is why they are called 'tatsamam'. Therefore, reproducing the sounds appropriate to such words should be the aim, rather than going by the sound patterns of Tamil words per se. There is one more angle to it - for example, the word 'pankajam' has been duly Tamilised as 'pangayam'. Therefore, while transliterating the Tamil word 'pangayam', one should not convert it to 'pankajam', because the sound pattern would be lost. Similarly, 'pankajam' from Telugu, Kannada and Malayalam should be transliterated to Tamil as 'pankajam' only. Where required, notations may have to be resorted to.
Therefore, IMHO, reproducing the sound patterns is the only consideration. To that extent, if notation becomes essential, we may have to adopt it.

jayaram · Post by **jayaram** » 28 Dec 2006, 03:39

Govindan - I am in agreement with the key point you make, i.e. we should capture the source language sounds to the extent possible when transliterating. What this leaves us with are the notational indicators to choose to make this possible. Do we have a canonical list of these already in place?

arunk · Post by **arunk** » 28 Dec 2006, 04:46

vgvindan wrote:Further, as far as Tamil words are concerned, the tatsamam and tatbhavam words are a category by themselves. In particular, tatsamam words like 'guru', 'Sankara' etc., though they may be written in Tamil letters, do carry the sounds of Sanskrit - that is why they are called 'tatsamam'.

They may have been supposed to carry the original sounds in tamizh, but over time they have morphed (or are still morphing) in practice, so that I am not sure one can say that the sanskrit sound is the correct sound even after import. This of course is due to the limitation of the tamizh script in carrying these sounds - a problem we don't find in other scripts like telugu, kannada and malayalam, which also have such imported words from sanskrit. If the script cannot carry the sound, then it is sort of a losing battle against morphing.

For example, take radam, sangu etc. This "alternate pronunciation" is pretty much the norm and perfectly acceptable, like mugam for mukha. Even sankara is pronounced by many as sangara (but then also as sankara by as many if not more) - this one is sort of in the middle of morphing. Now one could argue that this is just plain mispronunciation, sort of like some tamilians saying paLam for pazham - maybe so, but one could argue otherwise too. As I mentioned in some earlier post, the "#n" + "ka" combination will naturally lead many tamilians to the #nga sound. It is sort of unavoidable because of the limitation of the script, and the morph happens not unlike rata => radam, mukha => mugam etc.

Arun

arunk · Post by **arunk** » 28 Dec 2006, 04:47

jayaram wrote:Govindan - I am in agreement with the key point you make, i.e. we should capture the source language sounds to the extent possible when transliterating. What this leaves us with are the notational indicators to choose to make this possible. Do we have a canonical list of these already in place?

Hey! This is the drumbeat I have been drumming all along! It didnt come out that way?

Arun

vgvindan · Post by **vgvindan** » 28 Dec 2006, 12:11

arun,
Script transliteration has a limited scope. We have to follow the literary traditions of each language and not the accent variation. As far as Tamil is concerned, I am positive that there are generally acceptable sound patterns in literary works.
While it is possible to define all possible permutations and combinations of sound patterns based on rules of spelling, the variations generally prevalent - which may vary from place to place within any given language, cannot be catered for.
Therefore, instead of trying to achieve a 100% automated solution of Transliteration, I would recommend an intermediate human editing. This is what I tried to do when transliterating the Tamil Kriti 'eppadi pADinarO' in order to cater for 'tatsamam' words.
No matter how elaborate the scheme is, inter-language handling is a major hurdle. For example, I was wondering whether the Tamil word பதில் is a Tamil word at all because it starts with a 'ba' sound. Though this word is extensively used in literary works, the sound pattern indicates it to be a non-Tamil word. This word could be easily confused with the word பதிலம் which starts with a 'pa' sound.

jayaram,
I am not sure whether there is any canonical list of notations. I have been using what could be most plausible and readily recognisable notations in transliterating Thyagaraja Kritis into Tamil.
Once a system is created in a most recognisable way and a proper legend is given, people will understand them.

arunk · Post by **arunk** » 28 Dec 2006, 23:05

vgvindan wrote:For example, I was wondering whether the Tamil word பதில் is a Tamil word at all because it starts with a 'ba' sound. Though this word is extensively used in literary works

Funny - this same thought had occurred to me a while ago. I do not think it is a native tamil word but don't know the etymology.

We have to follow the literary traditions of each language and not the accent variation.

I agree. But it is just that it is not always as clear cut, and is quite dicey with tamizh, whose script cannot accommodate the sounds. I dare to propose that this is the main reason you have even unambiguously accepted morphs like sangu, radam etc.

As far as Tamil is concerned, I am positive that there are generally acceptable sound patterns in literary works.

Is there a grammatical source for these? If not, although i see your point, "generally acceptable" isn't the same as unambiguous and is subject to argument.

Oh well! We don't have to break our heads about this any more. My point was that while in some cases it is clear that the sounds have changed (like sangu, bAvam, badil), and in some the sounds have not (pankajam perhaps falls here), in other cases it is somewhere in between. But it really does not matter for us. If the transl. scheme is phonetically unambiguous, and we have the ability to add qualifiers, we are ok. You specify it as sankara and it denotes only one pronunciation. You denote the same as sangara, and it unambiguously indicates a different pronunciation. This is orthogonal to the issue of which pronunciation is right vs. wrong.

BTW, i think intermediate editing should mostly be avoidable if the input, whether it is an engl. transl. scheme like what I am thinking of or devanagari (like you are thinking), can represent the different phonemes unambiguously AND you have the ability to qualify the letters in the target script and thus eliminate ambiguity there. For example, for tamizh there is no need to qualify a "pa" at the beginning of a word, except for words like badil, bAvam, i.e. words in the input that begin with ba (besides that, of course, you need qualifiers for pha, bha etc. too). So eppaDi pAdinarO vs. the (wrong) eppaDi bAdinarO will appear differently when rendered in tamizh too, because the "pa" in the second case will have a qualifier to indicate it is ba. Similarly the word vAtApi would have a (different) qualifier for "pi" to indicate it is a "pi" sound even though it is not preceded by the mei. Now one could argue that it sort of sucks that even for tamizh words you would end up with qualifiers as in the first case, but the fact of the matter is that the tamizh script is sorely lagging behind the language it represents!! Short of extending the script, qualifiers are just the way to go, or we just leave it for people to figure it out from the context (which tamilians are used to).

I am enhancing my test page to indicate these qualifiers (in the form of super-scripted numbers). When it is ready, perhaps this will be clearer (and also i can test this theory of mine).

Arun

vgvindan · Post by **vgvindan** » 28 Dec 2006, 23:24

arun,
As I suspected, பதில் comes from an Arabic root. Please take a look -

பதல், (p. 723) [ ptl, ] s. (Arab.) Lieu, stead. See பதில், வதில்.
பதில் (p. 724) [ ptil ] --வதில், (Arab.) A particle- instead of, in place of, or in lieu of, equivalent to; used with the dative and declined only in that case. அவனுக்குப்பதிலாகஇவன்சாமங்காக்கிறான். He keeps watch in place of the other.

http://dsal.uchicago.edu/cgi-bin/romadi ... le=winslow

drshrikaanth · Post by **drshrikaanth** » 28 Dec 2006, 23:37

arunk wrote:
vgvindan wrote:For example, I was wondering whether the Tamil word பதில் is a Tamil word at all because it starts with a 'ba' sound. Though this word is extensively used in literary works
Funny - this same thought had occurred to me a while ago. I do not think it is a native tamil word but don't know the etymology.

badil, as I can see, is derived from badal (Persian?) meaning exchange/return/in lieu of etc. As you can see, the word is also used as such in Hindi. "badil" is used in the same way as mARu (mATRu). You can clearly see the common thinking behind these words. So badil is what one says in exchange for/in reply to someone else's words. This is exactly like saying "prati ADuvudu" in kannaDa. Also recall here tyAgarAja's famous song "mAru balkaka unnAvEmi" in SrIranjani.

drshrikaanth · Post by **drshrikaanth** » 28 Dec 2006, 23:38

Our posts crossed VGV

vgvindan · Post by **vgvindan** » 29 Dec 2006, 14:08

Similarly the word vAtApi, would have a (different) qualifier for "pi" to indicate it is a "pi" sound even though it is not preceded by the mei.

What I understand from this is - the version वातापि गणपतिं भजे when transliterated to Tamil would be 'வாதா1பி1 க3ணப1தி3ம் ப2ஜே' (with superscripts in place of the running figures). If this is so, then it seems we assume that the தா and பி in வாதாபி are supposed to sound as 'da' and 'bi' in Tamil, and therefore, in order to distinguish, we use notation to indicate the correct sound.
Please correct me if I am wrong.

arunk · Post by **arunk** » 29 Dec 2006, 21:10

vgvindan wrote:
Similarly the word vAtApi, would have a (different) qualifier for "pi" to indicate it is a "pi" sound even though it is not preceded by the mei.
What I understand from this is - the version वातापि गणपतिं भजे when transliterated to Tamil would be 'வாதா1பி1 க3ணப1தி3ம் ப2ஜே' (with superscripts in place of the running figures). If this is so, then it seems we assume that the தா and பி in வாதாபி are supposed to sound as 'da' and 'bi' in Tamil, and therefore, in order to distinguish, we use notation to indicate the correct sound.
Please correct me if I am wrong.

Yes. Except that I dont think we need the qualifier 3 for the க in க3 as the க alone implies the ga sound in this particular phrase. So it would be கணப1தி3ம்

But an alternate interpretation/scheme is to make க without qualifiers always signify "ka", and always use qualifiers when it is used for ga, kha and gha. I prefer the earlier one as I think it could avoid having too many qualifiers.

Arun

arunk · Post by **arunk** » 29 Dec 2006, 21:10

oh nuts! I will edit this again from my PC later (got to step out now). I forgot that the non-english characters dont work on my mac

Arun

jayaram · Post by **jayaram** » 29 Dec 2006, 21:16

Huh? Shouldn't 'tatsamam' rule dictate that गणपतिं when transliterated to Tamil uses the 'ta' sound, not 'da' sound - hence shouldn't that be க3ணப1தி1ம் - not க3ணப1தி3ம் as you have written above?

drshrikaanth · Post by **drshrikaanth** » 29 Dec 2006, 21:28

In case subscripts/superscripts are used, the letters written as such without any qualifiers should ALWAYS represent the same sound regardless of the position of the consonant. This should by default be the first consonant of each pentad, i.e. ka, ca, Ta, ta, pa. There is no need to qualify this sound. Therefore, the pronunciation of the letters should be traits, not states.

So there is no need to use a qualifying superscript/subscript of 1. It will be the letter itself and then the qualifiers 2, 3, 4 for the other 3 sounds.

vgvindan · Post by **vgvindan** » 29 Dec 2006, 22:13

IMHO, what drs has stated is very appropriate because the sounds k, c, T, t, p, which are natural to Tamil, need not be notated. I would like to add that subscript 1 might be needed for श. CDAC gives ஷ as the default Tamil sound for श, but IMHO ஸ is the more appropriate sound, and this could be notated as ஸ1, because श precedes स in the order. Similarly ऋषि कृषि might be rendered as ரு2ஷி and க்ரு2ஷி.
In the given example வாதாபி கணபதிம் பஜே - it would be fine if it is notated as வாதாபி க3ணபதிம் ப4ஜே.
One more point, I would like to make - when transliterating to Tamil from other languages, there is no need to go by the structure of Tamil words per se.
The reason is that - for example मर्कट following the rules of Tamil would be rendered as மர்க்கட with intervening க் added. Similar such issues will make it totally unwieldy to transliterate to Tamil.
Therefore, I would suggest that we keep the structure of original language. Accordingly मर्कट would be rendered as மர்கட only - here க taking the sound of k by default.

arunk · Post by **arunk** » 29 Dec 2006, 22:49

drshrikaanth wrote:In case subscripts/superscripts are used, the letters written as such without any qualifiers should ALWAYS represent the same sound regardless of the position of the consonant. This should by default be the first consonant of each pentad, i.e. ka, ca, Ta, ta, pa. There is no need to qualify this sound. Therefore, the pronunciation of the letters should be traits, not states.

So there is no need to use a qualifying superscript/subscript of 1. It will be the letter itself and then the qualifiers 2, 3, 4 for the other 3 sounds.

I do think books follow this and there is logic to it, but it can introduce unnecessary qualifiers in more instances for tamizh words, which I don't like. For example, you would have to qualify தங்கம் as தங்க3ம், and also nAgam, pAsam, tAbam etc. etc. etc.

Since the target readers of the tamizh rendition are tamizh readers, i thought it may be better to introduce qualifiers only when the letter implies a different sound from what the natural language rules are. Otherwise it seems to introduce a bit too much of artificial constructs. There is no need to make the tamizh script to be like the others when the readers of the script are used to interpreting it contextually.

One could also do away with qualifiers if the original language is tamizh and thus avoid these artificial constructs (but then you have trouble with Sankara etc. where people can interpret it differently)

Arun

drshrikaanth · Post by **drshrikaanth** » 29 Dec 2006, 23:06

Arun
I fully agree with you. I was only talking about transliterating other languages into tamizh. Not about writing tamizh kRtis in tamizh.

arunk · Post by **arunk** » 29 Dec 2006, 23:08

vgvindan wrote:IMHO, what drs has stated is very appropriate because the sounds k, c, T, t, p, which are natural to Tamil, need not be notated. I would like to add that subscript 1 might be needed for श. CDAC gives ஷ as the default Tamil sound for श, but IMHO ஸ is the more appropriate sound, and this could be notated as ஸ1, because श precedes स in the order. Similarly ऋषि कृषि might be rendered as ரு2ஷி and க்ரு2ஷி.

ஸ1 seems ok. Or you could have a qualifier to ச itself - that may make some words like Sakti come out better? ரு2 makes sense.

The reason is that - for example मर्कट following the rules of Tamil would be rendered as மர்க்கட with

Maybe not as the extra "ik" can affect the phonetics (albeit slightly). I think this is where the qualifier for "ka" sound can become helpful. It can be மர்க1ட to imply that க here isnt ga sound?

Arun

drshrikaanth · Post by **drshrikaanth** » 30 Dec 2006, 00:00

arunk wrote:Maybe not as the extra "ik" can affect the phonetics (albeit slightly). I think this is where the qualifier for "ka" sound can become helpful. It can be மர்க1ட to imply that க here isnt ga sound?

Will it be the ga sound? I don't think so. ka, when part of a conjunct consonant with other non-nasal consonants, retains the ka sound and not the ga sound. Take veTkam, saratkAlam etc. Likewise the rk combination will be pronounced with the ka sound and not the g sound.

Even otherwise, I think it is redundant to use the 1 qualifier, as that actually makes the script of the language redundant. Also, i don't think people will have any difficulty in getting the hang of the transliteration scheme and reading it right for other languages. If someone can recognise qualifiers, they sure can see what the unqualified letter will mean too.

arunk · Post by **arunk** » 30 Dec 2006, 00:09

drshrikaanth wrote:Will it be the ga sound? I don't think so. ka, when part of a conjunct consonant with other non-nasal consonants, retains the ka sound and not the ga sound. Take veTkam, saratkAlam etc. Likewise the rk combination will be pronounced with the ka sound and not the g sound.

Yes you are right. I stand corrected.

Even otherwise, I think it is redundant to use the 1 qualifier, as that actually makes the script of the language redundant.

Hmm. I was specifically thinking about words like inta, kanka (and of course Sankara) etc., where without the qualifier it is sort of ambiguous and there will be a strong tendency to read it as "nga". I think you would need it only for non-tamizh words. In fact, when I read cm books (of non-tamizh krithis) in tamizh using qualifiers, i find this the biggest source of confusion. IMO, using a qualifier here does not make the script redundant, since it is ambiguous without it in such contexts.

I am not sure when transliterating from tamizh to tamizh, you will come across places where a qualifier for "ka" sound would be needed.

Arun

drshrikaanth · Post by **drshrikaanth** » 30 Dec 2006, 00:12

arunk wrote:I am not sure when transliterating from tamizh to tamizh, you will come across places where a qualifier for "ka" sound would be needed.

Arun. One more time you accuse me of asking you to use qualifiers for writing tamizh in tamizh, I am going to SCREAM. I didn't say that, so drop it

arunk · Post by **arunk** » 30 Dec 2006, 00:18

drshrikaanth wrote:
arunk wrote:Maybe not as the extra "ik" can affect the phonetics (albeit slightly). I think this is where the qualifier for "ka" sound can become helpful. It can be மர்க1ட to imply that க here isnt ga sound?
Will it be the ga sound? I don't think so. ka, when part of a conjunct consonant with other non-nasal consonants, retains the ka sound and not the ga sound. Take veTkam, saratkAlam etc. Likewise the rk combination will be pronounced with the ka sound and not the g sound.

On thinking a bit more, I wonder why we write pArtasArati as pArttasArati, and pArkalAm as pArkkalAm? The extra "ik" here also does change the phonetics slightly. But the absence of "ik" there seems to change the pronunciation even more. That is, if you write pArtasArati as is, wouldn't it be read by most as pArdasAradi, and pArkalAm as pArgalAm? For example, consider the word mArgam used in bharatanAtyam in tamizh (although the word may be of non-tamizh origin). It is written as மார்கம் (another example: sorgam for heaven). So here the absence of "ik" does change it to "ga". This example does indicate that மர்கட (without any qualifiers) would be read as margaDa, and we should write it either as மர்க்கட or as மர்க1ட. My preference is the latter, as the first one carries a teeny-weeny extra phonetic baggage.

Arun

arunk · Post by **arunk** » 30 Dec 2006, 00:19

drshrikaanth wrote:
arunk wrote:I am not sure when transliterating from tamizh to tamizh, you will come across places where a qualifier for "ka" sound would be needed.
Arun, one more time you accuse me of asking you to use qualifiers for writing tamizh in tamizh, and I am going to SCREAM. I didn't say that, so drop it.

I wasn't accusing you, but *I* want them qualifiers, for words like badil, bAvam, Sankara, pankajam (even in tamizh krithis).

Arun

arunk · Post by **arunk** » 30 Dec 2006, 00:35

drs,

I think perhaps க takes the "ka" sound when it follows "T" and "R" (but not "r"), besides of course "k" itself.

taRkAlam, poRkAlam but mArgam, sorgam and pArkkalAm.

I couldnt come up with other consonants which can cause the succeeding க to take ka sound. But i wouldnt be surprised if there are.

Arun
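A rough sketch of the rule as stated in the post above, only to make the examples checkable. It assumes the input is a romanization of the tamizh spelling in which க is always typed as "k" (so மார்கம் is typed "mArkam"), and it only looks at க; the function name and the set of "hard" predecessors are illustrative, taken from the three cases listed so far (T, R, k), and other consonants may well belong in that set.

```python
# Hypothetical helper: predicts how a written "k" would be read under the
# "natural rules" discussed above. Only the letter k is handled.
VOWELS = set("aAiIuUeEoO")
HARD_BEFORE = {"T", "R", "k"}  # assumption: the list from the post, possibly incomplete


def read_ka(word: str) -> str:
    """Return the word with each k+vowel rewritten as k or g as it would be read."""
    out = []
    for i, ch in enumerate(word):
        is_ka_syllable = ch == "k" and i + 1 < len(word) and word[i + 1] in VOWELS
        if not is_ka_syllable:
            out.append(ch)  # other letters, and a bare mei "k", pass through
            continue
        prev = word[i - 1] if i > 0 else None
        # Word-initial k, or k after T / R / k, keeps the ka sound.
        out.append("k" if prev is None or prev in HARD_BEFORE else "g")
    return "".join(out)


for spelled in ["taRkAlam", "veTkam", "pArkkalAm", "mArkam", "sorkam"]:
    print(spelled, "->", read_ka(spelled))
```

Under these assumptions the spelling "mArkam" comes out as mArgam while "pArkkalAm" keeps its k, which is the contrast the examples above are pointing at.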

arunk · Post by **arunk** » 30 Dec 2006, 04:26

arunk wrote:
vgvindan wrote:the version वातापि गणपतिं भजे when transliterated to Tamil would be - 'வாதா1பி1 க3ணப1தி3ம் ப2ஜே'
Yes. Except that I dont think we need the qualifier 3 for the க in க3 as the க alone implies the ga sound in this particular phrase. So it would be கணப1தி3ம்

I was just reading back through the posts and I saw this. I dont know what I was thinking. Per "natural rules" the கண would be kaNa since the க is at start of the word and so we do need qualifiers to indicate "ga" sound. So my post above is wrong.

Also as jayaram points out it should indeed be தி1, if தி1 => ti, தி2 => thi, தி3 => di and தி4 => dhi. So it should be க3ணப1தி1ம்.

But under the other proposed scheme, where pa, ta, ka etc. carry no qualifier (so, say, தி1 => thi, தி2 => di and தி3 => dhi), you would have க2ணபதிம். I have already indicated which one I prefer and why.

Arun
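To make the two numbering schemes in the post above easier to compare, here is a small sketch. The scheme names and the function names are made up for illustration; the word is given already split into tamizh syllables, each tagged with its position in the ka/kha/ga/gha-style series (1 to 4), or None where no qualifier applies.

```python
# Hypothetical sketch of the two qualifier schemes discussed above.
# "numbered":          every stop carries 1..4 (தி1 => ti ... தி4 => dhi)
# "unmarked_default":  the first sound carries nothing, the rest become 1..3


def qualify(syllable, index, scheme):
    """Attach the qualifier for series position `index` (1..4) to a syllable."""
    if index is None:
        return syllable
    if scheme == "numbered":
        return f"{syllable}{index}"
    if scheme == "unmarked_default":
        return syllable if index == 1 else f"{syllable}{index - 1}"
    raise ValueError(f"unknown scheme: {scheme}")


def render(syllables, scheme):
    """Join (syllable, index) pairs into one qualified tamizh string."""
    return "".join(qualify(s, i, scheme) for s, i in syllables)


# gaNapatim: ga is 3rd in its series, pa and ti are 1st in theirs.
GANAPATIM = [("க", 3), ("ண", None), ("ப", 1), ("தி", 1), ("ம்", None)]

print(render(GANAPATIM, "numbered"))          # க3ணப1தி1ம்
print(render(GANAPATIM, "unmarked_default"))  # க2ணபதிம்
```

The two printed forms are the same ones given in the post; the sketch only shows that they differ in where the numbering starts, not in the information carried.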

arunk · Post by **arunk** » 30 Dec 2006, 04:36

I also thought of using something other than numbers for superscripts. I am not crazy about this myself but I thought i will throw it in there anyway

Instead of தி1 => ti, தி2 => thi, தி3 => di and தி4 => dhi, you have

திt => ti, திh => thi, திd => di and திdh => dhi.

One advantage is the super-script itself is "more self-evident" (you dont have to remember which is 1? 2?). But then in case of "dh" you have to use two letters for super-script and that is pretty bad. Besides mixing english into tamizh seems more intrusive than numbers. Even if mixing english is ok, the two-letter super-script seems like a deal breaker.

Arun

jayaram · Post by **jayaram** » 30 Dec 2006, 05:59

Arun, may I suggest a slight improvement over your proposal...(this is as per DRS's observation to leave the basic consonant as is):

Use தி >> ti, திt >> thi, திd >> di & திh >> dhi

This avoids the 2-letter problem you face with your system. What do you think?
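For comparison, the three marker sets proposed so far can be laid side by side in a few lines. This is only an illustrative sketch: the dictionary keys and function names are invented here, and the markers are written as plain trailing characters the way they are typed in this thread, not as real superscripts.

```python
# Hypothetical table of the marker proposals, in the order ti, thi, di, dhi.
SOUNDS = ("ti", "thi", "di", "dhi")
SCHEMES = {
    "numbers": ("1", "2", "3", "4"),
    "arun_letters": ("t", "h", "d", "dh"),
    "jayaram_letters": ("", "t", "d", "h"),  # basic consonant left as is
}


def marked(syllable, scheme):
    """Map each sound to the syllable followed by that scheme's marker."""
    return {sound: syllable + m for sound, m in zip(SOUNDS, SCHEMES[scheme])}


def longest_marker(scheme):
    """Length of the longest marker, i.e. whether a two-letter marker is needed."""
    return max(len(m) for m in SCHEMES[scheme])


for name in SCHEMES:
    print(name, marked("தி", name), "longest marker:", longest_marker(name))
```

Only the first letter scheme needs a two-character marker; what the sketch cannot show is how self-evident each marker is to a reader, which is the other half of the trade-off.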

vgvindan · Post by **vgvindan** » 30 Dec 2006, 10:44

Introducing English letters for notation purposes may not be advisable - we should keep the audience in mind.
Regarding ச or ஸ for representing श - it is a matter of choice - though I would prefer ஸ.
Regarding மர்கட and மார்கம், it would be ideal to leave மர்கட as it is, with k as the default sound, and add a notation, மார்க3ம், because g is the sound that is not available.
Similarly, पंकजं, पंख, पंग or पङ्ग, पंघ or पङ्घ would best be transliterated as பங்கஜம் பங்க2 பங்க3 பங்க4 respectively, because Sanskrit and other languages permit all four consonants to be joined to ङ ञ ण न म, whereas Tamil permits only one sound each: g, j, D, d, b respectively.
In the case of ஜ it is a little confusing. For example, பஞ்சம், a Tamil word, automatically gives the ஜ sound. But in the transliteration scheme this also will need to be clarified. For example, मञ्जुळ as மஞ்சு3ள or மஞ்ஜுள, in order to avoid confusion with मञ्चु, a Telugu word meaning dew, as மஞ்சு with default sound c.
Similarly, regarding ज्ञानं अज्ञानं विज्ञानं प्रज्ञानं - ஞானம் அஞ்ஞானம் விஞ்ஞானம் ப்ரஞ்ஞானம் - the extra ஞ் needs to be inserted.
In the case of प्रज्ञानं, it is sometimes written as ப்ரக்ஞானம், which is not correct, though the sounds of प्रज्ञानं and ப்ரஞ்ஞானம் are a little different.
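The पंकजं / पंख / पंग / पंघ example above can be sketched as a tiny lookup, assuming the convention of this post: the first consonant of the series is left unmarked and the others keep the numbers 2, 3, 4. The table covers only the letters needed for these four words, and the function name is hypothetical; a real converter would need vowel signs and the other series.

```python
# Hypothetical, deliberately tiny Devanagari -> tamizh table for the
# velar examples above. Inherent "a" is shared by both scripts, so no
# vowel handling is needed for these particular words.
ANUSVARA = "\u0902"  # Devanagari sign anusvara
VIRAMA = "\u094d"    # Devanagari sign virama
CONSONANTS = {
    "प": "ப",
    "क": "க", "ख": "க2", "ग": "க3", "घ": "க4",
    "ज": "ஜ",
    "ङ": "ங",
}
VELARS = set("कखगघ")


def to_tamizh(word):
    """Transliterate one of the sample words, numbering the non-default stops."""
    out = []
    for i, ch in enumerate(word):
        if ch == ANUSVARA:
            nxt = word[i + 1] if i + 1 < len(word) else None
            # Before a velar the anusvara becomes ங், otherwise (here: word end) ம்.
            out.append("ங்" if nxt in VELARS else "ம்")
        elif ch == VIRAMA:
            out.append("்")  # tamizh pulli
        else:
            out.append(CONSONANTS[ch])
    return "".join(out)


for w in ["पंकजं", "पंख", "पंग", "पङ्ग", "पंघ"]:
    print(w, "->", to_tamizh(w))
```

Both spellings पंग and पङ्ग land on the same பங்க3, which matches the way the two are treated as equivalent in the post.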

arunk · Post by **arunk** » 30 Dec 2006, 19:38

I really do not like using no qualifier for the default sound, as that leads to a lot of confusion and makes the notation go against the grain of the natural rules of the language - your markata and margam example is pretty much the reverse of what the audience would be used to.

As you say - we should keep the audience in mind. This one doesnt.

Arun

arunk · Post by **arunk** » 30 Dec 2006, 19:40

jayaram wrote:This avoids the 2-letter problem you face with your system. What do you think?

The trouble is the qualifiers arent as self-evident anymore and so are only as good as 1,2,3,4. Besides i tend to agree with vgvindan that introducing english is perhaps a bit too much.

Arun

vgvindan · Post by **vgvindan** » 30 Dec 2006, 20:16

On second thought, it occurs to me that in regard to श, ச cannot be used because it would already have qualifiers for छ झ. That leaves only ஸ.
In regard to the default sound having no qualifier, I would only say that by introducing a qualifier we would be cluttering things up, because the default sound would be the most used letter.

arunk · Post by **arunk** » 30 Dec 2006, 20:22

Do people agree that pArttasArati (with the extra mei, i.e. t) carries a slightly different pronunciation from pArtasArati? I ask this now because originally I had proposed that the intervening mei (ith here) is something unique to tamizh (script) and hence need not be specified in the input, and that the transl. engine will put it in. This again is so that the word in the input as pArtasArati will come out correctly in all languages. But for markaTa I had argued that markkaTa would not be as ideal and that we should leave it as markaTa etc. etc. Contradictory. I guess it was easier to hypothesize based on tatsamam/tatbhavam words like pArtasArati, rather than new words. Now I am unsure, as it seems to have caused a dilemma.

So what should we do? I am thinking if the input is a non-tamizh krithi, we should NOT introduce "mei's" to retain hard consonant sounds (and thus either use qualifiers etc. depending on what we agree there). That seems easy.

But the question gets harder if the input is a tamizh krithi. If the intervening mei also has a slight phonetic role, then at least for tamizh words like paTTappagal, I think we want to retain the mei across to the other languages (?). Again, this is because the intervening mei carries some phonetic value, so paTTapagal is different from paTTappagal.

But what about for common words like pArtasArati (again in a tamizh kriti)? They carry extra mei and thus extra (but arguably extraneous) phonetic baggage, but carrying the extra mei across to other languages seems bogus too (?)

The dilemma here is that if markaTa is rendered in tamizh without the intervening mei, then pArtasArati would be rendered the same way (but with the ta qualified). That would be ok for a non-tamizh krithi, but for a tamizh krithi it would be wrong. To fix it, if the word in a tamizh krithi is input as pArttasArati to get the mei in the tamizh rendition, then that mei becomes a problem going to other languages, as explained above. Maybe that is something we live with?

Arun