Single Transliteration Scheme for all CM Languages - Part 2

vasya10 · Post by **vasya10** » 06 Feb 2007, 05:28

Arun,

One useful feature, and maybe you have already thought about it, would be exporting the transliterated data as a pdf.

arunk · Post by **arunk** » 06 Feb 2007, 06:37

vasya,

this is doable now itself. All you need to do is get a pdf print driver which allows you to save what you would normally send to a printer as a PDF file (e.g google for pdf995). With this then from the Printable View, you just choose Print options on your browser, and instead of sending to your printer, choose the pdf printer.

Arun

arunk · Post by **arunk** » 07 Feb 2007, 01:25

i have tested looking up a sanskrit word database to decide where to use anuswara, and it works. However, there is a significant problem: the input text (as in sAhitya) can have many words (each a potential match in the dictionary) combined into a single word in english. Note also that when words are combined, they morph as per the rules of the language.

So unless language rules are applied (which is very difficult), it is impossible to reliably figure out which words in the input do correspond to words in dictionary (i.e. those that require anuswara in sanskrit).

For example, if sangIta comes as such, I can match against saMgIta (with some smart logic). I can even match sangItam (add m if the word ends with a and try for a match), but what if the word is karnAtakasangItam in one word (or something else)? "sangIta" can occur anywhere in an input word. Now a solution could be to match it anywhere in an input word, but I see an entry for aMsa - and does that mean amsa anywhere should match? I am thinking not.
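To make the matching problem concrete, here is a minimal Python sketch. The two dictionary entries and the function names are made up for illustration, and keying the dictionary by the spelling without anuswara is only an assumption about what the "smart logic" might look like, not how the editor actually does it.

```python
# Hypothetical dictionary: plain spelling -> spelling with explicit anuswara (M).
DICTIONARY = {
    "sangIta": "saMgIta",
    "amsa": "aMsa",
}

def lookup_whole_word(word):
    """Match only when the whole input word is a dictionary entry,
    optionally with a trailing 'm' added to an entry that ends in 'a'."""
    if word in DICTIONARY:
        return DICTIONARY[word]
    if word.endswith("am") and word[:-1] in DICTIONARY:
        return DICTIONARY[word[:-1]] + "m"
    return None

def match_anywhere(word):
    """Replace every dictionary entry found anywhere inside the word.
    This catches compounds, but it also fires on accidental substrings."""
    for plain, with_anuswara in DICTIONARY.items():
        word = word.replace(plain, with_anuswara)
    return word

print(lookup_whole_word("sangItam"))           # whole-word match with added m
print(lookup_whole_word("karnAtakasangItam"))  # compound: no whole-word match
print(match_anywhere("karnAtakasangItam"))     # substring match finds it
```

The substring version finds sangIta inside the compound, but it would equally rewrite every word that merely contains the letters "amsa", which is exactly the unreliability described above.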

So while the dictionary would help, it may not help that much. Of course, i can introduce a feature where the user highlights some text and explicitly asks for a match in the database - but that means only a user who knows sanskrit well will be able to provide the correct input that will translate to all languages :(. I guess that is going to be our achilles heel.

We are so close to our solution, yet there seems to be an insurmountable barrier.

Any suggestions?

Thanks
Arun

arunk · Post by **arunk** » 07 Feb 2007, 02:43

arunk wrote:but that means only a user who knows sanskrit well will be able to provide the correct input that will translate to all languages :(.

May be this isnt a big deal. If the input represents a sanskrit krithi, then it is not an unfair expectation for the user to be aware of where anuswara figures?

But if the krithi is non-sanskrit, and the user entering the krithi doesnt know sanskrit rules - how would it be if certain words (in a language other than sanskrit) that happen to be sanskrit based get rendered in sanskrit with no anuswara?

For example, if a word like sangItamu (as entered) is in a telugu krithi, but as rendered in sanskrit it doesnt appear with anuswara - is that too bad?

Arun

vasya10 · Post by **vasya10** » 07 Feb 2007, 02:59

(May be im simplifying things a bit, because I didnt understand all the discussions)

For anusvara logic, isnt it enough just to follow the pANinI's rule "anusvArasya yayi parasavarNah" ? Or is the issue beyond that ?

vasya10 · Post by **vasya10** » 07 Feb 2007, 03:02

Just want to clarify what I meant -- if you just encode the 14 sutras of pANini into the database, you should be able to derive anusvara logic.

arunk · Post by **arunk** » 07 Feb 2007, 03:18

vasya,

yes but that is easier said than done. It isnt worth it for the scale of our use.

Arun

arunk · Post by **arunk** » 07 Feb 2007, 21:57

Please let me know if this is ok. Drs/ramakriya/jayaram - i am going to bother you in particular. Feedback from others is also very welcome.

After racking my brains over this more, I have an alternative proposal which may be the best given our constraints.

For kannada and telugu, there are contexts in which certain combinations ALWAYS use anuswara, i.e. #n[kg], ~n[cj], n[td], N[TD], m[pb]. Note that to make the input easier to read, for the first two cases the scheme allows just n instead of #n, ~n, i.e. pankaja, panca are ok. Also, currently, you can simply use M instead of #n/~n/n/m in all these cases. But as i have noted many times, except in the last case where M represents m, this is not recommended as it is not as phonetic, and can also lead to misleading pronunciation for people who do not know the language. Besides, one of the aims of the scheme was to avoid script specific artifacts wherever possible, and this is definitely one place where they can be avoided for these 2 languages.
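The "always anuswara" contexts listed above can be sketched as a single regular expression. This is only an illustration in Python under stated assumptions: the function name is made up, M is used as the internal "render as anuswara" marker, and the editor's real implementation may work quite differently.

```python
import re

# Nasal + following stop combinations that always use anuswara in
# kannada/telugu, as listed in the post: #n[kg], ~n[cj], n[td], N[TD], m[pb].
# A plain n before k/g/c/j is also accepted as shorthand for #n and ~n.
IMPLIED_ANUSWARA = re.compile(
    r"#n(?=[kg])"
    r"|~n(?=[cj])"
    r"|(?<![#~])n(?=[kgcjtd])"   # plain n, not the tail of #n or ~n
    r"|N(?=[TD])"
    r"|m(?=[pb])"
)

def mark_implied_anuswara(word):
    """Replace the nasal with M wherever anuswara is automatically implied."""
    return IMPLIED_ANUSWARA.sub("M", word)

for w in ["pankaja", "pa~nca", "Sa#nkara", "pANDava", "amba", "ramya"]:
    print(w, "->", mark_implied_anuswara(w))
```

Note that ramya is left alone, since my is not one of the listed contexts, and so is a lowercase n before T or D, because the pattern is case-sensitive.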

However, note that for kannada and telugu, there are contexts where certain combinations do NOT ALWAYS use anuswara. Examples are mya, mSa etc. We decided here that the user would need to explicitly specify the anuswara (raMya). I think for kannada and telugu, these contexts only have the anuswara implying the "m" sound (and not #n/~n/N - right?)

I am thinking sanskrit should also follow the same rule, but obviously in more contexts because of the wider use of anuswara in the language. IN THE MIDDLE of a word (for end of words, see below), whenever anuswara is required, it needs to be explicitly specified - else no anuswara would be rendered. Of course as per the current scheme, this would mean saMgIta, saMtOsha etc., which again is not phonetic, and can mislead pronunciation for some people.

i think malayalam can follow same rule (but contexts where anuswara figures would be the least of the 4 languages).

A more phonetic explicit anuswara specifier for use inside words
But what if we adopt a different more phonetically fair specifier for anuswara in places it represents #n, ~n, n and N sound? For example, one that uses n/N but with a prefix. I propose the back-tick character ` - so you have sa`ngIta, sa`ntOsha. The advantage here is the explicit anuswara specification is still phonetically quite fair - sa`ngIta is much better than saMgIta. I find this a whole lot more desirable than M in such cases. But in contexts where anuswara represents the "m" sound (ahamkAra), we still use M as ahaMkAra. So we have 3 representations for explicit anuswara: `n, `N and M.

(note: we could choose a different character than backtick - the only constraint being it should not be so "visible" and intrusive that it becomes an eyesore. We could also use it as a suffix - san`gIta as opposed to sa`ngIta - this may be a better representation of the internal structure of the word?)
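As a rough illustration of the three explicit specifiers proposed above (`n, `N and M), the Python sketch below finds them in a word and converts the backtick forms to the older M-only spelling. The function names are hypothetical and this is not the editor's code.

```python
import re

# The three explicit anuswara specifiers from the proposal.
EXPLICIT_SPECIFIER = re.compile(r"`n|`N|M")

def explicit_anuswaras(word):
    """Return (position, specifier) pairs for every explicit anuswara."""
    return [(m.start(), m.group()) for m in EXPLICIT_SPECIFIER.finditer(word)]

def to_m_only_form(word):
    """Rewrite the backtick specifiers as M, e.g. sa`ngIta -> saMgIta."""
    return word.replace("`n", "M").replace("`N", "M")

print(explicit_anuswaras("sa`ngIta"))
print(to_m_only_form("sa`ntOsha"))
print(to_m_only_form("ahaMkAra"))
```

The conversion only goes one way: going from saMgIta back to sa`ngIta needs to know which nasal the anuswara stands for, which depends on the consonant that follows.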

anuswara at end of words for sanskrit
This is tricky in sanskrit as it depends on end of sentence etc. I can detect many cases in logic and apply them, but i dont think in a reliable way - which means a user that cares needs to have control. So I am just going to have three options for sanskrit:
(a) always use anuswaras at end of words (regardless of m/M)
(b) never use anuswaras at end of words (regardless of m/M)
(c) use anuswaras only when M is specified explicitly at end of words. This can allow a meticulous user to get the rendition to use anuswaras (at word-endings) in middle of sentences, and not at end of sentence - but its up to the user.

Conclusion:
I think all this basically puts the responsibility on the user to know when sanskrit requires anuswaras and when it doesnt. I think this is ok, the editor is not involved in "teaching how to write sanskrit"

Besides we were ok with that rule for "my" combinations in kannada and telugu. I dont know why I forgot that

Rules for specifying Anuswara
So based on this here are some concise rules i can think of:
(a) tamizh krithis: no need to specify anuswara ever as it doesnt make sense for the language. When this gets transl. to kannada/telugu, anuswara would be used in middle of words for #n[kg], ~n[cj], n[td], N[TD], m[pb], and also when m is at end of word. When a tamizh krithi gets transl. to sanskrit/malayalam, sanskrit-based words may not appear ideally, as they wont have anuswara. This may be ok as, while the word is sanskrit-based, one could argue it is still in the context of a tamizh krithi and thus non-sanskrit, and sanskrit rules for anuswara may not apply. Of course, a person who does care about sanskrit rendition can introduce explicit anuswara specifiers even in a tamizh krithi (e.g. sa`ngIta)
(b) kannada,telugu krithis:
(i) Should not explicitly specify anuswara in contexts where it represents ~n, #n, n, N, m (i.e. use panca/pa~nca, Sankara/Sa#nkara, pANDava, amba).
(ii) Should not explicitly specify anuswara at end of words, as m at end of a word always implies anuswara. Use "m" instead.
(iii) Must specify in contexts which do not automatically imply anuswara - e.g. raMya.
So basically specify anuswara only when it is not automatically implied. Note again, that this means that when the krithi gets transl. to sanskrit/malayalam, sanskrit-words may not appear ideally. Depending on user's preference then explicit anuswara may be specified for (i) and (ii), but as `n, `N where it represents #n, ~n, n, N, and M ONLY when it represents m sound.
(c) sanskrit krithis: Must specify anuswaras but only where they occur. Again specify `n, `N where it represents #n, ~n, n, N, and M ONLY when it represents m sound. When a sanskrit krithi gets translated to kannada/telugu, it *may* force anuswaras in places which normally are not there? But I am not sure.
(d) malayalam krithis: Must specify anuswaras but only where they occur. I think anuswara would figure and hence need be specified only in cases where it represents "m" sound (like raMya)? If so, the editor may ignore anuswara specifier in places where it represents #n, ~n, n and N sound? (and use actual characters) - so sa#ngIta/sangIta/sa`ngIta would all be rendered as sa#ngIta.

Thanks
Arun

arunk · Post by **arunk** » 08 Feb 2007, 02:24

so nobody gives a hoot??

Your silence will be conveniently interpreted as rousing approval!

I will implement these and may be when you see it in action, you may be forthcoming in your approval/disapproval!

Arun

Suji Ram · Post by **Suji Ram** » 08 Feb 2007, 02:26

arunk wrote:vasya,

this is doable now itself. All you need to do is get a pdf print driver which allows you to save what you would normally send to a printer as a PDF file (e.g google for pdf995). With this then from the Printable View, you just choose Print options on your browser, and instead of sending to your printer, choose the pdf printer.

Arun

Arun
I downloaded the free version and tried. But all I can get is a pdf file without my work. ??
The way I am doing it is -right click on printable view,print target, and choose pdf995 and hit Ok. It asks for file name to save as pdf. A screen appears asking me to upgrade or continue with sponsor page..... The outcome is a pdf file of the sponsor page.
Help Please

ramakriya · Post by **ramakriya** » 08 Feb 2007, 03:15

arunk

have been tied up all day .. Hope to completely read your post and send my feedback by the end of the day..

-Ramakriya

ramakriya · Post by **ramakriya** » 08 Feb 2007, 03:18

Suji Ram wrote:Arun
I downloaded the free version and tried. But all I can get is a pdf file without my work. ??
The way I am doing it is -right click on printable view,print target, and choose pdf995 and hit Ok. It asks for file name to save as pdf. A screen appears asking me to upgrade or continue with sponsor page..... The outcome is a pdf file of the sponsor page.
Help Please

Try using primopdf or pdfcreator; I have had better results with these two. The former has some problems when converting word documents with certain formatting. But should not be a problem for normal use. I have not seen any issues with pdfcreator.

www.primopdf.com

http://sourceforge.net/projects/pdfcreator/

-Ramakriya

arunk · Post by **arunk** » 08 Feb 2007, 03:35

pdfcreator works fine too (although the free version i think it puts something in the footer). Its got a slick interface.

pdf995 is what I use. Its not the greatest interface, and it does bring up the browser to throw up an innocuous ad of their own - it is NOT adware. Its a small price to pay for something free and which doesnt put up stuff in the footer. (but if there are other better free tools which dont put up stuff in the footer, i say ditch this one).

suji - i dont know why you got that. I have used it many times and have not seen the problem you are seeing. Perhaps you let it open the (sponsor-ad) page and THEN clicked ok on the dialog where it asks for file?

Arun

arunk · Post by **arunk** » 08 Feb 2007, 03:36

thanks ramakriya.

Suji Ram · Post by **Suji Ram** » 08 Feb 2007, 04:40

arunk wrote:suji - i dont know why you got that. I have used it many times and have not seen the problem you are seeing. Perhaps you let it open the (sponsor-ad) page and THEN clicked ok on the dialog where it asks for file?

Arun

Thanks,

got it now ... was doing something dumb.

arunk · Post by **arunk** » 08 Feb 2007, 23:19

ramakriya,

did you get a chance to look at it? If not, I can post an update which has changes adhering to above. I am ready to post it.

BTW, come to think of it, this is not a major change to the scheme. In essence, it involves only two things:

1. instead of always M for anuswara in EVERY context, use `n or `N for anuswara when the underlying sound is not ma. So you use `n when it represents #n, ~n and n, and `N when it represents N sound.

2. Try to avoid specifying M unless absolutely needed. This is not a new rule.

Thanks
Arun

ramakriya · Post by **ramakriya** » 08 Feb 2007, 23:32

Finally, some comments -

arunk wrote:For kannada and telugu, there are contexts in which certain combinations ALWAYS use anuswara, i.e. #n[kg], ~n[cj], n[td], N[TD], m[pb]. Note that to make the input easier to read, for the first two cases the scheme allows just n instead of #n, ~n, i.e. pankaja, panca are ok. Also, currently, you can simply use M instead of #n/~n/n/m in all these cases.

Correct

arunk wrote:But as i have noted many times, except in the last case where M represents m, this is not recommended as it is not as phonetic, and can also lead to misleading pronunciation for people who do not know the language. Besides, one of the aims of the scheme was to avoid script specific artifacts wherever possible, and this is definitely one place where they can be avoided for these 2 languages.

That is fine too.

arunk wrote:However, note that for kannada and telugu, there are contexts where certain combinations do NOT ALWAYS use anuswara. Examples are mya, mSa etc. We decided here that the user would need to explicitly specify the anuswara (raMya). I think for kannada and telugu, these contexts only have the anuswara implying the "m" sound (and not #n/~n/N - right?)

In these cases, it is not an anusvAra ; It is the vyanjana 'm' that appears in words like ramya, tAmra, Amla etc.

An anuswara is a representation of an anunAsika (5th letter of each varga #n, ~n, N, n, M), occurring before a letter which is a non-anunAsika vargIya vyanjana (k c T t p vargas, leaving out the last letter)

When the letter following an anunAsika is another anunAsika, (like in amnAya, vA#nmaya, amma, haNNu, kenne) or one of the following three avargIya vyanjanas (y r l - as in ramya, tAmra, Amla) then the anunAsika is used as it is in the samyuktAkshara.

(This info may be a repetition of what DRS may have said earlier).

When an anunAsika (normally m) is followed by v, S, Sh, s, h, L -> it will be represented by anusvAra.
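The rules in this post boil down to a lookup on the consonant that follows the nasal. Here is a hedged Python sketch of that lookup; the set names and the function are made up for illustration, and it deliberately ignores exceptions such as saMyukta that come up later in the thread.

```python
# Consonant classes as described above, written in the scheme's notation.
VARGA_STOPS = {
    "k", "kh", "g", "gh", "c", "ch", "j", "jh", "T", "Th", "D", "Dh",
    "t", "th", "d", "dh", "p", "ph", "b", "bh",
}
ANUNASIKAS = {"#n", "~n", "N", "n", "m"}
Y_R_L = {"y", "r", "l"}
OTHER_AVARGIYA = {"v", "S", "Sh", "s", "h", "L"}

def nasal_rendering(next_consonant):
    """Say how a nasal is written before the given consonant:
    'anusvara' (the bindu) or 'conjunct' (the nasal itself in a samyuktAkshara)."""
    if next_consonant in VARGA_STOPS or next_consonant in OTHER_AVARGIYA:
        return "anusvara"
    if next_consonant in ANUNASIKAS or next_consonant in Y_R_L:
        return "conjunct"
    raise ValueError(f"unknown consonant: {next_consonant!r}")

print(nasal_rendering("k"))   # as in pankaja
print(nasal_rendering("y"))   # as in ramya
print(nasal_rendering("S"))
```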

arunk wrote:I am thinking sanskrit should also follow the same rule, but obviously in more contexts because of the wider use of anuswara in the language. IN THE MIDDLE of a word (for end of words, see below), whenever anuswara is required, it needs to be explicitly specified - else no anuswara would be rendered. Of course as per the current scheme, this would mean saMgIta, saMtOsha etc., which again is not phonetic, and can mislead pronunciation for some people.

i think malayalam can follow same rule (but contexts where anuswara figures would be the least of the 4 languages).

samskrita and malayALam experts should pitch in. All these discussions have made my head dizzy and now I am doubting myself when to use the bindu in samskrita

A more phonetic explicit anuswara specifier for use inside words

arunk wrote:But what if we adopt a different more phonetically fair specifier for anuswara in places it represents #n, ~n, n and N sound? For example, one that uses n/N but with a prefix. I propose the back-tick character ` - so you have sa`ngIta, sa`ntOsha. The advantage here is the explicit anuswara specification is still phonetically quite fair - sa`ngIta is much better than saMgIta. I find this a whole lot more desirable than M in such cases. But in contexts where anuswara represents the "m" sound (ahamkAra), we still use M as ahaMkAra. So we have 3 representations for explicit anuswara: `n, `N and M.

(note: we could choose a different character than backtick - the only constraint being it should not be so "visible" and intrusive that it becomes an eyesore. We could also use it as a suffix - san`gIta as opposed to sa`ngIta - this may be a better representation of the internal structure of the word?)

I agree that sa`ngIta is better representation than saMgIta even though I have got used to the baraha's standard saMgIta

arunk wrote:anuswara at end of words for sanskrit
This is tricky in sanskrit as it depends on end of sentence etc. I can detect many cases in logic and apply but i dont think in a reliable way - which means a user that cares need to have control. So I am just going to have three options for sanskrit:
(a) always use anuswaras end of words (regardless of m/M)
(b) never use anuswaras at end of words (regardless of m/M)
(c) use anuswaras only when M is specified explicitly at end of words. This can allow a meticulous user to get the rendition to use anuswaras (at word-endings) in middle of sentences, and not at end of sentence - but its up to the user.

Time to dust any samskrita grammar books I have or find one to borrow :/

arunk wrote:Conclusion:
I think all this basically puts the responsibility on the user to know when sanskrit requires anuswaras and when it doesnt. I think this is ok, the editor is not involved in "teaching how to write sanskrit" Besides we were ok with that rule for "my" combinations in kannada and telugu. I dont know why I forgot that

There you go ..

arunk wrote:(b) kannada,telugu krithis:

(iii) Must specify in contexts which do not automatically imply anuswara - e.g. raMya.

This, again, is not an anusvAra, but vyanjana - So the correct representation is ramya; and hey - that is your current implementation too

All this talk about anusvAras reminds me of something funny that happened at the kids' kannada class here. One of the beginner kids told his mother that he could write amma (mother). The mother was surprised, because in the class the teacher had only covered the vowels and had not yet taught any of the vyanjanas, let alone samyuktAksharas. When asked, the kid wrote ಅಂಅ, to the surprise of both the teacher and the mother, which sounds exactly like ಅಮ್ಮ

-Ramakriya

arunk · Post by **arunk** » 08 Feb 2007, 23:38

So no occurrences of M in kannada EVER when preceding ya, sa, Sa varieties? Hmm.. I thought someone mentioned otherwise a while ago, but i think i must have confused it with sanskrit rules.

This does make it easier - no need to specify anuswara in the script for kannada and telugu, since the places it figures are places where there is no ambiguity (it always figures in those contexts).

Arun

drshrikaanth · Post by **drshrikaanth** » 08 Feb 2007, 23:43

arunk wrote:So no occurrences of M in kannada EVER when preceding ya, sa, Sa varieties? Hmm.. I thought someone mentioned otherwise a while ago, but i think i must have confused it with sanskrit rules.

Your memory serves you right. There are exceptions here, which I had mentioned earlier. Sometimes anuswAra does occur before y, r & l, e.g. saMyukta, saMyama, saMrakShaNe, saMlApa, saMyOjane etc.

arunk · Post by **arunk** » 08 Feb 2007, 23:45

Yes drs. I was about to post a link to your post from long ago

Anyway here it goes: http://www.rasikas.org/forums/viewtopic.php?pid=27669#p27669 (post #115)

Arun

drshrikaanth · Post by **drshrikaanth** » 08 Feb 2007, 23:45

Arun
I suggest you cut and paste these bits of info/rules on MSword/Notepad as and when they come up. Then you dont have to rely on memory or others will not have to repeat what they said earlier. Well-meaning comment. Not having a go at you at all

arunk · Post by **arunk** » 08 Feb 2007, 23:46

yep thanks. I should have done this before but was lazy and I thought i could use the search facility on the forum. But separate notes is better indeed

Arun

ramakriya · Post by **ramakriya** » 08 Feb 2007, 23:50

arunk wrote:So no occurrences of M in kannada EVER when preceding ya, sa, Sa varieties? Hmm.. I thought someone mentioned otherwise a while ago, but i think i must have confused it with sanskrit rules.

Arun

Not so fast

I made an error in making a blanket statement - for example, there are words like samyukta, samyOga etc. which are written with anusvAra. This may be influenced by how these words are written in samskR.ta also. Let me check with a samskR.ta expert (who is also a kannaDa expert) I know of. Better still if I can make him a member of this forum and have him contribute to the thread

-Ramakriya

arunk · Post by **arunk** » 09 Feb 2007, 01:09

ramakriya,

it shouldnt matter. For all cases where usage of anuswara is not unambiguously implied, an explicit specifier needs to be given - this applies to all languages.

I will upload my new version soon

Arun

arunk · Post by **arunk** » 09 Feb 2007, 02:21

Hi folks,

I have uploaded another update. This includes the following enhancements

Enhanced Anuswara support
1. Scheme now accepts `n and `N as alternate explicit specifiers for anuswara in addition to the already existing M. These should be used instead of M in contexts where anuswara represents a non-ma sound (i.e. `n when it represents #n/~n/n, and `N when it represents N).
2. Explicit anuswara specifiers should be used only when necessary, depending on the language. This means for tamizh usually never, for kannada/telugu only in cases like saMyukta and similar, and for sanskrit only for words that do use anuswara.
3. For sanskrit, there are 4 choices that control the use of anuswara at end of words (that end with "m"). The default is that anuswara is used for words in the middle of a sentence but not at the end. Note that the editor tries to figure this out automatically. From my limited testing, it seems to do a fair job. But if it misses an anuswara, you can use M to specify it explicitly. The other choices are: no anuswara (whether or not M is used at end of words), always use anuswara (for all words ending in m/M), and anuswara only for words ending in M. So if you have rAgaM tALam sa`ngItam (a hypothetical and not exactly correct example), then
(a) default would treat it like rAgaM tALaM sa`ngItam
(b) "No anuswara at word endings" would treat it like rAgam tALam sa`ngItam
(c) "Always use anuswara at word endings" would treat it like rAgaM tALaM sa`ngItaM
(d) "Use anuswara only for words ending in M" would treat it like rAgaM tALam sa`ngItam
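The four word-ending choices can be reproduced with a few lines of Python. This is a simplified sketch, not the editor's logic: the policy names are invented here, and the whole input is assumed to be a single sentence, whereas the real editor tries to detect sentence ends automatically.

```python
def apply_word_end_policy(sentence, policy="default"):
    """Apply one of the four sanskrit word-ending anuswara choices.

    policy is one of (names are hypothetical):
      'default'  - anuswara for words ending in m/M, except the last word
      'never'    - no anuswara at word endings
      'always'   - anuswara for every word ending in m/M
      'explicit' - anuswara only where the user typed M
    """
    if policy not in ("default", "never", "always", "explicit"):
        raise ValueError(f"unknown policy: {policy!r}")
    words = sentence.split()
    result = []
    for index, word in enumerate(words):
        is_last = index == len(words) - 1
        if word[-1:] in ("m", "M"):
            stem = word[:-1]
            if policy == "never":
                word = stem + "m"
            elif policy == "always":
                word = stem + "M"
            elif policy == "default" and not is_last:
                word = stem + "M"
            # 'explicit', and the last word under 'default', stay as typed
        result.append(word)
    return " ".join(result)

text = "rAgaM tALam sa`ngItam"
for name in ("default", "never", "always", "explicit"):
    print(name, "->", apply_word_end_policy(text, name))
```

Run on the example from the post, the four policies give the same four renderings as (a) to (d) above.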

"Fix text to convert to scheme" button (the new button which has a spanner/hammer).
This allows you to tell the editor to make some conversions so that the input text conforms to the scheme, along with various other changes (e.g. removing unnecessary anuswara specifiers etc.)

My intention is for people to be able to copy/paste text in other "informal" schemes and be able to easily "fix" it to conform to the unified scheme (e.g. vaataapi gaNapatim => vAtApi gaNapatim, and ashaindhaadum mayiloNDRu => asaindAdum mayilonDRu). Please let me know if you find this useful.
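One small piece of such a "fix" step can be sketched in Python: turning doubled vowels from informal schemes into the capital long vowels used by the unified scheme. This covers only the first example above; the second one (sh to s, dh to d and so on) needs language-specific rules that are not attempted here. The function name is made up and this is not the editor's actual conversion code.

```python
import re

# Doubled short vowels in informal schemes -> long vowels in the unified scheme.
LONG_VOWELS = {"aa": "A", "ii": "I", "uu": "U", "ee": "E", "oo": "O"}
DOUBLED = re.compile("|".join(LONG_VOWELS))

def fix_long_vowels(text):
    """Rewrite aa/ii/uu/ee/oo as A/I/U/E/O."""
    return DOUBLED.sub(lambda m: LONG_VOWELS[m.group()], text)

print(fix_long_vowels("vaataapi gaNapatim"))
```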

For people who havent seen this before:
The link to the unified transliteration scheme editor is http://arunk.freepgs.com/cmtranslit
The link to the scheme is http://arunk.freepgs.com/cmtranslit/cmt ... cheme.html

Any feedback is most welcome.

Thanks
Arun

ramakriya · Post by **ramakriya** » 10 Feb 2007, 02:29

arunk: Here is a new bug I found in the implementation of ai when it occurs in the middle (or end) of a word. Initial ai kAras look OK. AFAIK, this has happened in the latest update you did.

Take a look at the following testcase. All scripts except Tamizh have this bug:

-------------------------------------

a A i I u U R e E ai o O au aM aH
airAvata
bhAvaikya
vInai
--------------------------------------------
अ आ इ ई उ ऊ ऱ् ऎ ए ऐ ऒ ओ औ अं अः
ऐरावत
भावइक्य
वीनइ
--------------------------------------------
అ ఆ ఇ ఈ ఉ ఊ ఱ్ ఎ ఏ ఐ ఒ ఓ ఔ అం అః
ఐరావత
భావఇక్య
వీనఇ
--------------------------------------------
ಅ ಆ ಇ ಈ ಉ ಊ ಱ್ ಎ ಏ ಐ ಒ ಓ ಔ ಅಂ ಅಃ
ಐರಾವತ
ಭಾವಇಕ್ಯ
ವೀನಇ
---------------------------------------------
அ ஆ இ ஈ உ ஊ ற் எ ஏ ஐ ஒ ஓ ஔ அம் அ:
ஐராவத
பா4வைக்ய
வீனை
----------------------------------------------

-Ramakriya

drshrikaanth · Post by **drshrikaanth** » 10 Feb 2007, 02:38

And I think vowel R is also not showing up correctly here. Am I correct?

ramakriya · Post by **ramakriya** » 10 Feb 2007, 02:55

drshrikaanth wrote:And I think vowel R is also not showing up correctly here. Am I correct?

Good catch - I forgot the . after R for vowel R. However, after I tried it, I found another problem. Any combinations with vowel R do not show up correctly in Kannada, telugu and samskR.ta, even though the stand alone vowel looks OK

Here you go!

====================
र् ऱ् ऋ
म्ऋदु क्ऋष्ण अद्ऋष्ट

-------------------------------------
r R R.
mR.du kR.ShNa adR.ShTa
------------------------------------
ర్ ఱ్ ఋ
మ్ఋదు క్ఋష్ణ అద్ఋష్ట
----------------------------------------
ರ್ ಱ್ ಋ
ಮ್ಋದು ಕ್ಋಷ್ಣ ಅದ್ಋಷ್ಟ
---------------------------------------
ர் ற் ரு2
ம்ரு2து3 க்ரு2ஷ்ண அத்3ரு2ஷ்ட
=======================

-Ramakriya

arunk · Post by **arunk** » 10 Feb 2007, 02:56

i will see what is going on.

Thanks
Arun

arunk · Post by **arunk** » 10 Feb 2007, 03:01

btw, i was thinking whether "R", when preceded by a consonant and also succeeded by a consonant (or at end of word), should become "R.", i.e. kRshNa <=> kR.shNa, for convenience. That can eliminate the need for the (ugly) "." in most cases? Or is that not an unambiguous case?

Arun

arunk · Post by **arunk** » 10 Feb 2007, 03:05

i think i found the problem. A bug definitely introduced last time around. I will upload an update now.

Arun

arunk · Post by **arunk** » 10 Feb 2007, 03:08

please try it now (refresh browser cache if needed) and let me know. I think both problems (ai, R.) were the result of the same bug. It probably caused au also not to work - i.e. any 2 letter vowel representation following consonants for non-tamil languages
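The usual way to keep two-letter vowels like ai, au and R. intact is to try the longest representation first when splitting the input. The Python sketch below shows that idea only; it is not the editor's code, the vowel list is an assumption based on the test cases in this thread, and consonants are simply left as single characters.

```python
# Vowel representations, including the two-character ones that broke.
VOWELS = ["ai", "au", "R.", "a", "A", "i", "I", "u", "U", "e", "E", "o", "O"]
VOWELS_LONGEST_FIRST = sorted(VOWELS, key=len, reverse=True)

def tokenize(word):
    """Split a word into tokens, matching the longest vowel form first."""
    tokens = []
    i = 0
    while i < len(word):
        for vowel in VOWELS_LONGEST_FIRST:
            if word.startswith(vowel, i):
                tokens.append(vowel)
                i += len(vowel)
                break
        else:
            # Not a vowel: keep the single character (consonant or symbol).
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("vInai"))
print(tokenize("bhAvaikya"))
print(tokenize("mR.du"))
```

If the single-letter a were tried before ai, bhAvaikya would split into a followed by i, which matches the broken output reported above (भावइक्य instead of a proper ai sign).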

Arun

ramakriya · Post by **ramakriya** » 10 Feb 2007, 03:26

arunk wrote:btw, i was thinking whether "R", when preceded by a consonant and also succeeded by a consonant (or at end of word), should become "R.", i.e. kRshNa <=> kR.shNa, for convenience. That can eliminate the need for the (ugly) "." in most cases? Or is that not an unambiguous case?

Arun

I think that is quite true in practice. For words like R.Na, R.tu etc. we can specify it explicitly.

But doesn't that pose a problem with tamizh, which uses the R (shakaTa rEpha)? Words like க்ரு2ஷ்ண will start showing up with ற் instead of ர், won't they?

-Ramakriya

arunk · Post by **arunk** » 10 Feb 2007, 03:32

not sure - but in tamizh also R wont occur after a consonant AND also before one (or at end) - both conditions must be met for treating R as R.

You have saurAshTRam (vowel follows), kanRu (although we will specify as kanDRu), paRRu (vowel precedes, but again we will specify as paTRu), paRavai (vowel precedes and succeeds). I think R cannot be at end of word in tamizh
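The proposed shortcut (treat R as R. only when a consonant precedes it AND a consonant or the end of the word follows) can be checked against the tamizh examples above with a short Python sketch. The function name and the crude "any non-vowel letter is a consonant" test are assumptions made for illustration.

```python
import re

# Any ASCII letter that is not one of the scheme's vowel letters.
CONSONANT = "bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ"

# R preceded by a consonant and followed by a consonant, whitespace or end.
IMPLIED_VOWEL_R = re.compile(rf"(?<=[{CONSONANT}])R(?=[{CONSONANT}]|\s|$)")

def expand_vowel_r(text):
    """Rewrite R as R. wherever both conditions are met."""
    return IMPLIED_VOWEL_R.sub("R.", text)

for w in ["kRshNa", "saurAshTRam", "kanRu", "paRRu", "paRavai", "kR.shNa"]:
    print(w, "->", expand_vowel_r(w))
```

Only kRshNa changes. In the four tamizh words a vowel sits on at least one side of every R, and an R that already carries its "." is left alone.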

Arun

ramakriya · Post by **ramakriya** » 15 Feb 2007, 05:58

One more new bug - this time with anuswara implementation with Ta varga:
Shows up in Kannada and Telugu, and Samskr.ta
Second line in each set is with explicit anuswara specification, and that is OK.

ಕನ್ಡೆ ನಾ ಗೋವಿಂದನ
ಕಂಡೆ ನಾ ಗೋವಿಂದನ

कन्डॆ ना गोविन्दन
कण्डॆ ना गोविन्दन

kanDe nA gOvindana
kaMDe nA gOvindana

కన్డె నా గోవిందన
కండె నా గోవిందన

ಕನ್ಡೆ ನಾ ಗೋವಿಂದನ
ಕಂಡೆ ನಾ ಗೋವಿಂದನ

கன்டெ நா கோ3விந்தன
கண்டெ நா கோ3விந்தன
-Ramakriya

arunk · Post by **arunk** » 15 Feb 2007, 06:23

i thought it would have to be kaNDE for anuswara to be used - since the nasal that is part of the pentat to which Da belongs is Na and not na (so NDa, NTa but nta, nda for anuswara to be used).

I guess not?

Arun

ramakriya · Post by **ramakriya** » 15 Feb 2007, 07:25

arunk wrote:i thought it would have to be kaNDE for anuswara to be used - since the nasal that is part of the pentat to which Da belongs is Na and not na (so NDa, NTa but nta, nda for anuswara to be used).

I guess not?

Arun

If we allow words like ankusha, ungura, incara, panjara (written without the explicit #n and ~n) to use anuswara in kannada and telugu scripts, then IMO nTa and nDa should be allowed as well, instead of the explicit N.

You can check with others what they feel about this.

-Ramakriya

arunk · Post by **arunk** » 15 Feb 2007, 07:33

the other form is allowed for a few reasons:
1. There is no single letter representation possible for those nasal consonants. This is unlike for N.
2. nka, nta etc. combinations are common in informal use, i.e. Sankara, tangam, angam, sangam, panca, pankaja, anjali etc. I dont think this is the case for T/D, or at least not with the same frequency of usage.
3. (weakest) substituting N for n is phonetically a lot more misleading than substituting n for #n/~n. It is perhaps because of reason #2 that this is so, i.e. we are so used to seeing the english forms Sankara, sangam that when we see nga/nka/nca/nja we are able to associate what the "n" stands for.

Arun

arunk · Post by **arunk** » 21 Feb 2007, 02:37

i have a question regarding the Harvard-kyoto convention as listed in the Cologne Sanskrit lexicon site (i want to provide a conversion from it in the editor):

Cologne Digital Sanskrit Lexicon (from Monier-Williams' 'Sanskrit-English Dictionary')

The English description contains a translation, grammatical and any other information listed in the MW. You may search for all of it.

The transliteration is based on the Harvard-Kyoto (HK) convention as follows:
   a A i I u U R RR lR lRR e ai o au M H
   k kh g gh G c ch j jh J
   T Th D Dh N t th d dh n
   p ph b bh m y r l v z S s h

I notice that R and L consonants (the R in first row is the vowel) are not listed and am wondering why. La does occur in sanskrit krithis (sakaLE, kancadaLAyadAkshi)?

Is this because H-K is for an older form of sanskrit maybe? Or is the above an outdated representation of H-K?

Thanks
Arun

ramakriya · Post by **ramakriya** » 21 Feb 2007, 03:36

arunk wrote:I notice that R and L consonants (the R in first row is the vowel) are not listed and am wondering why. La does occur in sanskrit krithis (sakaLE, kancadaLAyadAkshi)?

Is this because H-K is for an older form of sanskrit maybe? Or is the above an outdated representation of H-K?

Thanks
Arun

I think the hard 'R' sound (as in tamizh) does not exist in samskR.ta; And 'L' is an import from southern languages. Classical samskR.ta does not have this consonant and uses l instead.

-Ramakriya

arunk · Post by **arunk** » 21 Feb 2007, 03:42

thanks.

Arun

arunk · Post by **arunk** » 21 Feb 2007, 21:17

Does anyone object if I make the following changes to the scheme itself:

1. Change qualifier numbering for tamizh ca letter. Right now 1,2,3,4,5 stand for ca, cha, sa, ja, jha. The use of 3 for the "sa" sound basically makes this inconsistent with all other consonants in the 5 pentats. So instead I want to use 5 for "sa". This means 1,2,3,4 stand for ca, cha, ja, jha (consistent with ka/kha/ga/gha), and 5 for sa.
This can affect, say, the rendition of a construct like "asam" when using the qualifier scheme "No qualifiers for hard sound".
2. Make R <=> R. if it meets two conditions: (a) it follows a consonant and (b) it precedes a consonant.

Thanks
Arun

arunk · Post by **arunk** » 23 Feb 2007, 03:05

jayaram/vgv

i am trying to add back support for malayalam and having some headaches because of what seems like differing behavior with fonts.

if i use the zero-width joiner to force cillakshara for n, N, l, L, r at end of words, then some fonts dont render the cillu for some letters, others dont render it for other letters, and some dont render it at all. But when i copy and paste it to word, then some of the non-cillu forms become cillu forms (i think you observed something similar). Basically IE vs FireFox vs Word all exhibit slightly different behaviour (although with Firefox i think i see the same as IE, only that it seems to have painting problems on Firefox - particularly when i add the extra half-consonant for rna, rva etc.)

In general malayalam support in various fonts seems spotty w.r.t. this - unless I am doing something wrong. It is going to be hard to come up with something reliable unless we can find out which combination (i.e. font) works best with browsers.

vgv - are you seeing anything like this? Pl. try generating for the following:

van vaN val var vaL varNa

Thanks
Arun

arunk · Post by **arunk** » 23 Feb 2007, 03:14

i typed in man maN mal maL mar in my editor and translated. My font is Akshar Unicode and it generated cillu for maN, mal, mar but not for man and maL.

I copy the generated malayalam (which is unicode including the ZWJ which is "hidden") and paste it into Wordpad: All cillu's disappear (and become candrakala like). I dont know what font it picked but it is not Akshar Unicode.

I paste it to Word: Now man which wasnt cillu, becomes cillu. maN which was cillu, becomes non-cillu. mal retains cillu. maL which wasnt cillu becomes cillu. mar which was cillu loses cillu. Again it is not clear what font it picked but it isnt Akshar Unicode.

what a mess!

Arun

jayaram · Post by **jayaram** » 23 Feb 2007, 03:16

arun - can you post it here and see how it appears?

arunk · Post by **arunk** » 23 Feb 2007, 03:19

for man maN mal maL mar

മന്‍ മണ് മല്‍ മള്‍ മര്‍

(btw even in the submit post i lost all cillus - i dont know if they will change when they get posted - they may if ZWJ are posted, and then the font your browser picks is able to deal with them better)
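To check whether the ZWJ survived a copy/paste or a forum post, it helps to look at the raw code points rather than at what the font draws. The Python sketch below builds the consonant + virama (candrakala) + ZWJ sequence described earlier and lists the code points of a string. The helper names are hypothetical; the code points themselves are the standard Unicode values for the Malayalam letters and the zero-width joiner.

```python
VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA (candrakala)
ZWJ = "\u200d"     # ZERO WIDTH JOINER

# Scheme letter -> Malayalam consonant, for the five cillu candidates.
CONSONANTS = {
    "n": "\u0d28",  # NA
    "N": "\u0d23",  # NNA
    "l": "\u0d32",  # LA
    "L": "\u0d33",  # LLA
    "r": "\u0d30",  # RA
}

def cillu_request(letter):
    """Build the consonant + virama + ZWJ sequence that asks for a cillu."""
    return CONSONANTS[letter] + VIRAMA + ZWJ

def code_points(text):
    """List the code points of a string, e.g. to see if a ZWJ is still there."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(code_points(cillu_request("n")))
print(ZWJ in "pasted text goes here")
```

Running code_points on text copied back from Word, Wordpad or a forum post shows directly whether the joiner was dropped along the way or whether only the font is failing to render the cillu.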

arunk · Post by **arunk** » 23 Feb 2007, 03:19

ok with my current firefox settings - no cillus on the post

arunk · Post by **arunk** » 23 Feb 2007, 03:20

Viewed this thread on IE: With my current settings: man - cillu, maN - no-cillu, mal - cillu, maL - cillu, mar - cillu

Much better

Arun

arunk · Post by **arunk** » 23 Feb 2007, 03:23

on firefox. I picked Akshar Unicode, then told it not to "let pages pick their own fonts", and voila cillus all around (in Akshar Unicode) !

Arun

arunk · Post by **arunk** » 23 Feb 2007, 03:25

so definitely a problem due to different fonts having different levels of "support" (or lack of it)