Single Transliteration Scheme for all CM Languages - Part 2
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
vasya,
This can be done right now. All you need is a PDF print driver, which lets you save what you would normally send to a printer as a PDF file (e.g. google for pdf995). Then, from the Printable View, open your browser's Print dialog and, instead of sending it to your printer, choose the PDF printer.
Arun
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
I have tested looking up a sanskrit word database to decide where to use anuswara, and it works. However, there is a significant problem: the input text (as in the sAhitya) can have several words (each a potential match in the dictionary) combined into a single word in English. Note also that when words are combined, they change form according to the rules of the language.
So unless language rules are applied (which is very difficult), it is impossible to reliably figure out which words in the input correspond to words in the dictionary (i.e. those that require anuswara in sanskrit).
For example, if sangIta comes as is, I can match it against saMgIta (with some smart logic). I can even match sangItam (add m if the word ends with a and try again), but what if the input is karnAtakasangItam, written as one word (or something similar)? "sangIta" can occur anywhere in an input word. One solution could be to match it anywhere in an input word, but I see an entry for aMsa, and does that mean amsa anywhere should match? I am thinking not.
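The matching described above can be sketched roughly as follows. This is a minimal Python illustration, not the editor's actual code: the tiny `DICTIONARY`, the helper names and the nasal-to-M rewrite rule are assumptions made for the example. It shows why whole-word lookup handles sangIta and sangItam but not karnAtakasangItam, and what a substring fallback does instead.

```python
import re
from typing import List, Optional

# Hypothetical mini-dictionary of sanskrit words spelled with explicit anuswara (M).
DICTIONARY = {"saMgIta", "saMtOsha", "aMsa"}

# Assumed "smart logic": a nasal followed by a stop of its own group may stand
# for an anuswara, so rewrite it as M before comparing with the dictionary.
_NASAL_BEFORE_STOP = re.compile(r"n(?=[kgcjtd])|N(?=[TD])|m(?=[pb])")


def canonical(word: str) -> str:
    """Rewrite homorganic nasals as M, e.g. sangIta -> saMgIta."""
    return _NASAL_BEFORE_STOP.sub("M", word)


def forms(entry: str) -> List[str]:
    """A dictionary entry plus its -m form if it ends in a (saMgIta -> saMgItam)."""
    return [entry, entry + "m"] if entry.endswith("a") else [entry]


def whole_word_match(word: str) -> Optional[str]:
    """Return the dictionary entry that the whole input word matches, if any."""
    key = canonical(word)
    for entry in sorted(DICTIONARY):
        if key in forms(entry):
            return entry
    return None


def substring_matches(word: str) -> List[str]:
    """Dictionary entries found anywhere inside the input word."""
    key = canonical(word)
    return sorted(entry for entry in DICTIONARY if entry in key)


print(whole_word_match("sangItam"))
print(whole_word_match("karnAtakasangItam"))
print(substring_matches("karnAtakasangItam"))
```

Substring search rescues karnAtakasangItam, but it accepts a short entry wherever its letters happen to appear, which is the over-matching concern raised with aMsa.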
So while the dictionary would help, it may not help that much. Of course, I could introduce a feature where the user highlights some text and explicitly asks for a match in the database, but that means only a user who knows sanskrit well will be able to provide the correct input that will translate to all languages :(. I guess that is going to be our Achilles heel.
We are so close to our solution, yet there seems to be an insurmountable barrier.
Any suggestions?
Thanks
Arun
Last edited by arunk on 07 Feb 2007, 01:33, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
arunk wrote: "but that means only a user who knows sanskrit well will be able to provide the correct input that will translate to all languages :(."
Maybe this isn't a big deal. If the input represents a sanskrit krithi, then it is not unfair to expect the user to be aware of where anuswara figures, is it?
But if the krithi is non-sanskrit, and the user entering it doesn't know sanskrit rules, how bad would it be if certain words (in a language other than sanskrit) that happen to be sanskrit-based get rendered in sanskrit with no anuswara?
For example, if a word like sangItamu (as entered) is in a telugu krithi but, when rendered in sanskrit, doesn't appear with anuswara, is that too bad?
Arun
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Please let me know if this is ok. Drs/ramakriya/jayaram - in particular, I am going to bother you specifically. Feedback from others is also very welcome.
After racking my brains over this more, I have an alternative proposal which may be the best given our constraints.
For kannada and telugu, there are contexts in which certain combinations ALWAYS use anuswara, i.e. #n[kg], ~n[cj], n[td], N[TD], m[pb]. Note that to make the input easier to read, for the first two cases the scheme allows you to use just n instead of #n, ~n, i.e. pankaja, panca is ok. Also, currently, you could simply use M instead of #n/~n/n/N/m in all these cases. But as I have noted many times, except in the last case where M represents m, this is not recommended, as it is not as phonetic and can lead to misleading pronunciation for people who do not know the language. Besides, one of the aims of the scheme was to avoid script-specific artifacts wherever possible, and this is definitely one place where they can be avoided for these two languages.
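As a rough illustration of the "always anuswara" contexts listed above, the hypothetical `mark_implied` function below rewrites each nasal that kannada/telugu would render as anuswara into M, including a word-final m (as rule (a) further down assumes). It is a Python sketch working on scheme text only; it does not handle the mya/mSa cases discussed next, so ramya is left untouched.

```python
import re

# Contexts in which kannada/telugu ALWAYS use anuswara, in scheme notation:
# #n[kg], ~n[cj], n[td], N[TD], m[pb]. A bare n before k/g/c/j is accepted as
# the scheme's shorthand for #n/~n (pankaja, panca). A word-final m is also
# treated as anuswara.
_IMPLIED = re.compile(
    r"(?:#n|~n|n)(?=[kgcjtd])"  # #n, ~n or n before a velar, palatal or dental stop
    r"|N(?=[TD])"               # N before a retroflex stop
    r"|m(?=[pb])"               # m before a labial stop
    r"|m\b"                     # m at the end of a word
)


def mark_implied(text: str) -> str:
    """Show where the renderer would place an anuswara, written as M."""
    return _IMPLIED.sub("M", text)


print(mark_implied("pankaja pa~nca pANDava amba rAmam ramya"))
```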
However, note that for kannada and telugu there are contexts where certain combinations do NOT ALWAYS use anuswara. Examples are mya, mSa etc. We decided here that the user would need to explicitly specify the anuswara (raMya). I think for kannada and telugu, these contexts only have the anuswara implying the "m" sound (and not #n/~n/N) - right?
I am thinking sanskrit should also follow the same rule, but obviously in more contexts because of how widely the language uses anuswara. IN THE MIDDLE of a word (for end of words, see below), whenever anuswara is required, it needs to be explicitly specified; otherwise no anuswara would be rendered. Of course, as per the current scheme, this would mean saMgIta, saMtOsha etc., which again is not phonetic and can mislead pronunciation for some people.
I think malayalam can follow the same rule (but the contexts where anuswara figures would be the fewest of the 4 languages).
A more phonetic explicit anuswara specifier for use inside words
But what if we adopt a different, more phonetically fair specifier for anuswara in places where it represents the #n, ~n, n and N sounds? For example, one that uses n/N but with a prefix. I propose the back-tick character `, so you have sa`ngIta, sa`ntOsha. The advantage here is that the explicit anuswara specification is still phonetically quite fair: sa`ngIta is much better than saMgIta. I find this a whole lot more desirable than M in such cases. But in contexts where anuswara represents the "m" sound (ahamkAra), we still use M, as in ahaMkAra. So we have 3 representations for explicit anuswara: `n, `N and M.
(Note: we could choose a different character than the back-tick; the only constraint is that it should not be so "visible" and intrusive that it becomes an eyesore. We could also use it as a suffix, san`gIta as opposed to sa`ngIta. Would this be a better representation of the internal structure of the word?)
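To make the proposal concrete: all three specifiers stand for the same anuswara and differ only in the sound they hint at. The Python sketch below (hypothetical helper names, not the editor's code) finds them and collapses them to one internal symbol; it accepts both the prefix form (sa`ngIta) and the suffix form (san`gIta), so the two spellings can be compared.

```python
import re
from typing import List, Tuple

# The three proposed explicit anuswara specifiers: `n and `N (back-tick as a
# prefix, e.g. sa`ngIta) and M (e.g. ahaMkAra). The suffix variant (san`gIta)
# is accepted as well.
_SPECIFIER = re.compile(r"`([nN])|([nN])`|M")


def explicit_anuswaras(word: str) -> List[Tuple[int, str]]:
    """Return (offset, sound hint) for each explicit anuswara in the word."""
    hits = []
    for match in _SPECIFIER.finditer(word):
        hint = match.group(1) or match.group(2) or "m"  # M carries the "m" sound
        hits.append((match.start(), hint))
    return hits


def to_internal(word: str) -> str:
    """Collapse every specifier to a single internal anuswara symbol (M here)."""
    return _SPECIFIER.sub("M", word)


print(explicit_anuswaras("sa`ngIta"))
print(explicit_anuswaras("ahaMkAra"))
print(to_internal("san`gIta") == to_internal("saMgIta"))
```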
anuswara at end of words for sanskrit
This is tricky in sanskrit, as it depends on the end of the sentence etc. I can detect many cases in logic and apply them, but I don't think in a reliable way, which means a user who cares needs to have control. So I am just going to have three options for sanskrit:
(a) always use anuswaras at the end of words (regardless of m/M)
(b) never use anuswaras at the end of words (regardless of m/M)
(c) use anuswaras only when M is explicitly specified at the end of words. This allows a meticulous user to get the rendition to use anuswaras at word endings in the middle of sentences, and not at the end of a sentence, but it's up to the user.
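A minimal sketch of how the three options could be applied word by word. `WordEnding` and `final_anuswara` are made-up names for illustration; deciding where a sentence ends is left to the user, as option (c) intends.

```python
from enum import Enum


class WordEnding(Enum):
    """The three proposed options for anuswara at the end of sanskrit words."""
    ALWAYS = "a"    # (a) always, regardless of m/M
    NEVER = "b"     # (b) never, regardless of m/M
    EXPLICIT = "c"  # (c) only where the user typed a final M


def final_anuswara(word: str, option: WordEnding) -> bool:
    """Decide whether the word's final m/M should be rendered as anuswara."""
    if not word.endswith(("m", "M")):
        return False  # the options only concern words ending in m/M
    if option is WordEnding.ALWAYS:
        return True
    if option is WordEnding.NEVER:
        return False
    return word.endswith("M")  # EXPLICIT: only a typed M counts


print(final_anuswara("gaNapatiM", WordEnding.EXPLICIT))
print(final_anuswara("gaNapatim", WordEnding.EXPLICIT))
```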
Conclusion:
I think all this basically puts the responsibility on the user to know when sanskrit requires anuswaras and when it doesn't. I think this is ok; the editor is not involved in "teaching how to write sanskrit". Besides, we were ok with that rule for "my" combinations in kannada and telugu. I don't know why I forgot that.
Rules for specifying Anuswara
So based on this, here are some concise rules I can think of:
(a) tamizh krithis: no need to specify anuswara ever, as it doesn't make sense for the language. When this gets transliterated to kannada/telugu, anuswara would be used in the middle of words for #n[kg], ~n[cj], n[td], N[TD], m[pb], and also when m is at the end of a word. When a tamizh krithi gets transliterated to sanskrit/malayalam, sanskrit-based words may not appear ideally, as they won't have anuswara. This may be ok: while the word is sanskrit-based, one could argue it is still in the context of a tamizh krithi and thus non-sanskrit, so sanskrit rules for anuswara may not apply. Of course, a person who does care about the sanskrit rendition can introduce explicit anuswara specifiers even in a tamizh krithi (e.g. sa`ngIta).
(b) kannada/telugu krithis:
(i) Should not explicitly specify anuswara in contexts where it represents ~n, #n, n, N, M (i.e. use panca/pa~nca, Sankara/Sa#nkara, pANDava, amba).
(ii) Should not explicitly specify anuswara at the end of words, as a final m always implies anuswara. Use "m" instead.
(iii) Must specify in contexts which do not automatically imply anuswara - e.g. raMya.
So basically, specify anuswara only when it is not automatically implied. Note again that this means that when the krithi gets transliterated to sanskrit/malayalam, sanskrit words may not appear ideally. Depending on the user's preference, explicit anuswara may then be specified for (i) and (ii), but as `n, `N where it represents #n, ~n, n, N, and as M ONLY when it represents the m sound.
(c) sanskrit krithis: Must specify anuswaras, but only where they occur. Again, specify `n, `N where it represents #n, ~n, n, N, and M ONLY when it represents the m sound. When a sanskrit krithi gets transliterated to kannada/telugu, it *may* force anuswaras in places where they normally do not appear? But I am not sure.
(d) malayalam krithis: Must specify anuswaras, but only where they occur. I think anuswara would figure, and hence need to be specified, only in cases where it represents the "m" sound (like raMya)? If so, the editor may ignore the anuswara specifier in places where it represents the #n, ~n, n and N sounds (and use the actual characters), so sa#ngIta/sangIta/sa`ngIta would all be rendered as sa#ngIta.
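For the malayalam suggestion in (d), the editor would turn an `n/`N hint (and the bare-n shorthand) into the actual nasal chosen by the next consonant. A possible Python sketch, using a hypothetical `resolve_nasals` helper; M is deliberately left untouched, since under this proposal it marks the "m"-sound anuswara (raMya).

```python
import re

# Nasal of the same group as the following stop, in scheme notation.
_HOMORGANIC = {"k": "#n", "g": "#n", "c": "~n", "j": "~n",
               "T": "N", "D": "N", "t": "n", "d": "n"}

# `n or `N before a stop: an explicit anuswara hint to be replaced.
_HINT = re.compile(r"`[nN](?=[kgcjTDtd])")
# Bare n before k/g/c/j: the scheme's shorthand for #n/~n. The look-behind
# skips an n that is already part of #n or ~n.
_SHORTHAND = re.compile(r"(?<![#~])n(?=[kgcj])")


def _nasal_for(match: "re.Match[str]") -> str:
    following = match.string[match.end()]  # the stop seen by the look-ahead
    return _HOMORGANIC[following]


def resolve_nasals(word: str) -> str:
    """Replace anuswara hints and n-shorthand with the actual nasal letter."""
    word = _HINT.sub(_nasal_for, word)
    return _SHORTHAND.sub(_nasal_for, word)


for w in ("sa#ngIta", "sangIta", "sa`ngIta", "raMya"):
    print(w, "->", resolve_nasals(w))
```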
Thanks
Arun
Last edited by arunk on 07 Feb 2007, 22:24, edited 1 time in total.
-
- Posts: 1529
- Joined: 09 Feb 2006, 00:04
arunk wrote:
"vasya,
this is doable now itself. All you need to do is get a pdf print driver which allows you to save what you would normally send to a printer as a PDF file (e.g google for pdf995). With this then from the Printable View, you just choose Print options on your browser, and instead of sending to your printer, choose the pdf printer.
Arun"
Arun,
I downloaded the free version and tried, but all I get is a PDF file without my work. ??
The way I am doing it is: right-click on Printable View, choose Print Target, select pdf995 and hit OK. It asks for a file name to save the PDF as. A screen appears asking me to upgrade or continue with the sponsor page..... The outcome is a PDF file of the sponsor page.
Help please.
-
- Posts: 1876
- Joined: 04 Feb 2010, 02:05
Suji Ram wrote:
"Arun
I downloaded the free version and tried. But all I can get is a pdf file without my work. ??
The way I am doing it is -right click on printable view,print target, and choose pdf995 and hit Ok. It asks for file name to save as pdf. A screen appears asking me to upgrade or continue with sponsor page..... The outcome is a pdf file of the sponsor page.
Help Please"
Try using primopdf or pdfcreator; I have had better results with these two. The former has some problems when converting Word documents with certain formatting, but that should not be a problem for normal use. I have not seen any issues with pdfcreator.
www.primopdf.com
http://sourceforge.net/projects/pdfcreator/
-Ramakriya
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
pdfcreator works fine too (although I think the free version puts something in the footer). It's got a slick interface.
pdf995 is what I use. It's not the greatest interface, and it does bring up the browser to show an innocuous ad of their own, but it is NOT adware. It's a small price to pay for something free that doesn't put stuff in the footer. (But if there are better free tools that don't put stuff in the footer, I say ditch this one.)
suji - I don't know why you got that. I have used it many times and have not seen the problem you are seeing. Perhaps you let it open the (sponsor-ad) page and THEN clicked OK on the dialog where it asks for the file name?
Arun
Last edited by arunk on 08 Feb 2007, 03:35, edited 1 time in total.
-
- Posts: 1529
- Joined: 09 Feb 2006, 00:04
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
ramakriya,
did you get a chance to look at it? If not, I can post an update with changes adhering to the above. I am ready to post it.
BTW, come to think of it, this is not a major change to the scheme. In essence, it involves only two things:
1. Instead of always using M for anuswara in EVERY context, use `n or `N for anuswara when the underlying sound is not ma. So you use `n when it represents #n, ~n and n, and `N when it represents the N sound.
2. Try to avoid specifying M unless absolutely needed. This is not a new rule.
Thanks
Arun
Last edited by arunk on 08 Feb 2007, 23:20, edited 1 time in total.
-
- Posts: 1876
- Joined: 04 Feb 2010, 02:05
Finally, some comments -
arunk wrote: "For kannada and telugu, there are contexts which certain combinations ALWAYS use anuswara i.e. #n[kg], ~n[cj], n[td], N[TD], m[pb]. Note that for making the input easier to read, for the first two cases, the scheme allows you just n instead of #n, ~n, i.e. pankaja, panca is ok. Also, currently, you would simply use M instead of #n/~n/n/m in all these cases."
Correct.
arunk wrote: "But as i have noted many times, except in the last case where M represents m, it is not recommended to use this as it not as phonetic, and also can lead to misleading pronunciation for people who do not know the language. Besides, one of the aims of the scheme was to avoid script specific artifacts wherever possible, and this is definitely one place where it can be avoided for these 2 languages."
That is fine too.
arunk wrote: "However, note that for kannada and telugu, there are contexts where certain combinations do NOT ALWAYS use anuswara. Example is mya, mSa etc. We decided here that user would need to explicitly the anuswara (raMya). I think for kannada and telugu, these contexts only have the anuswara implying "m" sound (and not #n/~n/N - right?)"
In these cases, it is not an anusvAra; it is the vyanjana 'm' that appears in words like ramya, tAmra, Amla etc.
An anusvAra is a representation of an anunAsika (the 5th letter of each varga: #n, ~n, N, n, M) occurring before a letter which is a non-anunAsika vargIya vyanjana (the k, c, T, t, p vargas, leaving out the last letter).
When the letter following an anunAsika is another anunAsika (as in amnAya, vA#nmaya, amma, haNNu, kenne) or one of the three avargIya vyanjanas y, r, l (as in ramya, tAmra, Amla), the anunAsika is used as it is in the samyuktAkshara.
(This info may be a repetition of what DRS may have said earlier.)
When an anunAsika (normally m) is followed by v, S, Sh, s, h or L, it will be represented by an anusvAra.
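The rule above can be read as a lookup on the letter that follows the nasal. The Python sketch below encodes exactly the three cases as stated, using scheme spellings; the set names and `nasal_written_as` are illustrative only, and the sketch knows nothing about word-level exceptions.

```python
# Scheme spellings, grouped as in the rule above.
VARGIYA_STOPS = {  # non-anunAsika letters of the k, c, T, t, p vargas
    "k", "kh", "g", "gh", "c", "ch", "j", "jh", "T", "Th",
    "D", "Dh", "t", "th", "d", "dh", "p", "ph", "b", "bh",
}
ANUNASIKAS = {"#n", "~n", "N", "n", "m"}  # 5th letter of each varga
CONJUNCT_AVARGIYA = {"y", "r", "l"}
ANUSVARA_AVARGIYA = {"v", "S", "Sh", "s", "h", "L"}


def nasal_written_as(following: str) -> str:
    """How an anunAsika is written before the given letter, per the stated rule."""
    if following in VARGIYA_STOPS or following in ANUSVARA_AVARGIYA:
        return "anusvAra"
    if following in ANUNASIKAS or following in CONJUNCT_AVARGIYA:
        return "anunAsika in samyuktAkshara"
    raise ValueError(f"rule does not cover {following!r}")


for letter in ("g", "s", "m", "y"):
    print(letter, "->", nasal_written_as(letter))
```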
arunk wrote: "I am thinking sanskrit should also follow the same rule but obviously in more contexts because of use of anuswara in the language. IN THE MIDDLE of a word (end of words - see below), whenever anuswara is required, it needs to e explicitly specified - else no anuswara would be rendered. Of course as per current scheme, this would mean saMgIta, saMtOsha etc. which again is not phonetic, and can mislead pronounciation for some people.
i think malayalam can follow same rule (but contexts where anuswara figures would be the least of the 4 languages).
A more phonetic explicit anuswara specifier for use inside words"
samskrita and malayALam experts should pitch in. All these discussions have made my head dizzy, and now I am doubting myself about when to use the bindu in samskrita.
arunk wrote: "But what if we adopt a different more phonetically fair specifier for anuswara in places it represents #n, ~n, n and N sound? For example, one that uses n/N but with a prefix. I propose the back-tick character ` - so you have sa`ngIta, sa`ntOsha. The advantage here is the explicit anuswara specification is still phonetically quite fair - sa`ngIta is much better than saMgIta. I find this a whole lot more desirable than M in such cases. But in contexts where anuswara represents the "m" sound (ahamkAra), we still use M as ahaMkAra. So we have 3 representations for explicit anuswara: `n, `N and M.
(note: we could choose a different character than backtick - only constraint being it should not be too "visible" and intrusive that it becomes an eyesore. We could also use it as a suffix - san`gIta as opposed to sa`ngIta - this may be better representation of the internal structure of the word?"
I agree that sa`ngIta is a better representation than saMgIta, even though I have got used to baraha's standard saMgIta.
arunk wrote: "anuswara at end of words for sanskrit
This is tricky in sanskrit as it depends on end of sentence etc. I can detect many cases in logic and apply but i dont think in a reliable way - which means a user that cares need to have control. So I am just going to have three options for sanskrit:
(a) always use anuswaras end of words (regardless of m/M)
(b) never use anuswaras at end of words (regardless of m/M)
(c) use anuswaras only when M is specified explicitly at end of words. This can allow a meticulous user to get the rendition to use anuswaras (at word-endings) in middle of sentences, and not at end of sentence - but its up to the user."
Time to dust off any samskrita grammar books I have, or find one to borrow :/
arunk wrote: "Conclusion:
I think all this basically puts the responsibility on the user to know when sanskrit requires anuswaras and when it doesnt. I think this is ok, the editor is not involved in "teaching how to write sanskrit" Besides we were ok with that rule for "my" combinations in kannada and telugu. I dont know why I forgot that"
There you go ..
arunk wrote: "(b) kannada,telugu krithis:
(iii) Must specify in contexts which do not automatically imply anuswara - e.g. raMya."
This, again, is not an anusvAra but a vyanjana, so the correct representation is ramya; and hey, that is your current implementation too.
All this talk about anusvAras reminds me of something funny that happened at the kids' kannada class here. One of the beginner kids told his mother that he could write amma (mother). The mother was surprised, because the teacher had only covered the vowels in class and had not yet taught any of the vyanjanas, let alone samyuktAksharas. When asked, the kid wrote ಅಂಅ, to the surprise of both the teacher and the mother, which sounds exactly like ಅಮ್ಮ.
-Ramakriya
Last edited by ramakriya on 08 Feb 2007, 23:43, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
So no occurrences of M in kannada EVER when preceding ya, sa, Sa varieties? Hmm.. I thought someone mentioned otherwise a while ago, but I think I must have confused it with sanskrit rules.
This does make it easier: no need to specify anuswara in the script for kannada and telugu, since the places where it figures have no ambiguity (it always figures in those contexts).
Arun
Last edited by arunk on 08 Feb 2007, 23:39, edited 1 time in total.
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
arunk wrote: "So no occurences of M in kannada EVER when preceding ya,sa,Sa, varieties? Hmm.. I thought someone mentioned otherwise a while ago, but i think i must have been confused it with sanskrit rules."
Your memory serves you right. There are exceptions here, and I had mentioned them earlier. Sometimes anuswAra does occur before y, r and l, e.g. saMyukta, saMyama, saMrakShaNe, saMlApa, saMyOjane etc.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Yes drs. I was about to post a link to your post from long ago.
Anyway, here it goes: http://www.rasikas.org/forums/viewtopic.php?pid=27669#p27669 (post #115)
Arun
Last edited by arunk on 08 Feb 2007, 23:45, edited 1 time in total.
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
-
- Posts: 1876
- Joined: 04 Feb 2010, 02:05
arunk wrote: "So no occurences of M in kannada EVER when preceding ya,sa,Sa, varieties? Hmm.. I thought someone mentioned otherwise a while ago, but i think i must have been confused it with sanskrit rules.
Arun"
Not so fast. I made an error in making a blanket statement. For example, there are words like samyukta, samyOga etc. which are written with anusvAra. This may be influenced by how these words are written in samskR.ta. Let me check with a samskR.ta expert I know (who is also a kannaDa expert). Better still, if I can get him to become a member of this forum, he can contribute to the thread.
-Ramakriya
Last edited by ramakriya on 08 Feb 2007, 23:57, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Hi folks,
I have uploaded another update. It includes the following enhancements:
Enhanced Anuswara support
1. The scheme now accepts `n and `N as alternate explicit specifiers for anuswara, in addition to the existing M. These should be used instead of M in contexts where the anuswara represents a non-ma sound (i.e. `n when it represents #n/~n/n, and `N when it represents N).
2. Explicit anuswara specifiers should be used only when necessary, depending on the language. This means: for tamizh, usually never; for kannada/telugu, only in cases like saMyukta and similar; and for sanskrit, only for words that do use anuswara.
3. For sanskrit, there are 4 choices that control the use of anuswara at the end of words (that end with "m"). By default, anuswara is used for words in the middle of a sentence but not at the end. Note that the editor tries to figure this out automatically; from my limited testing, it seems to do a fair job. But if it misses an anuswara, you can use M to specify it explicitly. The other choices are: no anuswara (whether or not M is used at the end of words), always use anuswara (for all words ending in m/M), and anuswara only for words ending in M. So if you have rAgaM tALam sa`ngItam (a hypothetical and not exactly correct example), then
(a) default would treat it like rAgaM tALaM sa`ngItam
(b) "No anuswara at word endings" would treat it like rAgam tALam sa`ngItam
(c) "Always use anuswara at word endings" would treat it like rAgaM tALaM sa`ngItaM
(d) "Use anuswara only for words ending in M" would treat it like rAgaM tALam sa`ngItam
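The four choices can be sketched as a single function over one line of text. This is an illustration under a stated simplification, not the editor's code: `EndingMode` and `apply_ending_mode` are hypothetical names, each line is assumed to be one sentence (so its last word is the sentence end), and punctuation attached to words is not handled. The editor described above tries to detect sentence ends automatically; that logic is not reproduced here.

```python
from enum import Enum


class EndingMode(Enum):
    DEFAULT = "default"    # mid-sentence words get anuswara; an explicit M is kept
    NONE = "none"          # no anuswara at word endings, even for M
    ALWAYS = "always"      # every word ending in m/M
    EXPLICIT = "explicit"  # only words the user ended with M


def apply_ending_mode(line: str, mode: EndingMode) -> str:
    """Rewrite the final m/M of each word as m (no anuswara) or M (anuswara).

    Simplification: the whole line is treated as one sentence.
    """
    words = line.split()
    result = []
    for i, word in enumerate(words):
        if word[-1] not in "mM":
            result.append(word)
            continue
        typed_m = word[-1] == "M"
        is_last = i == len(words) - 1
        if mode is EndingMode.ALWAYS:
            use = True
        elif mode is EndingMode.NONE:
            use = False
        elif mode is EndingMode.EXPLICIT:
            use = typed_m
        else:  # DEFAULT
            use = typed_m or not is_last
        result.append(word[:-1] + ("M" if use else "m"))
    return " ".join(result)


for mode in EndingMode:
    print(mode.value, "->", apply_ending_mode("rAgaM tALam sa`ngItam", mode))
```

Run on the example line, the loop reproduces the four renderings (a) to (d) listed above.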
Fix text to convert to scheme button (the new button with a spanner/hammer icon)
This lets you tell the editor to make some conversions so that the input text conforms to the scheme, along with various other changes (e.g. removing unnecessary anuswara specifiers).
My intention is for people to be able to copy/paste text written in other "informal" schemes and easily "fix" it to conform to the unified scheme (e.g. vaataapi gaNapatim => vAtApi gaNapatim, and ashaindhaadum mayiloNDRu => asaindAdum mayilonDRu). Please let me know if you find this useful.
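The kind of rewriting this button performs can be approximated with ordered string substitutions. The rules below are hypothetical and only cover the two examples given above (doubled vowels to long vowels, plus a few tamizh-specific rules); they are not the editor's actual rule set. A blind "sh" to "s" rewrite would be wrong outside tamizh, which is why the tamizh rules are gated on the language.

```python
# Hypothetical clean-up rules, applied in order with plain substring replacement.
COMMON_RULES = [("aa", "A"), ("ii", "I"), ("ee", "I"), ("uu", "U")]
TAMIZH_RULES = [("dh", "d"), ("sh", "s"), ("NDR", "nDR")]


def fix_text(text: str, language: str) -> str:
    """Rewrite common informal spellings into the unified scheme (sketch only)."""
    rules = COMMON_RULES + (TAMIZH_RULES if language == "tamizh" else [])
    for informal, scheme in rules:
        text = text.replace(informal, scheme)
    return text


print(fix_text("vaataapi gaNapatim", "sanskrit"))
print(fix_text("ashaindhaadum mayiloNDRu", "tamizh"))
```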
For people who haven't seen this before:
The link to the unified transliteration scheme editor is http://arunk.freepgs.com/cmtranslit
The link to the scheme is http://arunk.freepgs.com/cmtranslit/cmt ... cheme.html
Any feedback is most welcome.
Thanks
Arun
Last edited by arunk on 09 Feb 2007, 02:35, edited 1 time in total.
-
- Posts: 1876
- Joined: 04 Feb 2010, 02:05
arunk: Here is a new bug I found in the implementation of ai when it occurs in the middle (or at the end) of a word. Word-initial ai kAras look OK. AFAIK, this appeared with the latest update you did.
Take a look at the following test case. All scripts except Tamizh have this bug:
-------------------------------------
a A i I u U R e E ai o O au aM aH
airAvata
bhAvaikya
vInai
--------------------------------------------
अ आ इ ई उ ऊ ऱ् ऎ ए ऐ ऒ ओ औ अं अः
ऐरावत
भावइक्य
वीनइ
--------------------------------------------
అ ఆ ఇ ఈ ఉ ఊ ఱ్ ఎ ఏ ఐ ఒ ఓ ఔ అం అః
ఐరావత
భావఇక్య
వీనఇ
--------------------------------------------
ಅ ಆ ಇ ಈ ಉ ಊ ಱ್ ಎ ಏ ಐ ಒ ಓ ಔ ಅಂ ಅಃ
ಐರಾವತ
ಭಾವಇಕ್ಯ
ವೀನಇ
---------------------------------------------
அ ஆ இ ஈ உ ஊ ற் எ ஏ ஐ ஒ ஓ ஔ அம் அ:
ஐராவத
பா4வைக்ய
வீனை
----------------------------------------------
-Ramakriya
Last edited by ramakriya on 10 Feb 2007, 02:33, edited 1 time in total.
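The Devanagari, Telugu and Kannada outputs in the ai test case above (e.g. भावइक्य) look like what happens when "ai" after a consonant is read as the inherent "a" followed by a separate independent "i", instead of as one token. A minimal Python sketch of longest-match tokenization shows the intended behavior. The tables cover only a small, hypothetical subset of Devanagari, and the names are illustrative, not the editor's code.

```python
VIRAMA = "\u094D"
# Tiny subset of scheme consonant -> Devanagari letter.
CONSONANTS = {
    "k": "क", "bh": "भ", "t": "त", "d": "द", "n": "न", "m": "म",
    "y": "य", "r": "र", "v": "व", "Sh": "ष", "N": "ण",
}
# Scheme vowel -> (independent letter, dependent sign used after a consonant).
VOWELS = {
    "a": ("अ", ""), "A": ("आ", "\u093E"),
    "i": ("इ", "\u093F"), "I": ("ई", "\u0940"),
    "u": ("उ", "\u0941"), "U": ("ऊ", "\u0942"),
    "R.": ("ऋ", "\u0943"),
    "ai": ("ऐ", "\u0948"), "au": ("औ", "\u094C"),
}
# Longest tokens first, so "ai" is matched before "a" can be.
TOKENS = sorted([*CONSONANTS, *VOWELS], key=len, reverse=True)


def to_devanagari(text: str) -> str:
    out = []
    pending = False  # True while the last consonant has no vowel yet
    i = 0
    while i < len(text):
        token = next((t for t in TOKENS if text.startswith(t, i)), None)
        if token is None:  # space, punctuation or unknown letter
            if pending:
                out.append(VIRAMA)
            pending = False
            out.append(text[i])
            i += 1
            continue
        if token in CONSONANTS:
            if pending:  # consonant cluster: suppress the inherent a
                out.append(VIRAMA)
            out.append(CONSONANTS[token])
            pending = True
        else:
            independent, sign = VOWELS[token]
            out.append(sign if pending else independent)
            pending = False
        i += len(token)
    if pending:  # word-final bare consonant, e.g. "r" -> र्
        out.append(VIRAMA)
    return "".join(out)


if __name__ == "__main__":
    for word in ("airAvata", "bhAvaikya", "vInai"):
        print(word, to_devanagari(word))
```

With "ai" treated as a single token, bhAvaikya comes out as भावैक्य and vInai as वीनै, matching the shape of the correct Tamizh output.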
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
-
- Posts: 1876
- Joined: 04 Feb 2010, 02:05
drshrikaanth wrote: And I think vowel R is also not showing up correctly here. Am I correct?
Good catch, but I forgot the . after R for vowel R. However, when I tried it, I found another problem: any combination with vowel R does not show up correctly in Kannada, Telugu and samskR.ta, even though the stand-alone vowel looks OK.
Here you go!
====================
र् ऱ् ऋ
म्ऋदु क्ऋष्ण अद्ऋष्ट
-------------------------------------
r R R.
mR.du kR.ShNa adR.ShTa
------------------------------------
ర్ ఱ్ ఋ
మ్ఋదు క్ఋష్ణ అద్ఋష్ట
----------------------------------------
ರ್ ಱ್ ಋ
ಮ್ಋದು ಕ್ಋಷ್ಣ ಅದ್ಋಷ್ಟ
---------------------------------------
ர் ற் ரு2
ம்ரு2து3 க்ரு2ஷ்ண அத்3ரு2ஷ்ட
=======================
-Ramakriya
Last edited by ramakriya on 10 Feb 2007, 02:56, edited 1 time in total.
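The broken forms in the vowel R test above (म्ऋदु and the like) are consonant + virama + independent ऋ, where मृदु needs the dependent sign ृ instead. Because the Devanagari, Telugu and Kannada Unicode blocks share the same internal layout, one Python sketch can repair already-generated text in all three: for the vowels handled here, each dependent sign sits exactly 0x38 code points after its independent vowel. This is a hypothetical post-processing helper for cleaning existing output, not the editor's code; the proper fix belongs in the converter.

```python
# Devanagari, Telugu and Kannada blocks share the same internal layout.
BLOCK_STARTS = (0x0900, 0x0C00, 0x0C80)
CONSONANT_OFFSETS = range(0x15, 0x3A)  # KA .. HA
VIRAMA_OFFSET = 0x4D
# Independent vowels whose dependent sign is exactly 0x38 later:
# AA, I, II, U, UU, vocalic R, short E, E, AI, short O, O, AU
VOWEL_OFFSETS = {0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0E, 0x0F, 0x10, 0x12, 0x13, 0x14}
SIGN_DELTA = 0x38


def _locate(ch: str):
    """Return (block start, offset) for a supported script, else (None, None)."""
    cp = ord(ch)
    for start in BLOCK_STARTS:
        if start <= cp < start + 0x80:
            return start, cp - start
    return None, None


def repair_vowel_signs(text: str) -> str:
    """Turn consonant + virama + independent vowel into consonant + vowel sign."""
    out = []
    i = 0
    while i < len(text):
        if i + 2 < len(text):
            (b0, o0), (b1, o1), (b2, o2) = (_locate(c) for c in text[i:i + 3])
            if (b0 is not None and b0 == b1 == b2
                    and o0 in CONSONANT_OFFSETS
                    and o1 == VIRAMA_OFFSET
                    and o2 in VOWEL_OFFSETS):
                out.append(text[i] + chr(b2 + o2 + SIGN_DELTA))
                i += 3
                continue
        out.append(text[i])
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(repair_vowel_signs("म्ऋदु क्ऋष्ण अद्ऋष्ट"))
```

Vocalic L (offset 0x0C) is deliberately left out, because its dependent sign does not follow the 0x38 pattern. Clusters such as ष्ण are untouched, since the character after the virama is a consonant.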
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
-
- Posts: 1876
- Joined: 04 Feb 2010, 02:05
arunk wrote: btw, I was thinking whether "R", when preceded by a consonant and also followed by a consonant (or at the end of a word), should become "R.", i.e. kRshNa <=> kR.shNa, for convenience. Could that eliminate the need for the (ugly) "." in most cases? Or is that not an unambiguous case?
Arun
I think that is quite true in practice. For words like R.Na, R.tu etc. we can specify it explicitly.
But doesn't that pose a problem for tamizh, which uses R (shakaTa rEpha)? Words like க்ரு2ஷ்ண would start showing up with ற் instead of ர், wouldn't they?
-Ramakriya
Last edited by ramakriya on 10 Feb 2007, 03:27, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Not sure, but in tamizh too, R won't occur both after a consonant AND before one (or at the end of a word); both conditions must be met for treating R as R. (vowel R).
You have saurAshTRam (vowel follows), kanRu (although we will specify it as kanDRu), paRRu (vowel precedes, but again we will specify it as paTRu), and paRavai (vowel precedes and follows). I think R cannot be at the end of a word in tamizh.
Arun
Last edited by arunk on 10 Feb 2007, 03:33, edited 1 time in total.
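The R => R. rule discussed above can be sketched in a few lines of Python. This is a hypothetical helper, not the editor's code. It treats every ASCII letter that is not a vowel letter (including M and H) as a consonant, which is a simplification, and it follows the "(or at the end of a word)" variant of the rule.

```python
VOWEL_LETTERS = set("aAiIuUeEoO")


def _is_consonant(ch: str) -> bool:
    # Simplification: any ASCII letter that is not a vowel letter
    return ch.isascii() and ch.isalpha() and ch not in VOWEL_LETTERS


def mark_vowel_R(word: str) -> str:
    """Rewrite R as R. when it follows a consonant and is followed by a
    consonant or ends the word."""
    out = []
    for i, ch in enumerate(word):
        out.append(ch)
        if ch != "R":
            continue
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if _is_consonant(prev) and (nxt == "" or _is_consonant(nxt)):
            out.append(".")
    return "".join(out)


def mark_vowel_R_text(text: str) -> str:
    return " ".join(mark_vowel_R(w) for w in text.split())


if __name__ == "__main__":
    print(mark_vowel_R_text("kRshNa mRdu saurAshTRam kanDRu paRRu paRavai"))
```

kRshNa becomes kR.shNa and mRdu becomes mR.du, while the tamizh words from the list above (saurAshTRam, kanDRu, paRRu, paRavai) are left alone, since in each of them a vowel is adjacent to R. An R that already has its "." is also left unchanged.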
-
- Posts: 1876
- Joined: 04 Feb 2010, 02:05
One more new bug, this time in the anuswara implementation with the Ta varga.
It shows up in Kannada, Telugu and samskR.ta.
The second line in each set uses an explicit anuswara specification, and that is OK.
ಕನ್ಡೆ ನಾ ಗೋವಿಂದನ
ಕಂಡೆ ನಾ ಗೋವಿಂದನ
कन्डॆ ना गोविन्दन
कण्डॆ ना गोविन्दन
kanDe nA gOvindana
kaMDe nA gOvindana
కన్డె నా గోవిందన
కండె నా గోవిందన
ಕನ್ಡೆ ನಾ ಗೋವಿಂದನ
ಕಂಡೆ ನಾ ಗೋವಿಂದನ
கன்டெ நா கோ3விந்தன
கண்டெ நா கோ3விந்தன
-Ramakriya
Last edited by ramakriya on 15 Feb 2007, 05:58, edited 1 time in total.
-
- Posts: 1876
- Joined: 04 Feb 2010, 02:05
arunk wrote: I thought it would have to be kaNDE for anuswara to be used, since the nasal of the pentad to which Da belongs is Na and not na (so NDa, NTa, but nta, nda, for anuswara to be used).
I guess not?
Arun
If we allow words like ankusha, ungura, incara, panjara to use anuswara in kannada and telugu scripts instead of the explicit #n and ~n, then IMO nTa and nDa should also be allowed instead of the explicit N.
You can check with others how they feel about this.
-Ramakriya
Last edited by ramakriya on 15 Feb 2007, 07:26, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
The other form is allowed for a few reasons:
1. There is no single-letter representation possible for those nasal consonants. This is unlike N.
2. nka, nta etc. combinations are common in informal use, e.g. Sankara, tangam, angam, sangam, panca, pankaja, anjali etc. I don't think this is the case for T/D, or at least not with the same frequency of usage.
3. (weakest) Substituting n for N is phonetically a lot more misleading than substituting n for #n/~n. This is perhaps because of reason #2: we are so used to seeing the English forms Sankara, sangam that when we see nga/nka/nca/nja we can tell what the "n" stands for.
Arun
Last edited by arunk on 15 Feb 2007, 07:34, edited 1 time in total.
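The two positions above differ only in whether a plain n before a Ta-varga consonant should count toward anuswara. Here is a minimal Python sketch of that decision, with ramakriya's proposal behind a flag. The table uses the scheme's #n/~n/N/n/m notation for the nasal of each varga; the function and flag names are hypothetical.

```python
# Nasal of each varga (pentad), in the scheme's notation.
VARGA_NASAL = {
    **dict.fromkeys(["k", "kh", "g", "gh"], "#n"),
    **dict.fromkeys(["c", "ch", "j", "jh"], "~n"),
    **dict.fromkeys(["T", "Th", "D", "Dh"], "N"),
    **dict.fromkeys(["t", "th", "d", "dh"], "n"),
    **dict.fromkeys(["p", "ph", "b", "bh"], "m"),
}
# Nasals for which a plain informal "n" is already accepted (Sankara, panca, ...).
INFORMAL_N_ACCEPTED = {"#n", "~n"}


def may_use_anuswara(nasal: str, consonant: str, allow_n_for_N: bool = False) -> bool:
    """Decide whether nasal + consonant may be rendered with anuswara."""
    expected = VARGA_NASAL.get(consonant)
    if expected is None:  # not a varga consonant (y, r, l, v, s, h, ...)
        return False
    if nasal == expected:
        return True
    if nasal == "n" and expected in INFORMAL_N_ACCEPTED:
        return True
    # ramakriya's proposal: also accept nTa/nDa in place of NTa/NDa
    return nasal == "n" and expected == "N" and allow_n_for_N


if __name__ == "__main__":
    print(may_use_anuswara("n", "D"))                      # kanDe as currently specified
    print(may_use_anuswara("n", "D", allow_n_for_N=True))  # kanDe under the proposal
```

With the flag off, kanDe keeps an explicit n + D cluster and only kaNDe (or kaMDe) gets anuswara; with it on, kanDe would render like kaMDe.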
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
I have a question regarding the Harvard-Kyoto convention as listed on the Cologne Sanskrit Lexicon site (I want to provide a conversion from it in the editor):
Cologne Digital Sanskrit Lexicon (from Monier-Williams' 'Sanskrit-English Dictionary')
The English description contains a translation, grammatical and any other information listed in the MW. You may search for all of it.
The transliteration is based on the Harvard-Kyoto (HK) convention as follows:
a A i I u U R RR lR lRR e ai o au M H k kh g gh G c ch j jh J T Th D Dh N t th d dh n p ph b bh m y r l v z S s h
I notice that the R and L consonants (the R in the first row is the vowel) are not listed, and I am wondering why. La does occur in sanskrit krithis (sakaLE, kancadaLAyadAkshi).
Is this because H-K is for an older form of sanskrit? Or is the above an outdated representation of H-K?
Thanks
Arun
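For the planned HK import, a character-by-character mapping may cover most of the HK table quoted above, since most HK letters (including aspirates like kh and Th, and ai/au) already have the same spelling in the unified scheme. The Python sketch below assumes the following correspondences, inferred from examples in this thread rather than from the scheme page: R -> R., e -> E, o -> O, G -> #n, J -> ~n, z -> S, S -> Sh. It refuses RR, lR and lRR because their unified spelling is not stated here. `hk_to_unified` is a hypothetical name.

```python
# Assumed HK -> unified-scheme correspondences (only letters that change).
HK_CHANGES = {
    "R": "R.",  # vocalic r
    "e": "E",   # sanskrit e is long
    "o": "O",   # sanskrit o is long
    "G": "#n",  # velar nasal
    "J": "~n",  # palatal nasal
    "z": "S",   # palatal sibilant
    "S": "Sh",  # retroflex sibilant
}
# HK tokens whose unified spelling is not assumed here (longest first).
HK_UNSUPPORTED = ("lRR", "lR", "RR")


def hk_to_unified(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        for token in HK_UNSUPPORTED:
            if text.startswith(token, i):
                raise ValueError(f"no unified spelling assumed for HK {token!r}")
        ch = text[i]
        out.append(HK_CHANGES.get(ch, ch))  # kh, Th, ai, ... pass through unchanged
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ("kRSNa", "zaGkara", "deva", "paJca"):
        print(word, "->", hk_to_unified(word))
```

Because the output is never re-scanned, the S produced from HK z is not turned into Sh a second time.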
-
- Posts: 1876
- Joined: 04 Feb 2010, 02:05
arunk wrote: I notice that the R and L consonants (the R in the first row is the vowel) are not listed, and I am wondering why. La does occur in sanskrit krithis (sakaLE, kancadaLAyadAkshi).
Is this because H-K is for an older form of sanskrit? Or is the above an outdated representation of H-K?
Thanks
Arun
I think the hard 'R' sound (as in tamizh) does not exist in samskR.ta, and 'L' is an import from the southern languages. Classical samskR.ta does not have this consonant and uses l instead.
-Ramakriya
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Does anyone object if I make the following changes to the scheme itself:
1. Change the qualifier numbering for the tamizh ca letter. Right now 1,2,3,4,5 stand for ca, cha, sa, ja, jha. Using 3 for the "sa" sound makes this inconsistent with all the other consonants in the five pentads. So instead I want to use 5 for "sa". This means 1,2,3,4 would stand for ca, cha, ja, jha (consistent with ka/kha/ga/gha), and 5 for sa.
This can affect, e.g., the rendition of a construct like "asam" when using the qualifier scheme "No qualifiers for hard sound".
2. Make R <=> R. if it meets two conditions: (a) it follows a consonant, and (b) it precedes a consonant.
Thanks
Arun
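If the renumbering in point 1 goes ahead, Tamizh text that was already rendered with the old digits would need migrating. Here is a minimal Python sketch, assuming the qualifier digit follows the letter and any vowel sign, as in பா4வைக்ய earlier in this thread. The names are hypothetical.

```python
import re

OLD_CA_QUALIFIERS = {1: "ca", 2: "cha", 3: "sa", 4: "ja", 5: "jha"}
NEW_CA_QUALIFIERS = {1: "ca", 2: "cha", 3: "ja", 4: "jha", 5: "sa"}
_NEW_DIGIT_FOR_SOUND = {sound: digit for digit, sound in NEW_CA_QUALIFIERS.items()}

# ச (U+0B9A), then any vowel signs/pulli (U+0BBE..U+0BCD), then a qualifier digit.
_CA_QUALIFIED = re.compile("(\u0B9A[\u0BBE-\u0BCD]*)([1-5])")


def migrate_digit(old_digit: int) -> int:
    """Map an old ca-qualifier digit to the proposed new one."""
    return _NEW_DIGIT_FOR_SOUND[OLD_CA_QUALIFIERS[old_digit]]


def migrate_tamil_text(text: str) -> str:
    """Rewrite ca-qualifier digits in rendered Tamizh text (single pass)."""
    return _CA_QUALIFIED.sub(
        lambda m: m.group(1) + str(migrate_digit(int(m.group(2)))), text
    )


if __name__ == "__main__":
    print([migrate_digit(d) for d in range(1, 6)])
```

Doing the rewrite with a single regex pass matters: applying 3 -> 5 and then 5 -> 4 as separate replacements would corrupt the result. Digits after other letters, such as the 4 in பா4, are left alone.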
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
jayaram/vgv
I am trying to add back support for malayalam and am having some headaches because of what seems like differing behavior across fonts.
If I use the zero-width joiner to force cillakshara for n, N, l, L, r at the end of words, some fonts don't render the cillu for some letters, others don't render it for other letters, and some don't render it at all. But when I copy and paste it into Word, some of the non-cillu forms become cillu forms (I think you observed something similar). Basically IE, Firefox and Word all exhibit slightly different behaviour (Firefox seems to show the same as IE, except that it seems to have painting problems, particularly when I add the extra half-consonant for rna, rva etc.).
In general, malayalam support across fonts seems spotty in this respect, unless I am doing something wrong. It is going to be hard to come up with something reliable unless we can find out which font works best with which browsers.
vgv, are you seeing anything like this? Please try generating the following:
van vaN val var vaL varNa
Thanks
Arun
Last edited by arunk on 23 Feb 2007, 03:48, edited 1 time in total.
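For reference, here is a minimal Python sketch of the code points behind the ZWJ approach described above: base consonant, then virama (U+0D4D), then zero-width joiner (U+200D). The optional `atomic` branch uses the single-code-point chillu letters (U+0D7A to U+0D7E) that Unicode added in version 5.1, after this thread. Whether a given font or application renders either form is exactly the open question here, so this only describes the encodings. The names are hypothetical.

```python
VIRAMA = "\u0D4D"
ZWJ = "\u200D"

# scheme letter -> (base consonant, atomic chillu letter from Unicode 5.1)
CHILLU = {
    "N": ("\u0D23", "\u0D7A"),  # NNA -> CHILLU NN
    "n": ("\u0D28", "\u0D7B"),  # NA  -> CHILLU N
    "r": ("\u0D30", "\u0D7C"),  # RA  -> U+0D7C, the chillu written for RA
    "l": ("\u0D32", "\u0D7D"),  # LA  -> CHILLU L
    "L": ("\u0D33", "\u0D7E"),  # LLA -> CHILLU LL
}


def final_chillu(letter: str, atomic: bool = False) -> str:
    """Return the word-final cillu for a scheme letter (n, N, l, L, r)."""
    base, atomic_form = CHILLU[letter]
    return atomic_form if atomic else base + VIRAMA + ZWJ


if __name__ == "__main__":
    va = "\u0D35"  # Malayalam letter VA
    for letter in "nNlrL":
        print(repr(va + final_chillu(letter)))
```

The ZWJ form is three code points, so any application that drops or ignores the invisible U+200D falls back to a plain virama (candrakala-like) form, which may explain some of the behaviour reported here.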
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
I typed man maN mal maL mar into my editor and translated. My font is Akshar Unicode, and it generated cillu for maN, mal and mar but not for man and maL.
I copy the generated malayalam (which is unicode, including the ZWJ, which is "hidden") and paste it into Wordpad: all the cillus disappear (and become candrakala-like). I don't know what font it picked, but it is not Akshar Unicode.
I paste it into Word: now man, which wasn't cillu, becomes cillu. maN, which was cillu, becomes non-cillu. mal retains cillu. maL, which wasn't cillu, becomes cillu. mar, which was cillu, loses cillu. Again it is not clear what font it picked, but it isn't Akshar Unicode.
What a mess!
Arun
Last edited by arunk on 23 Feb 2007, 03:15, edited 1 time in total.
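Since the ZWJ is invisible, one way to see what Wordpad or Word actually kept after pasting is to dump the code points of the text before and after. Here is a small Python sketch using the standard `unicodedata` module; `dump_code_points` is a hypothetical helper name.

```python
import unicodedata


def dump_code_points(text: str) -> list[str]:
    """List each character as 'U+XXXX NAME', so invisible ZWJs show up."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


if __name__ == "__main__":
    # "man" with the ZWJ-based cillu: MA, NA, VIRAMA, ZWJ
    for line in dump_code_points("\u0D2E\u0D28\u0D4D\u200D"):
        print(line)
```

Comparing the dump of the editor's output with the dump of the text copied back out of Word would show whether the application dropped the ZWJ or only rendered it differently.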