Single transliteration scheme for all CM languages?
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
No, drs. I haven't fixed the bindu stuff yet (i.e. support for M).
Regarding n# - I meant #n (:mad: at myself)
Basically the nasal consonant in the same pentad as "ga" is "#n", right? But in our scheme the combination can appear as "ng" (gangai) and not always as #ng. That's what I meant.
Arun
Last edited by arunk on 30 Nov 2006, 00:16, edited 1 time in total.
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
arunk wrote: no drs. I havent fixed the bindu stuff yet (i.e. support for M).
Regarding n# - I meant #n (:mad: at myself)
ok
arunk wrote: Basically the nasal consonant in the same pentad as "ga" is "#n" right?
Yes
arunk wrote: But in our scheme the combination can appear as "ng" (gangai) and not always as #ng. Thats what I meant.
I'm not sure I am understanding correctly. gangai will be spelt with a bindu (gaMgai) and not with a #n in kannaDa, if that's what you mean. It won't be gangai ever.
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
arunk wrote: ok. By this, I obviously misunderstood what ramakriya said.
The combinations then are:
1. #ng, #ngh, #nd, #ndh (although in our scheme these can appear as ng, ngh, nd, ndh in the input, e.g. when coming from a transliteration of a tamizh krithi. For example tangam, sandi etc.)
2. ~nc, ~nch, ~nj, ~njh (some examples please?)
3. NT, NT, ND, ND
4. nt, nth, nd, ndh
5. mp, mph, mb, mbh
Arun
Arun, you still have not explained in what context you are talking of these combinations.
Anyway, if you are talking of where bindu occurs:
1- #nd and #ndh are nonexistent combinations.
2- pa~nca (paMca), vA~nchalya (vAMchalya), aMu, jhaMjhAvAta
3- You probably meant NT, NTh, ND, NDh
4 & 5 are fine
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
drshrikaanth wrote: Im not sure I am understanding correctly. gangai will be spelt with a bindu (gaMgai) and not with a #n in kannaDa if thats what you mean. it wont be gangai ever.
I didn't mean the kannaDa spelling - it would be with a M of course. I meant the transliteration text/input itself. For example, even if the input text you enter is amba, it becomes aMba. In the same way, whether the input is ganga or inta, they become gaMga, iMta, right? It is "n" that comes with "ga" and not "#n".
Arun
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
I have put in the changes but not yet uploaded. The changes are:
1. Added support for n2 for tamizh ந besides the existing ^n
2. Incorporated kannada rules for the cases where we know for sure whether a nasal consonant becomes a bindu or not.
3. Removed #ng as <=> #n in cases like vAngmaya
Here is what I get. Please check:
tirun2Amam திருநாமம் ತಿರುಮಾಮಂ
tiru^nAmam திருநாமம் ತಿರುಮಾಮಂ
vA#nmaya வாங்மய ವಾಙ್ಮಯ
a#n#nanam அங்ஙனம் ಅಙ್ಙಮಂ
vAngAdE வாங்காதே ವಾಂಗಾದೇ
gangA கங்கா ಗಂಗಾ
ga#ngA கங்கா ಗಂಗಾ
panca பஞ்ச ಪಂಚ
pa~nca பஞ்ச ಪಂಚ
panjam பஞ்சம் ಪಂಜಂ
pa~nja பஞ்ஜ ಪಂಜ (this i know is wrong for tamizh, i will fix it)
pankajam பங்கஜம் ಪಂಕಜಂ
samyukta, saMyukta சம்யுக்த, சம்யுக்த ಸಮ್ಯುಕ್ತ, ಸಂಯುಕ್ತ
imsai இம்சை ಇಂಸೈ
amba அம்ப ಅಂಬ
ammA அம்மா ಅಮ್ಮಾ
annam அன்னம் ಅಮ್ಮಂ
These are not all the use-cases but should give you an idea whether i am already off or not.
ramakriya: k + sha => ksha. For tamizh the system does it (like sri). I havent looked at this yet to fix for kannada.
Thanks again for all of your patience and valuable input
Arun
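For readers following along, here is a rough sketch of the kind of "nasal becomes bindu" rule the sample outputs above suggest. It is not Arun's actual code: the function name `nasal_to_bindu`, the table `NASAL_CONTEXT`, and the exact list of following consonants are assumptions pieced together from the examples in this thread (gangA, ga#ngA, panca, pa~nca, inta). It only rewrites the scheme text (n-type nasal to M), and leaves out "m" and the word-final "m" seen in panjam.

```python
# Hypothetical table: each nasal token and the consonants that (by assumption)
# turn it into a bindu "M" for kannaDa. Longer tokens come first so "#n" and
# "~n" are matched before plain "n".
NASAL_CONTEXT = [
    ("#n", "kg"),
    ("~n", "cj"),
    ("N", "TD"),
    ("n", "kgcjtd"),
]


def nasal_to_bindu(text):
    """Rewrite an n-type nasal as M when it precedes a consonant of its pentad."""
    out = []
    i = 0
    while i < len(text):
        for nasal, followers in NASAL_CONTEXT:
            if text.startswith(nasal, i):
                end = i + len(nasal)
                nxt = text[end:end + 1]  # empty string at end of input
                out.append("M" if nxt and nxt in followers else nasal)
                i = end
                break
        else:
            # Not a nasal token: copy the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


for word in ["gangA", "ga#ngA", "pa~nca", "vA#nmaya", "annam"]:
    print(word, "->", nasal_to_bindu(word))
```

Doubled nasals (annam, a#n#nanam) and a nasal before "m" (vA#nmaya) pass through untouched, which matches the rows in the table above.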
Last edited by arunk on 30 Nov 2006, 01:34, edited 1 time in total.
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
Watch out. All "n" are coming out as "m" in kannaDa (tirumAmam, ammam etc). Fix it.
arunk wrote: samyukta, saMyukta சம்யுக்த, சம்யுக்த ಸಮ್ಯುಕ್ತ, ಸಂಯುಕ್ತ
ಸಮ್ಯುಕ್ತ - wrong, ಸಂಯುಕ್ತ - correct
arunk wrote: imsai இம்சை ಇಂಸೈ
Sounds horrible but is correct.
arunk wrote: ramakriya: k + sha => ksha. For tamizh the system does it (like sri). I havent looked at this yet to fix for kannada.
I think Sh and sh are both being treated as the letter in SyAmA while S is being treated as the one in dIkShitar. Please reverse it. But kSa is coming out as that in dIkShitar. Right.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
drshrikaanth wrote: Watch out. All "n" coming out as m in kannaDa (tirumAmam, ammam etc). Fix it.
That is a bug because of a typo. I was incorrectly generating the code for "ma" (a remnant of old logic).
arunk wrote: samyukta, saMyukta சம்யுக்த, சம்யுக்த ಸಮ್ಯುಕ್ತ, ಸಂಯುಕ್ತ
drshrikaanth wrote: ಸಮ್ಯುಕ್ತ - wrong, ಸಂಯುಕ್ತ - correct
Is the first one wrong because the transliterated text is wrong - as in samyukta is wrong, and it should only be saMyukta - or is it wrong for other reasons?
drshrikaanth wrote: I think Sh and sh are both being treated as the letter in SyAmA while S is being treated as dIkShitar. Please reverse it. But kSa is coming out as that in dIkShitar. right.
Thanks. I will check this.
Arun
Last edited by arunk on 30 Nov 2006, 01:53, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
how about this:
sa, Sa, Sha, SyAmA, ShAmaLA, sAma:
ಸ, ಶ, ಷ, ಶ್ಯಾಮಾ, ಷಾಮಳಾ, ಸಾಮ
dIkshitar vs dIkSitar (which is correct as i think my change affected this?)
ದೀಕ್ಷಿತರ್, ದೀಕ್ಶಿತರ್
Also with my change:
SrI:
ஸ்ரீ, ಶ್ರೀ : right?
Arun
Last edited by arunk on 30 Nov 2006, 02:04, edited 1 time in total.
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
arunk wrote: First one is wrong because transliterated text is wrong - as in samyukta is wrong, and it should only be saMyukta or is it wrong for other reasons?
It is not spelt that way. Not even in sanskrit. The second way is the correct way of spelling, with a bindu.
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
ShAmaLA, ಷಾಮಳಾ - wrong. It should be the other Sa ("Sankhada Sa" as it is called in kannaDa, as it is shaped like a conch). Actually it should be SyAmaLa.
arunk wrote: dIkshitar vs dIkSitar (which is correct as i think my change affected this?)
ದೀಕ್ಷಿತರ್, ದೀಕ್ಶಿತರ್
The first one is correct both in english and kannaDa.
arunk wrote: Also with my change:
SrI:
ஸ்ரீ, ಶ್ರೀ : right?
Arun
Correct.
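The S / Sh / sh mix-up discussed here is the sort of thing a longest-match lookup avoids. Below is a small sketch only, not the actual transliterator: `SIBILANTS` and `match_sibilant` are made-up names, and treating lowercase "sh" like "Sh" is an assumption based on dIkshitar being confirmed as the correct spelling.

```python
# Longest keys first, so "Sh"/"sh" are tried before "S"/"s".
# Mapping follows the thread: sa -> ಸ, Sa -> ಶ (as in SyAmA), Sha -> ಷ (as in dIkShitar).
SIBILANTS = [("Sh", "ಷ"), ("sh", "ಷ"), ("S", "ಶ"), ("s", "ಸ")]


def match_sibilant(text, pos=0):
    """Return (kannaDa letter, characters consumed) for a sibilant at pos, else None."""
    for key, glyph in SIBILANTS:
        if text.startswith(key, pos):
            return glyph, len(key)
    return None


print(match_sibilant("SyAmA"))         # sibilant at the start of the word
print(match_sibilant("dIkshitar", 3))  # sibilant after the "k"
```

Because the two-letter keys are checked first, "Sh" is never split into "S" followed by a stray "h".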
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
drshrikaanth wrote: FYI In kannaDa words are never spelt with a dIrgha at the end. Not even names. so you have only akka, amma, saumya, ramya not akkA, ammA, saumyA, ramyA. In tamizh they are spelt in the latter fashion.
Ok, thanks. I was thinking about tamizh words only - with the other ones I was just taking (bad) guesses.
Arun
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
Arun. You have made sure that what you write in tamizh comes out correctly in kannaDa even with some flexibility in english spellings. But the reverse is a problem. saMgama comes out all awkward in tamizh: சம்கம. What are you going to do about it?
BTW I sent you an email earlier today. Did you see it?
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Yes, intentional. With m, I am morphing to bindu only if it is followed by p, ph, b, bh and s (what about S, Sh? I am not doing those now). The p, ph, b, bh is because of the home pentad. The "sa" is for tamizh (and we know it doesn't morph in kannada).
I can do for others. Please tell me which ones are safe and i can easily add them.
Arun
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
arunk wrote: yes intentional. With m, i am morphing to bindu only if followed by p, ph, b, bh and s (what about S, Sh? i am not now). The p, ph, b, bh is because of the home pentat. The "sa" is for tamizh (and we know it doesnt morph in kannada).
I can do for others. Please tell me which ones are safe and i can easily add them.
Arun
I think you can safely make "m" a bindu before v, S, Sh, s and h in kannaDa. (Will let you know if I recall exceptions.)
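As an illustration only, the "m becomes bindu" rule from this exchange could be sketched like this. `BINDU_TRIGGERS` and `m_to_bindu` are hypothetical names, and the trigger list is just the union of the consonants mentioned in this thread (p, ph, b, bh, s plus v, S, Sh, h); exceptions may well exist.

```python
# First letters of the consonants before which "m" is assumed to become bindu:
# p/ph, b/bh, v, S/Sh, s, h. Checking the first letter covers the aspirates too.
BINDU_TRIGGERS = set("pbvSsh")


def m_to_bindu(text):
    """Rewrite "m" as "M" (bindu) when the next character is a trigger consonant."""
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        out.append("M" if ch == "m" and nxt in BINDU_TRIGGERS else ch)
    return "".join(out)


for word in ["amba", "imsai", "ammA", "samyukta"]:
    print(word, "->", m_to_bindu(word))
```

A doubled "mm" and an "m" before "y" are left alone in this sketch; word-final "m" is a separate rule and is not handled here.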
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
drshrikaanth wrote: Arun. You have made sure that what you write in tamizh comes out correctly in kannaDa even with some flexibility in english spellings. But the reverse is a problem. saMgama comes out all awkward in tamizh சம்கம. What are ou going to do about it?
Oh yes! I forgot. I have to add some smarts to the tamizh logic. I think this is possible as follows:
Md(h), Mt(h), Mc(h), Mj(h), Mk(h), Mg(h) => Nd(h), Nt(h) and so on. That would work, right?
drshrikaanth wrote: BTW I sent you an email earlier today. Did you see it?
Sorry, haven't checked email.
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
arunk wrote: Oh yes! I forgot. I have to add some smarts to the tamizh logic. I think this is possible as follows:
Md(h), Mt(h), Mc(h), Mj(h), Mk(h), Mg(h) => Nd(h), Nt(h) and so on. That would work right?
Mt(h), Md(h) = nt(h), nd(h), not Nt(h), Nd(h). And likewise for the rest. I guess you made a typo, but I want to make sure you don't make an error in the script.
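A possible shape for the tamizh-side "smarts", offered as a sketch rather than the real script: replace M by the nasal of the pentad of the consonant that follows, and fall back to plain m otherwise. `CLASS_NASAL` and `resolve_bindu` are invented names, and the retroflex (T, D) and labial (p, b) rows are assumptions filled in from the pentad lists earlier in the thread.

```python
# Hypothetical lookup: first letter of the following consonant -> nasal of its pentad.
# Note the dental row maps to "n", not "N", as pointed out above.
CLASS_NASAL = {
    "k": "#n", "g": "#n",
    "c": "~n", "j": "~n",
    "T": "N", "D": "N",
    "t": "n", "d": "n",
    "p": "m", "b": "m",
}


def resolve_bindu(text):
    """Expand each M into the class nasal of the next consonant (default "m")."""
    out = []
    for i, ch in enumerate(text):
        if ch == "M":
            nxt = text[i + 1:i + 2]  # empty string when M is the last character
            out.append(CLASS_NASAL.get(nxt, "m"))
        else:
            out.append(ch)
    return "".join(out)


for word in ["saMgama", "paMca", "iMta", "aMba", "saMyukta"]:
    print(word, "->", resolve_bindu(word))
```

With this, saMgama would reach the tamizh renderer as sa#ngama instead of being spelt with ம்.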
-
- Posts: 1317
- Joined: 30 Jun 2006, 03:08
drshrikaanth wrote: I think you are safer writing it as j~nAna in kannaDa as that is how kannaDigas will pronounce it even in a tamizh kRti. The problem with writing it as ~nAnam is that they will probably have problems recognising the sound as words never begin with ~n in kannaDa.
That's an interesting statement. The ~n sound is common in Malayalam (as in '~nAn' for 'I'). Based on the above, this word would get pronounced as 'j~nAn' by a Kannadiga! Shouldn't we try and capture the original sound to the extent possible, so that at least a multi-lingual person can pronounce it closest to the original - rather than cater just for the unilingual person? (Also, several musicians tend to be multi-lingual anyway.)
-
- Posts: 4066
- Joined: 26 Mar 2005, 17:01
jayaram wrote: That's an interesting statement. The ~n sound is common in Malayalam (as in '~nAn' for 'I'). Based on the above, this word would get pronounced as 'j~nAn' by a Kannadiga! Shouldn't we try and capture the original sound to the extent possible, so that at least a multi-lingual person can pronounce it closest to the original - rather than cater just for the unilingual person? (Also, several musicians tend to be multi-lingual any way.)
2 things to consider here, Jayaram. One, ~nAnam is immediately recognisable as j~nAna and the familiar pronunciation takes over. Two, how many malayalam songs do you get to hear from musicians? If more were in circulation, then these sounds might be produced more accurately by non-native speakers. Just as an example, when I spoke malayalam, native speakers always used to chide me for saying pare~n~nu although I was only saying para~n~nu. These things do creep in without one's knowledge and despite the best conscious efforts.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
i have a proposal:
How about "ei", "Ai" etc. to mean "e" + y and "A" + y. That way மெய்/தாய்மொழி can also be mei/tAimozhi besides mey/tAymozhi. In this case the mey etc. seems too far off phonetically.
I am thinking of this rule: If an "i" occurs in the middle of the word but is not part of a consonant modifier, then it is like a "y". For example, not as in mi or mai (since i and ai are consonant modifiers), but only as in mei, mAi, moi etc. I would recommend judicious use of it, as this would allow something like seiiuL. We could prevent this by saying it must be at the end of the word or must be followed by a consonant?
Thoughts?
Arun
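To make the proposal concrete, here is one way the "i acts like y" rule could be sketched, with the end-of-word / followed-by-consonant restriction included. This is only an illustration: `i_as_y`, `VOWELS` and `GLIDE_BEFORE` are hypothetical names, and the set of preceding vowels is a guess based on the examples mei, mAi, moi.

```python
VOWELS = set("aAiIuUeEoO")
# Vowels after which a bare "i" is read as "y". Short "a" is left out on purpose,
# because "ai" is already a vowel sign in the scheme (assumption).
GLIDE_BEFORE = set("AeEoOuU")


def i_as_y(word):
    """Rewrite "i" as "y" after a glide vowel, at word end or before a consonant."""
    out = []
    for idx, ch in enumerate(word):
        prev = word[idx - 1] if idx > 0 else ""
        nxt = word[idx + 1:idx + 2]  # empty string at end of word
        if ch == "i" and prev in GLIDE_BEFORE and nxt not in VOWELS:
            out.append("y")
        else:
            out.append(ch)
    return "".join(out)


for word in ["mei", "tAimozhi", "mai", "seiiuL"]:
    print(word, "->", i_as_y(word))
```

Under these assumptions mei and tAimozhi normalise to mey and tAymozhi, while mai and the awkward seiiuL are left untouched.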
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
The other possibility is to render it as j~nAn but put qualifiers like a subscript. This is not possible now, but eventually it will be. Each language-specific logic will know what the "original language" is and be able to adjust if necessary.
This was my thinking all along.
Arun
Last edited by arunk on 30 Nov 2006, 04:24, edited 1 time in total.
-
- Posts: 1317
- Joined: 30 Jun 2006, 03:08
drshrikaanth wrote: 2 things to consider here Jayaram. One ~nAnam is immediately recognisable as j~nAna and the familiar pronunciation takes over. Two, how many malayalam songs do you get to hear from musicians? If more were in circulation, then these sounds may be produced more accurately by non-native speakers. Just as an example, when I spoke malayalam, native speakers always used to chide me for saying pare~n~nu although I was only saying para~n~nu. These things do creep in without ones knowledge and despite the best conscious efforts.
I don't quite agree. To give an analogy, English has the 'fricative' sound, e.g. in the word 'the' or 'this'. The 'th' sound is not like any of the Indic sounds, but a sensitive listener can pick it up when spoken by a native English speaker and learn to reproduce it after a few tries. The key here, of course, is "sensitive listener".
(Another example: some people can't pronounce the 'zh' in 'pazhaM' - instead they say 'paLaM'. Should we then do away with the 'zh' sound altogether, to cater for these folks?)
Your point about not many Malayalam songs being rendered by musicians is a bit tangential to this discussion. It is well known that fewer CM kritis exist in the language and Malayalam is a bit of a loner amongst the 4 south indian languages & gets the stepsisterly treatment anyway.
Having said all this, I thought Arun's attempt was to come up with a robust transliterator for the 5 languages, or are we not to cover Malayalam fully?
Last edited by jayaram on 30 Nov 2006, 05:07, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
jayaram,
The goal is to come up with a transliterator for all 5 languages incl. malayALam. But what is in question here is whether we can get the proper pronunciation of any language preserved/communicated in ALL 4 languages. Doing that without ambiguity or additional interpretation is not practically possible (so we do the best we can - as explained below).
It is not fully possible because e.g. we cannot represent the sound of that first "na" in nanni in tamizh or kannada, as no alphabet in those languages carries that sound. Now the case of ~na here may be a better match (but I will wait for drs's response as to whether a ~nA without j preceding it would have different contextual pronunciation implications). Where there are no good matches, you can only hope for approximation wherever possible. In some cases you get reasonable ones, and in some cases, such as trying to represent the "fa" sound (occurs in hindi (corrected)) in tamizh, you pretty much fall flat (people use aHk + p, which is a very, very gross approximation).
Anyway, in all such cases, you either accept the approximation as good enough to need no other qualifiers, or you qualify with super-scripts. But all these require additional interpretation, and so the user must be sensitive to get it right. If you want pronunciation right, you must be sensitive. That is a given. If one cares about pronunciation and one knows it is a malayALam krithi, then one would try to get that ~nAn right no matter how it appears in that person's favorite language's script. But to at least clue him in on these "non-native" pronunciations, we put those qualifiers like say k1 (ka), k2 (kha), etc. (say in tamizh script for a non-tamizh krithi).
A rough example: We all see pArvati in transl. english and we know what it stands for even if we don't have it written in our language (because we know the scheme). But tell me, is the english word "pArvati" phonetically close to the actual Indic word?
Regarding pazham and paLam - that is not a correct example. The "zha" alphabet very much exists (and besides there are many people who get it right).
Arun
Last edited by arunk on 30 Nov 2006, 21:14, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
drshrikaanth wrote: Arun. You have made sure that what you write in tamizh comes out correctly in kannaDa even with some flexibility in english spellings. But the reverse is a problem. saMgama comes out all awkward in tamizh சம்கம. What are you going to do about it?
Drs, I was thinking about this. Some points:
1. The anuswara does not affect pronunciation and hence is more of a characteristic of the script.
2. The use of explicit M in some cases (where it represents a non-ma sound) makes the transl. text less phonetically equivalent to the Indic word being referred to.
3. Because of #2, it can mislead some people who don't know the bindu rules (I guess mostly tamilians).
Based on this, wouldn't it be better to recommend against using 'M', except in places where it is absolutely necessary (and of course there it is replacing 'm', so the phonetic nature of the transliterated text doesn't change)?
The program will still accept M in all applicable cases (and tamizh will come out right), but I thought that, just like #n, you want to use it only where it is necessary. What do you think?
Arun
Last edited by arunk on 30 Nov 2006, 06:19, edited 1 time in total.
-
- Posts: 1877
- Joined: 04 Feb 2010, 02:05
arunk wrote: drs,
If a ~nAn from a malayAlam krithi is rendered in kannada as just ~nAn would it be bad (it may look horrible as imsai but wouldnt it help in better pronounciation?).
Arun
If it was a malayALam kriti with the word ~nAn, it should be written in kannada as ~nAn as well; readers should be able to figure it out. It certainly should not be written as j~nAn.
-Ramakriya
Last edited by ramakriya on 30 Nov 2006, 11:50, edited 1 time in total.
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
Also, my point about "fa" is wrong. I was thinking Hindi and said sanskrit. I don't know if sanskrit uses it (only pha?). Of course hindi right now doesn't figure here - but the point I was making was that there may be certain sounds in one language which are impossible to represent in any adequate fashion in another language.
-
- Posts: 1317
- Joined: 30 Jun 2006, 03:08
arunk wrote: but the point i was making was there may be certain sounds in one language, which are impossible to represent in any adequate fashion in another language.
Which is why we need a robust transliteration scheme in the first place! At the risk of sounding like a broken record, lemme give an example: the Hindi word 'nahin' (for 'no') has the peculiar sound 'hin' which is not found in, say, the 4 south indian languages. How then do you transliterate a Swati kriti in Hindi that uses this word? Do you not require a scheme to represent this? The alternative is to agree that we would only use the common sounds, but that straightaway eliminates the kha, gha etc. that don't occur in Tamil!
Tricky, I know, but that's why we need brains like yours!
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
jayaram,
you are missing the point. The sounds themselves can never be communicated without additional interpretation unless:
(a) you extend the script to introduce unique symbols for different sounds.
(b) you add extra qualifiers (like k1, k2).
Now you would think that (a) is needed only for "foreign sounds" but NO. A language's script is NOT ALWAYS fully representative of the language's OWN sounds - forget other ones. Not all scripts are phonetic (tamizh and english being prime examples) - and even when they are, they are in varying degrees.
We should not be and are not trying to extend scripts here. That is over-extending our job and it needs wider support etc. The most we will do is (b) i.e. add qualifiers.
Arun
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
the transl. scheme itself on the other hand can use as many symbols as needed for varying unique sounds in all 5 languages e.g. #n, ~n etc..
But just as even here we are not extending english alphabet, but are merely adding special qualifiers to "n", when the transliteration scheme gets rendered into the 5 indic languages, they will add (different kinds) of qualifiers.
Arun
-
- Posts: 3424
- Joined: 07 Feb 2010, 21:41
jayaram wrote: Agreed we can't achieve 100% perfection, but at least we can aim for the max possible.
By the way, will your scheme also provide audio representations of the written characters? I mean, can we click a button and hear the sounds themselves?
It's possible as some sort of a "legend". But for the program to take a transl. text and do "text to speech" and get the sounds right... now wouldn't that be awesome!
Arun
Last edited by arunk on 30 Nov 2006, 21:54, edited 1 time in total.
-
- Posts: 10958
- Joined: 03 Feb 2010, 00:01
arunk wrote: But for the program to take a transl. text and do "text to speech" and get the sounds right
There are some commercial packages that do a reasonably good job but they still sound mechanical and robotic. There are some R&D projects I am aware of where they attempt to faithfully reproduce the sounds. Those do not work from a transliterated text but from a much more complicated and not human-readable (but still textual) representation. The complexity of that representation is increased a lot, since the intonation and stress depend on the syntactical structure of the sentence and not just on individual words or phoneme relationships.
Just a related quiz.. How many phonemes exist in English? How about our 5 languages we are talking about. I know the answer to the first one, I will reveal a bit later!! ( I heard there is an african language with 100 phonemes!! )
-
- Posts: 1430
- Joined: 13 Aug 2006, 10:51
arunk,
arunk wrote: I dont know if sanskrit uses it (only pha?)
From what I have understood by perusing the Unicode allotment scheme, Devanagari is the superset of all Indian languages, which caters for all sounds used in Indian languages and Urdu (Urdu is also an Indian language, of course). In my opinion, it caters even for foreign language sounds like 'zha' in Russian, similar to Tamil 'ழ'.
In any case, IMHO Sanskrit is the only language in the world which caters for the WYSIWYG concept.