Shawalphabet YahooGroup Archive Browser
From: "Lionel Ghoti" <Lionel.Ghoti@...>
Date: 2006-05-29 12:48:54 #
Subject: New version of phpGhotiFilleter
A new version of phpGhotiFilleter can be found at:
http://www.saytheword.org.uk/shavian/phpghotifilleter/index.php
I decided against the "super-accent" approach to transliteration that
we had discussed earlier because, if I had gone down that road, adding
words to the database would have required an unfair level of
linguistic omniscience on the part of the user: they would have been
required to have knowledge not only of their own accent, but also of
all other accents which had different phonemic patterns.
I decided to take an existing phonemic dictionary, the CMU Pronouncing
Dictionary
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
(which is freely downloadable) and to store all data phonemically
rather than in Shavian words. The CMU dictionary uses a non-standard
method of phonemic notation. I converted this to SAMPA:
http://en.wikipedia.org/wiki/SAMPA_chart_for_English
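For anyone curious what the CMU-to-SAMPA conversion step looks like, here is a rough sketch in Python (not the filleter's own PHP). The table covers only a handful of phonemes, the SAMPA values are one reading of the chart linked above, and the names ARPABET_TO_SAMPA and cmu_to_sampa are invented for illustration. Placing the stress mark directly before the vowel is a simplification.

```python
# Partial, illustrative mapping from CMU symbols to SAMPA (an assumption,
# not the filleter's actual table).
ARPABET_TO_SAMPA = {
    "AA": "A:", "AE": "{", "IY": "i:", "IH": "I", "EH": "E",
    "D": "d", "Z": "z", "T": "t", "SH": "S", "N": "n", "K": "k",
}

# CMU marks stress with a digit after each vowel:
# 1 = primary, 2 = secondary, 0 = unstressed.
STRESS_MARK = {"1": '"', "2": "%", "0": ""}


def cmu_to_sampa(cmu_phonemes):
    """Convert a list such as ['K', 'AE1', 'T'] to a list of SAMPA phonemes."""
    out = []
    for ph in cmu_phonemes:
        stress = ""
        if ph[-1].isdigit():
            # Split the trailing stress digit off the vowel symbol.
            stress = STRESS_MARK[ph[-1]]
            ph = ph[:-1]
        out.append(stress + ARPABET_TO_SAMPA[ph])
    return out


print(cmu_to_sampa(["K", "AE1", "T"]))  # ['k', '"{', 't']
```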
When the filleter encounters a Roman word, it looks up the word in a
table of Roman and SAMPA values, then takes each SAMPA phoneme in that
word and converts it into a Shavian letter by referring to a separate
table which maps phonemes onto Shavian letters. (This table has a
separate row for each stress version of a vowel -- primary, secondary
and unstressed. The mappings might need a bit of tweaking.)
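The two-table lookup described above might be pictured like this. It is a minimal Python sketch, not the real PHP or database schema: ROMAN_SAMPA and SAMPA_SHAVIAN are hypothetical stand-ins for the two tables, phonemes are stored as (symbol, stress) pairs purely for convenience, Shavian letters are shown by name, and the Err/Array split by stress is an assumed example of why a separate row per stress level could matter.

```python
# Hypothetical stand-in for the table of Roman words and their phonemes.
# Stress is "1" primary, "2" secondary, "0" unstressed, None for consonants.
ROMAN_SAMPA = {
    "cat": [("k", None), ("{", "1"), ("t", None)],
    "murmur": [("m", None), ("3`", "1"), ("m", None), ("3`", "0")],
}

# Hypothetical stand-in for the phoneme-to-letter table, with one row
# per stress version of each vowel.
SAMPA_SHAVIAN = {
    ("k", None): "Kick",
    ("t", None): "Tot",
    ("m", None): "Mime",
    ("{", "1"): "Ash", ("{", "2"): "Ash", ("{", "0"): "Ash",
    ("3`", "1"): "Err", ("3`", "2"): "Err", ("3`", "0"): "Array",
}


def transliterate(word):
    """Look the word up, then map each phoneme to a Shavian letter name."""
    phonemes = ROMAN_SAMPA.get(word.lower())
    if phonemes is None:
        return "[?]"  # unknown word, shown as a question mark
    return "-".join(SAMPA_SHAVIAN[p] for p in phonemes)


print(transliterate("murmur"))   # Mime-Err-Mime-Array
print(transliterate("Shavian"))  # [?]
```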
The code itself needs tweaking too, particularly with respect to the
generation of the Shavian compound letters. This is doable.
CMU appears to be in a sort of General American accent. However, it
often offers different pronunciations of the same word, in addition to
syntactically different homograph alternatives (e.g. desert(N) /
desert(V)). I think some of these should be stripped out. I aim to add
RP pronunciation data later into a new column in the same table (I've
found a separate source for RP pronunciations, but it is smaller and
is in a completely different format).
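One possible way to strip out the alternatives, sketched in Python. It assumes the CMU file lists variants with a parenthesised number after the headword and uses ";;;" for comment lines; the sample lines and the names load_pronunciations and first_only are illustrative only. Keeping the first entry is an arbitrary choice and does nothing to tell desert(N) from desert(V).

```python
import re
from collections import defaultdict

# Sample lines in the style of the CMU file (illustrative).
SAMPLE = """\
;;; comment line
DESERT  D EH1 Z ER0 T
DESERT(1)  D IH0 Z ER1 T
CAT  K AE1 T
"""

# Headword, optional "(n)" variant marker, then the phoneme string.
ENTRY = re.compile(r"^(?P<word>[^\s(]+)(?:\((?P<variant>\d+)\))?\s+(?P<phones>.+)$")


def load_pronunciations(text):
    """Group every pronunciation under its headword, in file order."""
    entries = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and comments
        m = ENTRY.match(line)
        if m:
            entries[m.group("word").lower()].append(m.group("phones").split())
    return dict(entries)


def first_only(entries):
    """Keep only the first listed pronunciation of each word."""
    return {word: prons[0] for word, prons in entries.items()}


entries = load_pronunciations(SAMPLE)
print(len(entries["desert"]))         # 2
print(first_only(entries)["desert"])  # ['D', 'EH1', 'Z', 'ER0', 'T']
```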
There's also a log-in function, but there is no point logging in at
the moment because it won't allow you to do anything extra: in future
it will allow you to add/modify/delete database records.
The word-add page is currently disabled pending the writing of a new
page to take into account SAMPA phonemes as opposed to Shavian letters.
So it's a bit of a work in progress at the moment. I thought it worth
releasing in its current state, for comments, to let you see what has
been happening development-wise (I've been quiet for a week or so).
Little will be done to the filleter over the next week because work
will take me away from home until Friday.
As the index page now informs you,
There are currently 129,416 rows in the roman_sampa table.
Suddenly there are very few question marks on the processing page! (It
doesn't know the word "Shavian" however -- I must get around to adding
that.)
Please send your comments.
And a happy bank holiday to any other limeys out there.
el Ghoti
From: "paul vandenbrink" <pvandenbrink11@...>
Date: 2006-05-29 21:20:21 #
Subject: Re: New version of phpGhotiFilleter
Hi Lionel
Thank you for the new version of the Ghoti Filleter.
And you brought it out so quickly.
It nicely incorporates the Idea of 2 different major English Accents
General American and British RP.
I agree that the "super-accent" approach to transliteration is
probably a bit unwieldy at this point in the development cycle.
Essentially, you are building prototypes.
Anyway, I looked at the Shavian Output from a number of test cases
and there appears to be one particular discrepancy.
It is kind of a neat discrepancy and, in its own way, enlightening
about English pronunciation.
The CMU dictionary uses a much smaller number of symbols than the
Shavian Alphabet, even taking into account General American's
restrictive use of the Shavian Phonemes.
So you cannot use a one-to-one mapping of letters from the CMU
Roman format to Shavian.
In particular, it doesn't have the equivalent of the
Shavian R-Coloured Vowel letters, Are, Or, Ear and Air.
So they get replaced in CMU by some rather cumbersome Digraphs.
And those Digraphs get transliterated into Shavian as two letters.
Not surprisingly, some people do write like that, from ignorance,
but I would really hate to institutionalize it
in the Ghoti Filleter.
Ethan and I were just criticizing this practice a few posts back.
So you get
Ah + Roar instead of Are
Awe + Roar instead of Or
Egg + Roar instead of Air
If + Roar instead of Ear.
The enlightening part is realizing how important
and prevalent the Ah and Awe sounds are to English
pronunciation, especially in their R-coloured forms.
Also, in some of the Digraphs, you can see what the form
of the real Shavian Letter should be.
Regards, Paul V.
P.S. I will look to see if there are any other such
discrepancies in the mapping of the SAMPA or CMU
Roman format letters to Shavian.
From: "Lionel Ghoti" <Lionel.Ghoti@...>
Date: 2006-05-29 22:07:15 #
Subject: Re: New version of phpGhotiFilleter
--- In shawalphabet@yahoogroups.com, "paul vandenbrink"
<pvandenbrink11@...> wrote:
>
> So you get
> Ah + Roar instead of Are
> Awe + Roar instead of Or
> Egg + Roar instead of Air
> If + Roar instead of Ear.
Yes, that's what I meant when I referred to "the generation of the
Shavian compound letters". It will be a very simple thing to
agglomerate such sounds into compound letters. The trick will be not
to agglomerate these pairs of sounds where they should remain
discrete. The CMU stress data should help there though. It's too late
on a Sunday evening to worry about such things now (my head hurts),
but I will address it when I return from The Smoke.
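The "very simple thing" might look something like the following Python sketch, which works on letter names rather than real Shavian glyphs. COMPOUNDS and merge_compounds are made-up names, and the merge is unconditional, so it does not yet address the cases where the two sounds should stay discrete.

```python
# The four vowel + Roar pairs listed above and their compound letters.
COMPOUNDS = {
    ("Ah", "Roar"): "Are",
    ("Awe", "Roar"): "Or",
    ("Egg", "Roar"): "Air",
    ("If", "Roar"): "Ear",
}


def merge_compounds(letters):
    """Replace each adjacent vowel + Roar pair with its compound letter."""
    out = []
    i = 0
    while i < len(letters):
        pair = tuple(letters[i:i + 2])
        if pair in COMPOUNDS:
            out.append(COMPOUNDS[pair])
            i += 2  # consume both letters of the pair
        else:
            out.append(letters[i])
            i += 1
    return out


print(merge_compounds(["Kick", "Ah", "Roar"]))  # ['Kick', 'Are']
```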
From: "paul vandenbrink" <pvandenbrink11@...>
Date: 2006-05-30 18:07:52 #
Subject: Re: New version of phpGhotiFilleter
Hi Lionel
I agree that it should be a simple matter to collapse or consolidate
such sounds into compound letters.
The trick that you refer to about the exceptions is actually
less difficult than you might think. First, let us consider just how
rare the exceptions are.
Look at the 4 different cases.
1. If the Digraph is found at the beginning of the word
2. If the Digraph is found at the end of the word
3. If the Digraph is found in the middle of the word,
at the end of a Syllable boundary.
4. If the Digraph is found in the middle of the word,
overlapping a Syllable boundary.
The 4th case is the only one where we have to avoid collapsing the
letters. I was trying to find an example in the first case and
couldn't.
And the fourth case is very rare, because it is almost unknown for
the soft vowels Ah, Egg and If to end a Syllable.
Leaving aside Awe for the moment, these vowel sounds would be
morphed into a Schwa or a Long vowel if they had to end a syllable.
The only exceptions that I can think of are Bereave and Berate, where
the long Eat sound is reduced to an If or Schwi sound in most
American accents.
As for the Awe sound ending a syllable in the middle of the word,
I just can't think of any. In American English, The Awe sound is
almost always found at the beginning of a word or in the middle of a
syllable.
Perhaps, someone else can think of one or two cases where the Awe
sound is at a syllable break.
In any case, I think it is extremely rare.
So the trick, after all, may be simpler than you think.
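The fourth case could be handled by merging only inside a syllable, as in this Python sketch. It assumes the word has already been split into syllables, which the CMU data does not supply directly, so that step is taken for granted here. The function name and the syllable splits in the examples are illustrative.

```python
# Vowel + Roar pairs and the compound letters they collapse into.
COMPOUNDS = {
    ("Ah", "Roar"): "Are",
    ("Awe", "Roar"): "Or",
    ("Egg", "Roar"): "Air",
    ("If", "Roar"): "Ear",
}


def merge_within_syllables(syllables):
    """Merge vowel + Roar only when both letters sit in the same syllable."""
    out = []
    for syllable in syllables:
        i = 0
        while i < len(syllable):
            pair = tuple(syllable[i:i + 2])
            if pair in COMPOUNDS:
                out.append(COMPOUNDS[pair])
                i += 2
            else:
                out.append(syllable[i])
                i += 1
    return out


# "beer": one syllable, so If + Roar collapses to Ear.
print(merge_within_syllables([["Bib", "If", "Roar"]]))
# ['Bib', 'Ear']

# "berate": the pair straddles the syllable boundary, so it stays discrete.
print(merge_within_syllables([["Bib", "If"], ["Roar", "Age", "Tot"]]))
# ['Bib', 'If', 'Roar', 'Age', 'Tot']
```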
Regards, Paul V.
From: Ethan <ethanl@...>
Date: 2006-05-30 18:30:36 #
Subject: Re: [shawalphabet] Re: Some ideas for modifications to phpGhotiFilleter
paul vandenbrink wrote:
>Hi Ethan
>I guess I wasn't quite specific enough for you.
>I was talking about all the r-colored vowel glides, that you
>mentioned (i.e. Ear/Air/Are/Or) as well as the others that are formed
>with the addition of the Array sound to a Diphthong.
>(i.e. ier, ire, ower, ayer, oyer, our, oer, oor, etc.)
>I could never imagine adding a consonant letter like Roar after the
>vowel (i.e. Up-Roar, Ado-Roar) to represent
>one of these r-colored glide sounds. I would have to add a
>rhotic or retroflex vowel letter after the Diphthong or regular vowel
>letter.
>
>As for the sounds represented by Err (urge) and Array,
>I wasn't really talking about them.
>In fact, they really don't sound different enough to me to be worth
>having 2 letters. The difference to my ear seems to be mostly that
>Err (urge) is longer and I also feel that my tongue doesn't roll up
>as much at the end of the sound of Err (urge).
>But they are definitely a distinct vowel sound and not a composite or
>glide.
>Regards, Paul V.
>P.S. If I had my druthers I would use Array only for composite sounds.
>(i.e. ier, ire, ower, ayer, oyer, our, oer, oor, etc.)
>and Err (urge) for everything else.
>But in this matter, in fact, I mostly do go along with the Shavian group's
>preference based on accent, in my own writing.
>Regards, Paul V.
>
>
Yeah, I see what you mean now. As for the difference between Err and
Array, I just use Err in accented syllables and Array in unaccented
ones. Other than that, I hear no difference either. In the
combinations you mention above, where no Shavian compound letter exists,
I just use the regular letter + Array. For instance, "our" I spell
/QD/, "ire" is /FD/ in Shavian.
As for being specific... it does help to be specific, not that I can't
catch it from the context, but because the context here frequently spans
multiple messages, and quite often I don't see all the messages - either
because I missed them (my fault - I'm frequently busy and just skim the
messages here) or because Yahoo for some reason decided not to deliver
that particular message to me... which happens quite often for some reason.
--
Ethan
Have you ever wondered what it'd be like to fly like a bird? Wonder no longer! www.maximumride.com
>___________________attached__________________________________
>--- In shawalphabet@yahoogroups.com, Ethan <ethanl@...> wrote:
>
>>>Anyway, I am just happy that Shavian does easily distinguish all
>>>the other r-colored vowels or rhotacized vowels. These would be
>>>vowels, either with the tip or blade of the tongue turned up during
>>>at least part of the articulation of the vowel (a retroflex
>>>articulation) or with the tip of the tongue down and the back of
>>>the tongue bunched. Both articulations produce basically the same
>>>auditory effect, a lowering in frequency at the end of the vowel,
>>>which we hear as an R-sound.
>>>After all, these sounds are only noticeable as such in some of the
>>>Rhotic English accents, such as Mid-Western American English.
>
>Ethan said:
>
>>In Mid-western American speech, I don't hear the R-coloring at the
>>end of the vowel, neither do I pronounce it at the end myself, as I
>>speak with this accent, unless it's a glide like Ear/Air/Are/Or.
>>What I hear and pronounce myself where Err and Array are used is a
>>pure rhotic or retroflex vowel all the way through. That's why I
>>don't like to see people replace Err with Up-Roar and Array with
>>Ado-Roar, because I don't pronounce any Up or Ado sounds at the
>>beginning.
From: Ethan <ethanl@...>
Date: 2006-05-30 19:08:36 #
Subject: Re: [shawalphabet] New version of phpGhotiFilleter
Thanks, LG!
I did notice something that was present in the last version, and is
still present though with slightly different effects... words with
apostrophes are mangled, because the apostrophe is treated like a word
break.
I ran this list of contractions through the filleter:
aren't you've can't you'll couldn't you'd didn't he's doesn't he'll
don't he'd she's hasn't she'll hadn't she'd isn't it's shouldn't
it'll weren't we're won't we've wouldn't we'll I'm we'd I've they've I'll
they'll I'd they'd you're
The results are:
yran'tI jM'vI kAnt jM'[?] [?]'tI jM'dI [?]'tI hI'es [?]'tI hI'[?]
dOn hI'dI SI'es [?]'tI SI'[?] [?]'tI SI'dI [?]'tI it'es [?]'tI
it'[?] [?]'tI wI'rE wOnt wI'vI [?]'tI wI'[?] F'em wI'dI F'vI HE'vI F'[?]
HE'[?] F'dI HE'dI jM'rE
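A possible fix, sketched in Python rather than the filleter's PHP: treat an apostrophe between letters as part of the word when splitting the input. The pattern below is an assumption about how the tokenizer might be written, not the filleter's actual code, and it only handles the plain ASCII apostrophe.

```python
import re

# A word is a run of letters, optionally followed by one or more
# groups of apostrophe + letters (aren't, you've, I'll).
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

# For comparison: splitting on every non-letter breaks contractions apart.
NAIVE = re.compile(r"[A-Za-z]+")


def tokenize(text):
    """Return the words in text, keeping contractions whole."""
    return WORD.findall(text)


print(NAIVE.findall("aren't you've I'll"))
# ['aren', 't', 'you', 've', 'I', 'll']
print(tokenize("aren't you've I'll"))
# ["aren't", "you've", "I'll"]
```

Each contraction would then be looked up as a single entry, so the database would still need rows for words such as "aren't".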
--
Ethan
From: Joseph Spicer <wurdbendur@...>
Date: 2006-05-31 00:17:09 #
Subject: Re: [shawalphabet] New version of phpGhotiFilleter
I noticed that, too. I also found before that you could add words with
apostrophes to the database, so long as the Filleter didn't already
recognize either of the parts before or after.
--
Regards,
Joseph Spicer
JOsaf spFsD
From: "MEng makOtO" <ljptbgx@...>
Date: 2006-05-31 01:13:21 #
Subject: hQ t AlfabetFz jMziN konsonAnts Onli:
hQ t AlfabetFz jMziN konsonAnts Onli:
t fFnd a wurd HAt sQndz lFk "pESent"
1. sQnd it Qt.........................pE-Sunt
2. Omit H vQelz.....................p S nt
3. lUk up wot iz left............pSnt - pESent - Ebel t endur, sik
person.
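The three steps above read like a consonant-skeleton index, which could be sketched in Python as follows. The vowel set is an assumption gathered only from spellings that appear in this thread and is certainly incomplete; skeleton and build_index are made-up names.

```python
from collections import defaultdict

# Assumed vowel characters in the Roman-keyboard Shavian spelling,
# taken only from words seen in this thread (incomplete).
VOWELS = set("aAeEiIoOuUMFQ")


def skeleton(word):
    """Step 2: drop the vowels (and any non-letters such as hyphens)."""
    return "".join(ch for ch in word if ch.isalpha() and ch not in VOWELS)


def build_index(words):
    """Step 3: file every word under its consonant skeleton."""
    index = defaultdict(list)
    for word in words:
        index[skeleton(word)].append(word)
    return dict(index)


index = build_index(["pESent", "sQnd", "lFk"])
print(skeleton("pE-Sunt"))  # pSnt
print(index["pSnt"])        # ['pESent']
```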
From: Ethan <ethanl@...>
Date: 2006-05-31 02:00:34 #
Subject: Re: [shawalphabet] hQ t AlfabetFz jMziN konsonAnts Onli:
MEng makOtO wrote:
> *
>
> hQ t AlfabetFz jMziN konsonAnts Onli:
>
> t fFnd a wurd HAt sQndz lFk "pESent"
>
> 1. sQnd it Qt.........................pE-Sunt
>
> 2. Omit H vQelz.....................p S nt
>
> 3. lUk up wot iz left............pSnt - pESent - Ebel t endur, sik person.
>
> *
Hello, /MEng (how exactly do you pronounce that? It looks like
oo-ayn-gh) Well, anyway, I'm not quite sure what you're trying to
explain. Need more details, please!
--
Ethan
From: "paul vandenbrink" <pvandenbrink11@...>
Date: 2006-05-31 19:32:02 #
Subject: Re: Some ideas for modifications to phpGhotiFilleter
Hi Ethan
Sorry if I came across as being testy.
In my mind, I was just making a casual generalization, to
explain why I prefer using the Shavian r-colored vowel Letters,
rather than adding yet more Digraphs. (i.e. Up-Roar, Ado-Roar)
For me,
I don't like unnecessary Digraphs, because they can be
pronounced either individually as 2 discrete sounds or as one
consolidated sound.
It can potentially confuse you as to where the syllable
boundary is.
Anything that confuses syllable boundaries takes away
from the inherent ingenuity of the Shavian Alphabet, IMHO.
I hope that answered your implied question about my generality?
Regards, Paul V.