YahooGroup Archive

From: "Ethan" <ethanl@...>
Date: 2006-05-17 23:38:51
Subject: Re: Some ideas for modifications to phpGhotiFilleter

--- In shawalphabet@yahoogroups.com, "Lionel Ghoti" <Lionel.Ghoti@...>
wrote:
>
> Hi Philip,
>
> Yes, Shavian is phonemic, but the Traditional Orthography isn't, and
> that's what we're starting with. If the program is to work on a
> phonemic level rather than a word level, then before it even gets
> around to transliterating anything it has to determine what the
> phonemes in each TO word are, and that would require a LOT more
> sophistication than the current program has -- and even then there
> would be a lot of room for error due to orthographic idiosyncrasies.
> It would be great to take that approach, but I don't think it's viable
> given the complexity of the English language, my programming ability
> and the amount of time I'm willing to expend on the project.
>
> If you're suggesting that the program should contain a table of
> underlying phonemic values for each TO word, and then that it should
> contain a table of how those underlying phonemic values map onto
> phonemes for each target accent, I think that would be very hard to
> do. Given the different phonemic patterns that different accents have,
> we'd be dealing with meta-phonemics rather than pure, simple
> phonemics. I could see a lot going wrong (and who would provide the
> data?). (But I haven't thought about that for long enough yet. Please
> let me know if you think I've got the wrong end of the stick.)
>
> If someone doing a transliteration were asked to define the
> characteristics of their accent (Do "put" and "putt" rhyme? Do "crass"
> and "grass" rhyme? and so on...) I think that would be asking too
> much. Easier to say, "Where, broadly, do you come from?"
>
> If we work at the level of words rather than phonemes, then we will
> have a larger database, but, if we have a well-populated database, we
> will be able to produce a perfect transliteration every time (assuming
> that all homographs have been added and that the user selects each one
> accurately), because the really intelligent work will have been done
> by the people who created the database records.
>
> I have to think more about this, but the word-by-word approach feels
> right to me, because it is simple and simple is usually good. This is
> the approach that was taken by the earlier, PC-based version of
> Ghoti-Fingers. I could always rely on it to produce an accurate piece
> of text as long as all homographs were accounted for, but I don't
> think I could expect the same of a more sophisticated (and
> error-prone) method.
>
> LG

Hello, Lionel. Yes, I tend to agree with you that doing accent
patterns would be too difficult and complex. I would favor a simple
set that most people agree upon, e.g. General American, RP (or
Northern English, I believe it is, as in "Androcles"), perhaps
Australian and South African, etc. Initially I would focus on at least
two: GA and Androcles. I'm curious to know what other non-American
members think should be included.

Another thing I noticed is the size of the database entry page. I can
see that before long that page is going to become unmanageable due to
its length. Perhaps it could be divided into multiple pages by first
letter, for instance.

These are exactly the kind of things that become evident when you put
something like this to the test! Overall, this looks like a very
useful tool, and I'm rather excited to see where it goes!

I understand about not releasing this as open source yet, because as a
programmer, I know how messy initial code can be! I just wanted to
mention it, since I think it would be a good goal for it to be open
sourced when it becomes more stable.

From: Joseph Spicer <wurdbendur@...>
Date: 2006-05-18 01:04:13
Subject: Re: [shawalphabet] Re: Some ideas for modifications to phpGhotiFilleter

I was thinking more of having a set of broad dialects that would be
collections of phonemic features. These variations could be stored in
the database as Philip suggested, and the user would simply choose a
dialect. The Filleter would then choose the transliterations according
to the features in the selected dialect. But that may be a difficult
thing to program.

I was toying with some ideas of how to store these variations in the
database. It's possible to consolidate some of the entries by
introducing extra characters for those phones whose phonemic status
varies between dialects. For example, a new symbol could be added for
the sound that is sometimes a schwa and sometimes a schwi. The database
may have a record for "city" that gives "sitK" (just to pick something
that isn't used for anything else), and the Filleter could be made to
resolve which spelling to give (sitI or siti) based on the user's dialect selection. It
seems to me that the substitution is the easy part, but it would
require some extra work to manage the database.
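
A minimal sketch of that substitution step, in Python purely for
illustration (it is not the Filleter's own code). The stored form "sitK"
and the two outputs come from the example above; the dialect names, and
the choice of which dialect gets "I" and which gets "i", are assumptions
made up for the sketch.

```python
# Hypothetical placeholder table: "K" is the extra symbol stored in the
# database; each dialect says which real letter it resolves to.
# The dialect names and their assignments are assumptions for illustration.
DIALECT_RULES = {
    "tense_final": {"K": "I"},
    "lax_final": {"K": "i"},
}


def resolve(stored: str, dialect: str) -> str:
    """Replace every placeholder symbol with the dialect's own letter."""
    rules = DIALECT_RULES[dialect]
    # Characters without a rule are ordinary letters and pass through.
    return "".join(rules.get(ch, ch) for ch in stored)


print(resolve("sitK", "tense_final"))  # sitI
print(resolve("sitK", "lax_final"))    # siti
```

Adding another variable sound would then mean one new placeholder
symbol plus one line per dialect, without touching the word records.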

Regards,

Joseph Spicer

JOsaf spFsD

From: "Lionel Ghoti" <Lionel.Ghoti@...>
Date: 2006-05-18 12:59:38
Subject: Re: Some ideas for modifications to phpGhotiFilleter

I had another think about the accents problem this morning in the
shower (the best place to think), and I think I now agree with Philip
and Joseph: it would be best to store word-pairs phonetically for
one "super-accent", and then to map the super-phonemes into actual
phonemes for the required accent. I now don't think this would require
such a lot of extra processing because most of the work has already
been done for us by Kingsley Read. We would just need a few extra
symbols for sounds like word-final /i/ and "wh".

Adding new word-pairs would require a little more skill and knowledge
of phonetics than the existing process, but all new database entries
could be listed on a special page for peer review.

I intend to look at this at the weekend. So please don't add anything
else to the database because everything in it will have to be reviewed
in more detail than I had originally thought.

LG

From: Star Raven <celestraof12worlds@...>
Date: 2006-05-18 13:19:08
Subject: Re: [shawalphabet] Re: Some ideas for modifications to phpGhotiFilleter

I usually do my thinking in the car or in bed before I go to sleep, and
I use the shower for singing. Anyway, I'm not sure what you mean by
final /i/.

=========
http://www.livejournal.com/users/wodentoad

An idle duck is the devil's playground.

From: "Philip Newton" <philip.newton@...>
Date: 2006-05-18 13:24:53
Subject: Re: [shawalphabet] Re: Some ideas for modifications to phpGhotiFilleter

On 5/18/06, Lionel Ghoti <Lionel.Ghoti@...> wrote:
> it would be best to store word-pairs phonetically for
> one "super-accent", and then to map the super-phonemes into actual
> phonemes for the required accent. I now don't think this would require
> such a lot of extra processing because most of the work has already
> been done for us by Kingsley Read. We would just need a few extra
> symbols for sounds like word-final /i/ and "wh".

You may want to have a look at Wells's Standard Lexical Sets for
English: http://en.wikipedia.org/wiki/Lexical_set#Standard_Lexical_Sets_for_English
for a list of differences that are meaningful in a wide range (but not
all) of English dialects.

Those sets concern themselves more with vowels, but they let you
differentiate, for example, PALM (long "ah" for most people) - BATH
(long "ah" for some, short "ash" for others) - TRAP (short "ash" for
most people). Then the PALM sound could be represented with "ah" and
the TRAP sound with "ash", while for BATH you'd need a third sound,
which would then be mapped either onto "ah" or onto "ash", depending
on the accent. Similarly, perhaps, for LOT ("on" for most?) - CLOTH
("on" for some, "awe" for others) - THOUGHT ("awe" for most?). And, if
you wish, for NORTH - FORCE (probably "or" for the first set and a new
symbol for the second, which gets mapped either to "or" or to "oak" +
"array").

Cheers,
--
Philip Newton <philip.newton@...>
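
The lexical-set idea in the message above could be sketched roughly as
follows, in Python purely for illustration. The set names and Shavian
letter names are the ones mentioned above; the accent labels "accent_a"
and "accent_b" are hypothetical placeholders rather than claims about
any real accent, and treating PALM, TRAP, LOT, THOUGHT and NORTH as
fixed is a simplification of "for most people".

```python
# Sets treated here as having the same letter in every accent
# (a simplification; the message only says "for most people").
STABLE_SETS = {
    "PALM": ["ah"],
    "TRAP": ["ash"],
    "LOT": ["on"],
    "THOUGHT": ["awe"],
    "NORTH": ["or"],
}

# Sets whose letter depends on the accent. The accent names are
# hypothetical labels invented for this sketch.
VARIABLE_SETS = {
    "accent_a": {"BATH": ["ah"], "CLOTH": ["on"], "FORCE": ["or"]},
    "accent_b": {"BATH": ["ash"], "CLOTH": ["awe"], "FORCE": ["oak", "array"]},
}


def letters_for(lexical_set: str, accent: str) -> list[str]:
    """Return the Shavian letter name(s) for a lexical set in one accent."""
    if lexical_set in STABLE_SETS:
        return STABLE_SETS[lexical_set]
    return VARIABLE_SETS[accent][lexical_set]


for accent in VARIABLE_SETS:
    print(accent, letters_for("BATH", accent), letters_for("FORCE", accent))
```

Only the variable sets need an extra symbol in the stored spelling; the
stable ones can be stored directly as ordinary letters.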

From: "Philip Newton" <philip.newton@...>
Date: 2006-05-18 13:25:08
Subject: Re: [shawalphabet] Re: Some ideas for modifications to phpGhotiFilleter

On 5/18/06, Star Raven <celestraof12worlds@...> wrote:
> I'm not sure what you mean by final /i/.

Since I brought that up -- this is what some call "happy tensing".
Basically, what's the last sound in words such as "really", "happy",
"city"? Is it the lax "if" sound or the tense "eat" sound?

To me, it sounds more like "eat" but it may be somewhere in between.
Historically in English RP, the sound was apparently "if".

Cheers,
--
Philip Newton <philip.newton@...>

From: "Lionel Ghoti" <Lionel.Ghoti@...>
Date: 2006-05-18 17:36:33
Subject: Re: Some ideas for modifications to phpGhotiFilleter

Thanks, Philip. I was about to post a message asking if anyone knew
where to find just such a list.

It only covers vowels, though. The second example that I mentioned
before was "wh". RP realises this as /w/, but some accents realise it
as a sort of /hw/ sound. Do you know if/where there's a list of
lexical sets covering consonants, and/or can you think of any other
consonant-related examples?

LG

From: "Lionel Ghoti" <Lionel.Ghoti@...>
Date: 2006-05-18 18:23:45
Subject: Re: Some ideas for modifications to phpGhotiFilleter

My copy of the OED uses the following:

"peat" - /pi:t/
"pit" - /pIt/ (the smaller symbol, you know)
"pity" - /pIti/

I.e., the final sound in the last example, schwi, has the quality of
the sound in "peat", but the length of the sound in "pit". That seems
pretty accurate to me, in my accent at least. If I had to choose
between the /i:/ and /I/, I'd go for /i:/. Whenever I read Shavian
text using /I/ for schwi, I think of Ealing comedies.

LG

From: "Philip Newton" <philip.newton@...>
Date: 2006-05-18 18:17:08
Subject: Re: [shawalphabet] Re: Some ideas for modifications to phpGhotiFilleter

On 5/18/06, Lionel Ghoti <Lionel.Ghoti@...> wrote:
> It only covers vowels, though. The second example that I mentioned
> before was "wh". RP realises this as /w/, but some accents realise it
> as a sort of /hw/ sound. Do you know if/where there's a list of
> lexical sets covering consonants, and/or can you think of any other
> consonant-related examples?

I'm afraid I don't know of any suitable resource.

And the only two examples I can think of are yod-dropping ("noo" vs
"nyoo" for "new") and flapped-t ("better" sounding like "bedding" --
though I'm not sure whether that would be reflected in writing).

Cheers,
--
Philip Newton <philip.newton@...>

From: "Philip Newton" <philip.newton@...>
Date: 2006-05-18 18:32:43
Subject: Re: [shawalphabet] Re: Some ideas for modifications to phpGhotiFilleter

On 5/18/06, Lionel Ghoti <Lionel.Ghoti@...> wrote:
> My copy of the OED uses the following:
>
> "peat" - /pi:t/
> "pit" - /pIt/ (the smaller symbol, you know)
> "pity" - /pIti/
>
> I.e., the final sound in the last example, schwi, has the quality of
> the sound in "peat", but the length of the sound in "pit". That seems
> pretty accurate to me, in my accent at least.

Sounds like it to me, too.

So the question arises, which phoneme does it represent -- "if" (which
it matches in length) or "eat" (which it matches in quality)?

I think this is not "schwi", though -- as I use it, "schwi" is a
nearly-neutral vowel but slightly "if"-coloured, as in the first
syllable of "defend". Paul V may be able to say more about this term.

Or perhaps a better example (which I got from
http://en.wikipedia.org/wiki/Schwi ) is "roses" vs "Rosa's" -- the
first word has a schwi in the second syllable, the second word has a
schwa there. And for some people, it's a schwa in both words.

"Roses" is not quite "roezizz" with a definite "if"-sound, but it's
not quite as neutral as in "Rosa's", either.

Cheers,
--
Philip Newton <philip.newton@...>
