YahooGroup Archive

From: "Lionel Ghoti" <Lionel.Ghoti@...>
Date: 2006-05-15 17:11:43
Subject: Re: phpGhotiFilleter, anyone?

Hi all,

Thanks for the appreciative comments!

In response to Paul's question about the limit on the number of words
the database can hold, I've just done a few quick calculations.

The database table used by phpGhotiFilleter is currently 11 K in size.
Let's say 10 K for ease of reckoning. It has 136 rows, i.e. 136
word-pairs. Round that down to 120 rows because we rounded the file size
down.

My web hosting account allows me two MySQL databases, each of 100 MB.
One is completely empty and the other (which contains the Shavian table
amongst other things) is 0.03% full. Let's suppose the Shavian table
will have exclusive use of just one of the databases.

So that's

10 K (used) / 100,000 K (available) = 1/10,000

I.e., the database can grow to 10,000 times its present size.

120 x 10,000 = 1,200,000 word-pairs

So we have some breathing room before we have to worry about running out
of space!
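
The same back-of-envelope sum, written out as a small Python sketch. It only restates the rounded figures above. It assumes that the average row size stays constant as the table grows and that 100 MB counts as 100,000 K. It ignores index and storage overhead, so treat the result as an order of magnitude rather than a hard limit.

```python
# Rough capacity estimate using the rounded figures from the message.
# Assumption: average row size stays the same as the table grows.

USED_K = 10            # current table size, rounded down from 11 K
ROWS = 120             # word-pairs, rounded down from 136 to match
AVAILABLE_K = 100_000  # one 100 MB database, counted as 100,000 K


def estimated_capacity(used_k: int, rows: int, available_k: int) -> int:
    """Approximate number of word-pairs that fit in available_k."""
    growth_factor = available_k // used_k  # how many times the table can grow
    return rows * growth_factor


growth = AVAILABLE_K // USED_K
capacity = estimated_capacity(USED_K, ROWS, AVAILABLE_K)
print(f"growth factor: {growth:,}")    # growth factor: 10,000
print(f"word-pairs:    {capacity:,}")  # word-pairs:    1,200,000
```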

LG

--- In shawalphabet@yahoogroups.com, "paul vandenbrink"
<pvandenbrink11@...> wrote:
>
> Welcome back, Lionel
> Let me second Philip's assessment of your new Filleter.
> Good Show.
> The Database seems a bit dry, but I am certain we will fill it
> up shortly. Are there any limits on how many English words
> it can hold?
> Regards, Paul V.
> P.S. I like the prompting when there is more than one valid choice.

From: "paul vandenbrink" <pvandenbrink11@...>
Date: 2006-05-16 14:33:09
Subject: Re: phpGhotiFilleter, anyone?

Hi Algy
I typed in 2 typos: incorrect Shavian spellings for the words
time and June. Is there any way for me to correct my mistakes?
Regards, Paul V.

From: "Lionel Ghoti" <Lionel.Ghoti@...>
Date: 2006-05-16 16:54:38
Subject: Re: phpGhotiFilleter, anyone?

Algy?

At the moment you can't modify or delete existing records. I didn't
want to open that up to unknown users, to guard against sabotage (e.g.,
your forum page at http://www.shawalphabet.com/nuke/index.php :
"h4ck3d by dR4GGy" -- why is the Shavian community being targeted by
hackers?).

On my Change Log (see the link on the Filleter's index page) I've
listed some ideas about how users might be allowed to edit and delete
records. For the time being I'll delete the records myself.

LG

From: "paul vandenbrink" <pvandenbrink11@...>
Date: 2006-05-16 20:40:17
Subject: Viruses?

Hi Lionel
Lionel Ghoti => L.G. => Algy => Algernon => Flowers for ?
Sorry for misspelling your abbreviation, L.G.
Thanks for fixing up my typos?
My site is still under construction, and alas under attack.
Still working on some corrections to the Abjad. See Omniglot.com.
In the meantime, I am willing to go along with whatever correction
process you come up with for the Filleter database.
I put about 200 of the more common English words into the database,
so the Filleter should definitely come back with some results.
Regards, Paul V.

From: Joseph Spicer <wurdbendur@...>
Date: 2006-05-17 05:00:21
Subject: Re: [shawalphabet] Re: phpGhotiFilleter, anyone?

I've just added a long list of words, probably doubling the size of the
database. In all that typing, I made a few mistakes: "surely", "wants"
(forgot the s), and I think there are some others that I've lost track
of. Hopefully they'll be found through use.

I was just thinking that it would be helpful if there were a way to add
homographs for extant words from the processing page. I found I can
just click the link again to add one, but that doesn't work if the word
has already been found in the database. Maybe the words could be linked
or something?

Anyway, it's time to give my hands a rest from typing. I think I'll
make another list to add tomorrow.

Regards,

Joseph Spicer

JOsaf spFsD

From: Ethan <ethanl@...>
Date: 2006-05-17 06:23:02
Subject: Re: [shawalphabet] phpGhotiFilleter, anyone?

Lionel Ghoti wrote:

>phpGhotiFilleter:
>
>http://www.saytheword.org.uk/shavian/phpghotifilleter/index.php
>
>- A PHP application which allows you to transliterate texts from the
>Roman alphabet to the Shavian alphabet, through your web browser.
>
>It stores a table of Roman-Shavian word pairs in a MySQL database. If
>you ask it to transliterate a Roman word which it does not know, it
>asks you how to write it in Shavian, and then that word pair is stored
>in the database. It takes into account homographs, i.e. identical
>Roman spellings which have different meanings and sounds (e.g. desert
>(verb) and desert (noun)). If it comes across a Roman word for which
>it has more than one Shavian spelling, it presents you with a
>drop-down box containing all of the available options so that you can
>select the correct one.
>
>I'm out of touch with the Shavian community, so there might already be
>something like this out there. I made phpGhotiFilleter as a way of
>learning how to use PHP with MySQL. I think it works quite well, but
>I've only just got it working, so there are likely to be some
>unrevealed bugs. I'd be very grateful if you could play around with it
>and let me know if it works for you, what's wrong with it, what could
>be improved, etc.
>
>There are very few word pairs in the database at the moment, so
>transliteration is a fairly laborious process because it requires lots
>of user input. That will obviously get better as entries are added to
>the database.
>
>Looking forward to reading your comments,
>
>Lionel Ghoti
>
>
I like it! Very good idea. I've been toying with the idea of making
such a thing, but never got started. Now I have one question: can you
release this as open source, say under the GPL or whatever licence you prefer?
Because I can see where this would be useful to have for different
servers, different databases, etc. Basically, I'd love to have
something like this on my server, where I could create my own database
of unique entries. If something like this were available, I would
probably add improvements to it myself, and would then make those
improvements available to you if you could use them. Is this a possibility?

--
Ethan

Have you ever wondered what it'd be like to fly like a bird? Wonder no longer! www.maximumride.com
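
For anyone who, like Ethan, wants something similar on their own server, the lookup behaviour described in the announcement quoted above can be sketched in a few lines. An unknown word triggers a prompt, a word with one spelling is used directly, and a homograph gets a drop-down. This is not phpGhotiFilleter's code, which was not released. The class and function names are hypothetical, an in-memory dict stands in for the MySQL table, and the bracketed spellings are placeholders rather than real Shavian.

```python
from collections import defaultdict


class WordPairStore:
    """In-memory stand-in for the Roman-Shavian word-pair table (hypothetical)."""

    def __init__(self) -> None:
        # Roman word (lower-cased) -> list of Shavian spellings in insertion order.
        # More than one spelling means the Roman word is a homograph.
        self._pairs = defaultdict(list)

    def add(self, roman: str, shavian: str) -> None:
        spellings = self._pairs[roman.lower()]
        if shavian not in spellings:  # ignore exact duplicates
            spellings.append(shavian)

    def lookup(self, roman: str) -> list:
        # .get() avoids creating empty entries for unknown words
        return list(self._pairs.get(roman.lower(), []))


def plan_transliteration(text: str, store: WordPairStore) -> list:
    """Classify each word as 'ask', 'auto' or 'choose'.

    Splits on whitespace only; punctuation handling is left out of this sketch.
    """
    plan = []
    for word in text.split():
        options = store.lookup(word)
        if not options:
            plan.append((word, "ask", []))          # unknown: ask the user for a spelling
        elif len(options) == 1:
            plan.append((word, "auto", options))    # one spelling: use it directly
        else:
            plan.append((word, "choose", options))  # homograph: offer a drop-down
    return plan


# Usage with placeholder spellings:
store = WordPairStore()
store.add("the", "[the]")
store.add("desert", "[desert: noun]")
store.add("desert", "[desert: verb]")
plan = plan_transliteration("The desert sample", store)
```

Here "The" resolves automatically, "desert" produces a two-option choice, and "sample" is flagged for the user to supply a spelling, which would then be stored with add().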

From: "Lionel Ghoti" <Lionel.Ghoti@...>
Date: 2006-05-17 17:07:41
Subject: Re: phpGhotiFilleter, anyone?

I might consider open-sourcing it in the future, but not now. At the
moment my plan for phpGhotiFilleter is a bit of an amorphous, mutating
blob, because I keep having ideas about how to change it. The code isn't
fit for public consumption yet, and it is going to change a lot. (And
there isn't really a whole lot of code there anyway: it just does a few
db lookups, with a bit of processing before and after.)

More on the proposed changes later, because I will want the opinions of
the people who will be using it, particularly with regard to their
different accents...

LG

From: "Lionel Ghoti" <Lionel.Ghoti@...>
Date: 2006-05-17 17:59:01
Subject: Some ideas for modifications to phpGhotiFilleter

I have been looking at the additions made to the database by different
people and I'm beginning to think that the Filleter in its present
state is going to become difficult to use if it contains many
pronunciations of each word in different accents. The homograph-option
functionality was intended to allow a user to select the appropriate
pronunciation for a Roman spelling _within his own accent_, but
already users are being presented with pronunciation options which are
not options within their own accent. When an Australian user does a
transliteration, for example, he wants to be prompted for input as
little as possible: if he has to be asked to select, every other word,
between a British RP variant and an East Coast US variant and a
Canadian variant and a South African variant, and so on, then he will
very quickly become fed up.

I think there are two possible solutions to this:

1) phpGhotiFilleter's database should contain Shavian spellings for
one accent only (perhaps the accent used by our de facto standard,
Androcles and the Lion); or

2) It should store pronunciations in different accents, and would
allow the user to select their target accent. If no pronunciations
were available for the target accent, it would use the default accent.

I'm leaning towards the second option. An extra field "accentflag"
could be added to the database. The first record for any Roman word
would be the default pronunciation (probably in Androcles-ese). Any
subsequent records for that Roman word could optionally be marked with
an accent flag (e.g., AM for American, CA for Canadian, SA for South
African, etc.). On the page where the user enters Roman text for
transliteration, there would be an option to select the target accent.

Example:

Case 1: An Australian user selects the Australian option on the
text-entry page and attempts to transliterate the word "sample". There
is no Australian-flagged word-pair for "sample", but there is a
non-flagged (default Androcles-ese) word-pair, so that is used without
requiring any further input from the user. However, on the processing
page the word is coloured red, say, to let the user know that the
transliteration has not been taken from their native wordset. This
would also be an HTML link to the Add a Word page, so that an
Australian variant could be added if required.

Case 2: The same Australian user again selects the Australian option
on the text-entry page and attempts to transliterate the word
"sample". This time there is an Australian-flagged word-pair for the
word. Only the Australian-flagged Shavian word is offered as a
transliteration, and it is coloured black to show that it is from the
user's native wordset. But the word is also, again, an HTML link to
the Add a Word page, just in case an Australian homograph needs to be
added. (ALL words in the future version of pGF would be HTML links to
the Add a Word page.)
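
A minimal sketch of the accentflag idea and the two cases above, using Python's built-in sqlite3 as a stand-in for MySQL. The table layout, the "addword.php" page name and the bracketed spellings are hypothetical. The proposal says the first record for a word is the default; this sketch models the default as any record whose accentflag is NULL. Flagged spellings for the target accent win and are coloured black, unflagged defaults are coloured red, and every rendered word links to the Add a Word page.

```python
import sqlite3
from html import escape
from urllib.parse import urlencode


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the word-pair table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE wordpairs (
               id         INTEGER PRIMARY KEY,
               roman      TEXT NOT NULL,
               shavian    TEXT NOT NULL,
               accentflag TEXT  -- NULL means the default (Androcles-ese) spelling
           )"""
    )
    return conn


def add_pair(conn, roman, shavian, accentflag=None):
    conn.execute(
        "INSERT INTO wordpairs (roman, shavian, accentflag) VALUES (?, ?, ?)",
        (roman.lower(), shavian, accentflag),
    )


def lookup(conn, roman, accent):
    """Return (spellings, colour) for a Roman word and a target accent.

    Case 2: accent-flagged spellings exist, so use them, coloured black.
    Case 1: none exist, so fall back to unflagged defaults, coloured red.
    Unknown words return ([], None) so the caller can prompt the user.
    """
    key = roman.lower()
    rows = conn.execute(
        "SELECT shavian FROM wordpairs WHERE roman = ? AND accentflag = ? ORDER BY id",
        (key, accent),
    ).fetchall()
    if rows:
        return [r[0] for r in rows], "black"
    rows = conn.execute(
        "SELECT shavian FROM wordpairs WHERE roman = ? AND accentflag IS NULL ORDER BY id",
        (key,),
    ).fetchall()
    if rows:
        return [r[0] for r in rows], "red"
    return [], None


def render(roman, spelling, colour):
    """Wrap a word in a link to the (hypothetically named) Add a Word page."""
    href = "addword.php?" + urlencode({"roman": roman})
    return f'<a href="{escape(href)}" style="color:{colour}">{escape(spelling)}</a>'


# Usage: "sample" for an Australian (AU) user, before and after an AU variant exists.
conn = make_db()
add_pair(conn, "sample", "[sample: default]")
case1 = lookup(conn, "sample", "AU")   # (['[sample: default]'], 'red')
add_pair(conn, "sample", "[sample: AU]", "AU")
case2 = lookup(conn, "sample", "AU")   # (['[sample: AU]'], 'black')
```

Because existing records simply have a NULL accentflag, they keep working as defaults under this layout. They would only need flagging where they turn out to be accent-specific, which is the clean-up concern raised below.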

Questions:

1) What do you all think to that?

2) If we were to go down the many-accents path, what are the different
accents that you think should be covered? I would want there to be as
few types as possible, but enough to enable any user to produce text
which they would find easy to read.

One further point: If a new many-accents version is implemented there
will be no need to clear down the existing database, but the existing
word-pairs will be unflagged, and so will need flagging. I think it
would be a good idea if people didn't throw themselves too heavily
into adding words to the database until it has been decided whether to
make this change (and the changes have been made), so that we don't
have to flag lots of existing words manually.

LG

From: "Philip Newton" <philip.newton@...>
Date: 2006-05-17 19:22:29
Subject: Re: [shawalphabet] Some ideas for modifications to phpGhotiFilleter

On 5/17/06, Lionel Ghoti <Lionel.Ghoti@...> wrote:
> 2) It should store pronunciations in different accents, and would
> allow the user to select their target accent. If no pronunciations
> were available for the target accent, it would use the default accent.
>
> I'm leaning towards the second option.

Sounds like a good idea.

> 2) If we were to go down the many-accents path, what are the different
> accents that you think should be covered?

I think it's better to approach this in terms of mergers and splits,
rather than in terms of countries, since Shavian is phonemic, rather
than phonetic. So if Australian "age" sounds like British "ice", and
*always* does so, then I think the "age" letter is appropriate, so
Australian wouldn't need a separate accent just because of that.

Some mergers and splits I can think of:
- broad-A vs non-broad-A (e.g. Sam-psalm: some have 'ash' for both,
some have 'ash' vs 'ah', aka the trap-bath split)
- short-o variants vs those that don't have this (for some, short-o as
in 'on' merges into 'awe' or 'ah' or both -- the father/bother and/or
cot/caught mergers)
- distinct vowels before 'r' in addition to rhotics (do "mirror" and
"nearer" / "furry" and "hurry" / "merry" and "Mary" rhyme?)
- yod-dropping ("new" = "nM" or "nV"?)

and possibly

- shwi vs shwa ("dasFd" or "disFd" for "decide"?)
- representation of final -i/y ("siti" or "sitI" for "city"?)
- NORTH/FORCE (are "or" and "oar" separate?)
- variants with a "wh" vs those that lack that as a separate phoneme
(are "which" and "witch" pronounced the same?)
- flapped intervocalic "t" (do "bitter" and "bidder" / "rated" and
"raided" sound the same?)

If this kind of approach is taken, though, it might be better to see
it as a set of features that your accent has or hasn't, rather than a
set of accents to choose from (since lacking broad A is independent of
whether short-o exists or not, for example).
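
To illustrate the "set of features" idea, here is a word-level sketch. An accent is a set of independent yes/no features, and each stored variant of a word lists the feature values it requires. Only two features from the list above are modelled. The feature names, the LEXICON structure and the bracketed spellings are assumptions for illustration, not anything the Filleter implements.

```python
from typing import Optional

# An accent is described by independent yes/no features rather than by country.
# Feature names are hypothetical labels for two of the splits listed above.
Accent = dict[str, bool]

# Each Roman word maps to (conditions, spelling) variants, tried in order.
# A variant applies when every condition matches the user's accent; an empty
# condition set matches everyone and acts as the fallback.
LEXICON: dict[str, list[tuple[dict[str, bool], str]]] = {
    "bath": [
        ({"trap_bath_split": True}, "[bath with 'ah']"),
        ({}, "[bath with 'ash']"),
    ],
    "which": [
        ({"wh_distinct": True}, "[which with 'wh']"),
        ({}, "[which, same as witch]"),
    ],
}


def pick_spelling(word: str, accent: Accent) -> Optional[str]:
    for conditions, spelling in LEXICON.get(word.lower(), []):
        # A feature missing from the accent counts as unknown and matches nothing.
        if all(accent.get(feature) == value for feature, value in conditions.items()):
            return spelling
    return None  # word not in the lexicon


# Two accents that differ on both features independently.
accent_a: Accent = {"trap_bath_split": True, "wh_distinct": False}
accent_b: Accent = {"trap_bath_split": False, "wh_distinct": True}
```

Each feature is checked on its own, so any combination of answers is valid without having to define a named accent for it.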

Cheers,
Philip
--
Philip Newton <philip.newton@...>

From: "Lionel Ghoti" <Lionel.Ghoti@...>
Date: 2006-05-17 22:24:28
Subject: Re: Some ideas for modifications to phpGhotiFilleter

Hi Philip,

Yes, Shavian is phonemic, but the Traditional Orthography isn't, and
that's what we're starting with. If the program is to work on a
phonemic level rather than a word level, then before it even gets
around to transliterating anything it has to determine what the
phonemes in each TO word are, and that would require a LOT more
sophistication than the current program has -- and even then there
would be a lot of room for error due to orthographic idiosyncrasies.
It would be great to take that approach, but I don't think it's viable
given the complexity of the English language, my programming ability
and the amount of time I'm willing to expend on the project.

If you're suggesting that the program should contain a table of
underlying phonemic values for each TO word, and then that it should
contain a table of how those underlying phonemic values map onto
phonemes for each target accent, I think that would be very hard to
do. Given the different phonemic patterns that different accents have,
we'd be dealing with meta-phonemics rather than pure, simple
phonemics. I could see a lot going wrong (and who would provide the
data?). (But I haven't thought about that for long enough yet. Please
let me know if you think I've got the wrong end of the stick.)

If someone doing a transliteration were asked to define the
characteristics of their accent (Do "put" and "putt" rhyme? Do "crass"
and "grass" rhyme? and so on...) I think that would be asking too
much. Easier to say, "Where, broadly, do you come from?"

If we work at the level of words rather than phonemes, then we will
have a larger database, but, if we have a well-populated database, we
will be able to produce a perfect transliteration every time (assuming
that all homographs have been added and that the user selects each one
accurately), because the really intelligent work will have been done
by the people who created the database records.

I have to think more about this, but the word-by-word approach feels
right to me, because it is simple and simple is usually good. This is
the approach that was taken by the earlier, PC-based version of
Ghoti-Fingers. I could always rely on it to produce an accurate piece
of text as long as all homographs were accounted for, but I don't
think I could expect the same of a more sophisticated (and
error-prone) method.

LG
