Shavian eGroup Archive Browser

From: Gary Shannon
Date: 2000-12-01 17:40:12 #
Subject: Re: [shavian] Re: Coming to a Window Near You

----- Original Message -----
From: "Andrew Callaway" <acal@...>
To: <shavian@...>
Sent: Friday, December 01, 2000 6:15 AM
Subject: [shavian] Re: Coming to a Window Near You


> --- In shavian@..., "Gary Shannon" <reboot@r...> wrote:
> > I've just completed version 0.0 of my English to Shavian translator program
> > for Windows. If anybody is interested in playing with it can be downloaded
> Had a quick look, Gary. Seems OK.
>
> What language did you write it in?
>
> Andy

MS Visual C++ V6.0 using MFC.

--gary


-------------------------- eGroups Sponsor -------------------------~-~>
eGroups eLerts
It's Easy. It's Fun. Best of All, it's Free!
http://click.egroups.com/1/9698/1/_/54531/_/975692407/
---------------------------------------------------------------------_->

From: Hal Fulton
Date: 2000-12-05 19:47:53 #
Subject: [shavian] Caught up reading...

Whew!

Apparently I had not been to the egroups website in perhaps
a month. I just read a ton of messages.

Here are my uselessly belated comments/questions. :)

1. A compromise between dialect and uniformity is a good idea. Let's
not forget that it's not a simple either-or proposition. Even with
the (relatively mild) spelling variations we see in this group,
Shavian has a far better correspondence of sound and symbol than
traditional orthography.

2. For people who create translators, but understandably don't want
to write semantic analyzers to distinguish between (e.g.) "read" and
"read" -- a question. Do you flag the problem words in some consistent
machine-readable way? If so, someone else could tackle the "second
stage" of processing. (I'd enjoy doing it myself, and have some ideas
for it -- but I have way too much on my plate already).

3. Edgar Allan Poe. I don't enjoy being nitpicky. Well, no, actually
maybe I do. :) But anyway there's no "e" in "Allan," not in normal
writing and presumably not in Shavian. (A common mistake -- I actually
have an old paperback with his name misspelled on the front cover
and the spine.)

4. The word "on." In my personal experience only whining Northerners
say "ahhn." Nothing personal against these people, but it hurts my
ears. Most people I know say (roughly) "awn" as in "awning." There
might be a subtle difference, probably not as much as a Brit would
perceive. In the Deep South, where I came from, it was pronounced
"own" to rhyme with "phone." No joke. Television cured me of that
one at an early age. (Yes, I realize it's all dialectal, and no one
dialect is really better than another.)

5. Re: "thither." Someone implied that the "th" sounds here were the
same. How many of you do/don't say it that way? I've always pronounced
it with an unvoiced th at the beginning and a voiced one near the end.
But it's not a word we use much on this side of the pond, not in my
area anyhow. It's used half-jokingly as a formalism or affectation or
in the phrase "hither and thither" (which itself is usually half-
joking).

6. Scott implies that Brits pronounce "controversy" differently from
Americans. How so??

7. Concerning words such as "president." Note that the second e is
in an unaccented syllable, causing it to be somewhat slurred or
hurried over. But I strive to "think" an e even when I can't hear it
in a recording of my own voice. I really believe I'm saying an e. Oh,
certainly, the sound is modified. But it is not justifiable, IMO, to
pronounce it as "uh" as some pseudo-spellings would do. (Some would
do the same with the i also.)

8. President continued. Do you know what I mean by a pseudo-spelling?
I refer to those awful attempts to render pronunciation in print
without reference to any standard notation. I have always hated these,
even as a child, because they were totally incapable of expressing
any nuances, even the most gross ones. (No! That's not REALLY how you
say it!)

9. President yet again. What about related forms, like "presidential"?
If I say/write "presidunt," should I say/write "presiduntial" or not?
If I don't, won't it look like an artifact of some kind? (Although the
embedded word "dunce" makes me smile.)

10. Concerning Quikscript. I don't think there's a real infringement
with the software thingy as long as you're not doing a related thing
and making money from it. The name itself: Well, "Read Alphabet" will
be hard to do a web search for, "read" being such a common word. The
newsgroup: I don't perceive a need to split the communities.

Just a few rambling comments.

Hal Fulton






From: Gary Shannon
Date: 2000-12-06 00:14:28 #
Subject: Re: [shavian] Caught up reading...

----- Original Message -----
From: "Hal Fulton" <hal9000@...>
To: <shavian@...>
Sent: Tuesday, December 05, 2000 11:46 AM
Subject: [shavian] Caught up reading...



<snip>

>
> 2. For people who create translators, but understandably don't want
> to write semantic analyzers to distinguish between (e.g.) "read" and
> "read" -- a question. Do you flag the problem words in some consistent
> machine-readable way? If so, someone else could tackle the "second
> stage" of processing. (I'd enjoy doing it myself, and have some ideas
> for it -- but I have way too much on my plate already).

My solution was to have the translator program stop each time it finds a
word for which there is more than one glossary entry and ask the user to
select one of the alternatives. It also stops on words not found in the
glossary so the user can provide the spelling (which is added to the
glossary).

In this way the whole thing is completed in one pass, but requires
interaction from the user during the translation process.
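
A minimal sketch of that one-pass, interactive design, written in Python
purely for illustration (the program described above is an MFC application,
and none of these names come from it). The glossary layout, the `ask`
callback, and the Latin respellings standing in for Shavian letters are all
assumptions.

```python
import re

# Words (with internal apostrophes) or runs of anything else.
TOKEN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*|[^A-Za-z]+")

def translate(text, glossary, ask):
    """Translate in one pass, pausing for the user when needed.

    glossary maps a lowercase word to a list of candidate spellings.
    ask(word, options) returns the chosen spelling; options is empty
    for an unknown word, in which case the user supplies a new one.
    """
    out = []
    for token in TOKEN.findall(text):
        if not token[0].isalpha():
            out.append(token)            # spaces and punctuation pass through
            continue
        key = token.lower()
        options = glossary.get(key, [])
        if len(options) == 1:
            out.append(options[0])       # unambiguous: no interaction
            continue
        choice = ask(key, options)       # ambiguous or unknown: stop and ask
        if not options:
            glossary[key] = [choice]     # remember the newly supplied word
        out.append(choice)
    return "".join(out)
```

In a real program `ask` would be a dialog box or a console prompt; passing
it in as a function keeps the translation loop itself easy to test.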

<snip>

> 4. The word "on." In my personal experience only whining Northerners
> say "ahhn." Nothing personal against these people, but it hurts my
> ears. Most people I know say (roughly) "awn" as in "awning."

Hmmm. Interesting. Where I grew up in Michigan it was very distinctly
pronounced "ahhn". Then when I lived in California as a teen it was
pronounced the same way. Now, up here in Oregon, I still hear it pronounced
only "ahhn". I have heard it pronounced in a way that leans more toward
"awn", but I've always taken that as a clue that I'm listening to a
flat-lander from the mid-west.

--gary




From: Scott Harrison
Date: 2000-12-06 11:49:49 #
Subject: Re: [shavian] Caught up reading...

In a message from Hal Fulton <hal9000@...>
dated Tue, 05 Dec 2000 19:46:50 +0000, my mailer made me see:

->
-> 2. For people who create translators, but understandably don't want
-> to write semantic analyzers to distinguish between (e.g.) "read" and
-> "read" -- a question. Do you flag the problem words in some consistent
-> machine-readable way? If so, someone else could tackle the "second
-> stage" of processing. (I'd enjoy doing it myself, and have some ideas
-> for it -- but I have way too much on my plate already).
->

My translator is split into two programs. The first takes the input text file, translates it using a dictionary, and writes the output file. It is a very simple program that really just does dictionary lookup. The second program is interactive. It reads the input file and the dictionary and presents the user with a list of the unique words found in the input file but not in the dictionary. The user can then provide definitions for any subset of those words, and these can be saved to the dictionary.

I have developed a technique where I can have multiple definitions for a word. This technique allows me to have "read" in the dictionary with pronunciations like "reed" and "red". Then when the translator program encounters these multiple definitions it puts both into the output file, marked in a definite manner.

I then need to go through the text manually and read the context of the words to determine which definition to use.
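
A rough sketch of such a two-program split, in Python for illustration
only. The actual marking used in the output file is not described here, so
the `{reed|red}` and `{?word}` markers below are hypothetical, as are the
function names and the Latin respellings standing in for Shavian letters.

```python
import re

WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def translate(text, dictionary):
    """Batch pass: pure dictionary lookup, no user interaction."""
    def lookup(match):
        word = match.group(0)
        entries = dictionary.get(word.lower())
        if not entries:
            return "{?" + word + "}"           # unknown word (assumed marker)
        if len(entries) == 1:
            return entries[0]
        return "{" + "|".join(entries) + "}"   # ambiguous: keep every candidate
    return WORD.sub(lookup, text)

def unknown_words(text, dictionary):
    """Companion pass: unique input words that the dictionary lacks."""
    found = {word.lower() for word in WORD.findall(text)}
    return sorted(found - set(dictionary))
```

Because every marker has the same bracketed shape, a later stage (manual or
automatic) could locate the flagged words with a pattern such as
`\{[^{}]*\}` and resolve them one by one.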

->
-> 6. Scott implies that Brits pronounce "controversy" differently from
-> Americans. How so??

CONtroversy vs. conTROversy

--
Scott Harrison


From: Hal Fulton
Date: 2000-12-06 16:57:28 #
Subject: [shavian] Re: Caught up reading...

--- In shavian@..., "Gary Shannon" <reboot@r...> wrote:
> My solution was to have the translator program stop each time it finds a
> word for which there is more than one glossary entry and ask the user to
> select one of the alternatives. It also stops on words not found in the
> glossary so the user can provide the spelling (which is added to the
> glossary).
>
> In this way the whole thing is completed in one pass, but requires
> interaction from the user during the translation process.
>

That's a good way to do it, nice and simple. But supposing another
program were created to try to deal with the "hard" things -- do you
think an option could be added to just flag them and pass them on?

I'm really mostly dreaming, since I am *very* busy right now, but I
have always wanted to try out my little theories (which I'll mention
in the next reply in a minute...)

>
> > 4. The word "on." In my personal experience only whining Northerners
> > say "ahhn." Nothing personal against these people, but it hurts my
> > ears. Most people I know say (roughly) "awn" as in "awning."
>
> Hmmm. Interesting. Where I grew up in Michigan it was very distinctly
> pronounced "ahhn". Then when I lived in California as a teen it was
> pronounced the same way. Now, up here in Oregon, I still hear it pronounced
> only "ahhn". I have heard it pronounced in a way that leans more toward
> "awn", but I've always taken that as a clue that I'm listening to a
> flat-lander from the mid-west.
>
> --gary

I'd love to know how the real frequencies go. I'll bet "ahhn" is the
most common in the cities (which account for most of the US
population). Probably I don't hear as much of it since I have never
lived in a city of more than 500,000 (and that only once).

Hal





From: Gary Shannon
Date: 2000-12-06 17:17:47 #
Subject: Re: [shavian] Re: Caught up reading...

----- Original Message -----
From: "Hal Fulton" <hal9000@...>
To: <shavian@...>
Sent: Wednesday, December 06, 2000 8:57 AM
Subject: [shavian] Re: Caught up reading...

<snip>

> > > 4. The word "on." In my personal experience only whining Northerners
> > > say "ahhn." Nothing personal against these people, but it hurts my
> > > ears. Most people I know say (roughly) "awn" as in "awning."
> >
> > Hmmm. Interesting. Where I grew up in Michigan it was very distinctly
> > pronounced "ahhn". Then when I lived in California as a teen it was
> > pronounced the same way. Now, up here in Oregon, I still hear it pronounced
> > only "ahhn". I have heard it pronounced in a way that leans more toward
> > "awn", but I've always taken that as a clue that I'm listening to a
> > flat-lander from the mid-west.
> >
> > --gary
>
> I'd love to know how the real frequencies go. I'll bet "ahhn" is the
> most common in the cities (which account for most of the US
> population). Probably I don't hear as much of it since I have never
> lived in a city of more than 500,000 (and that only once).
>
> Hal

Interesting. You may have hit on the answer. I've always lived in the
city, and usually a very large city. I do have relatives who live "out in
the sticks" though, and whenever I visit with them I notice the "unusual"
way they pronounce a number of words. I'm sure the professional linguists
in the group could shed some light on this urban vs rural pronunciation
issue.

--gary





From: Hal Fulton
Date: 2000-12-06 17:22:26 #
Subject: [shavian] Re: Caught up reading...

--- In shavian@..., Scott Harrison <scott_harrison@a...>
wrote:
> In a message from Hal Fulton <hal9000@h...>
> dated Tue, 05 Dec 2000 19:46:50 +0000, my mailer made me see:
>
> ->
> -> 2. For people who create translators, but understandably don't want
> -> to write semantic analyzers to distinguish between (e.g.) "read" and
> -> "read" -- a question. Do you flag the problem words in some consistent
> -> machine-readable way? If so, someone else could tackle the "second
> -> stage" of processing. (I'd enjoy doing it myself, and have some ideas
> -> for it -- but I have way too much on my plate already).
> ->
>
> My translator program is split into two programs. One program takes
> the input text file, and translates it using a dictionary, and outputs
> the output file. This is a very simple program that really just does
> dictionary lookup. The second program is interactive. It reads the
> input file and the dictionary and presents a list of unique words
> found in the input file but not in the dictionary to the user. The
> user can then provide definitions for any subset of those words.
> These can then be saved to the dictionary.
>
> I have developed a technique where I can have multiple definitions
> for a word. This technique allows me to have read in the dictionary
> with pronunciations like reed and red. Then when the translator
> program encounters these multiple definitions it puts both into the
> output file marked in a definite manner.
>
> I then need to go through the text manually and read the context of
> the words to determine which definition to use.

I have an idea for a relatively simple rule-based system that I
believe could handle the majority of the common problem words.

It would *not* attempt to be a full-fledged parser, but would do a
crude context analysis in order to make a guess. There would be a set
of rules for each problem word.

Example: We find the word "read" and apply these rules:

Previous word is "to" --> "reed"
(as in: I want to read that.)
has/have pronoun/name [adverb] read --> "red"
(as in: Has she read it? Have they already read it?... recognize
a "name" as a sequence of one or more capitalized words inside a
sentence: Has Bob read it? Has John Smith read that?)
will/shall/does/do pronoun/name read --> "reed"
(Will you read it? Shall I read it? Does Bob read it?...)
having/being/been/etc. read --> "red"
(Having read it... it was being read... it had been read...)
Etc.
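
The four rules above could be sketched along these lines, in Python. This is
only one possible reading of them: the word lists are small illustrative
samples, the function names are made up for this sketch, and a result of
`None` means "no rule fired, flag the word for a human".

```python
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}
ADVERBS = {"already", "ever", "never", "just"}     # illustrative, not exhaustive
PAST_AUX = {"has", "have"}
PLAIN_AUX = {"will", "shall", "does", "do"}
PARTICIPLE_CUES = {"having", "being", "been"}
ALL_AUX = PAST_AUX | PLAIN_AUX

def subject_start(words, end):
    """Index where a pronoun or capitalized name ending just before
    `end` begins, or None if no such subject is found."""
    if end >= 1 and words[end - 1].lower() in PRONOUNS:
        return end - 1
    start = end
    # A "name" is a run of capitalized words; a capitalized auxiliary
    # (e.g. sentence-initial "Has") is not part of the name.
    while (start >= 1 and words[start - 1][0].isupper()
           and words[start - 1].lower() not in ALL_AUX):
        start -= 1
    return start if start < end else None

def guess_read(words, i):
    """Guess "reed" or "red" for words[i] == "read"; None means flag it."""
    prev = words[i - 1].lower() if i >= 1 else ""
    if prev == "to":
        return "reed"
    if prev in PARTICIPLE_CUES:
        return "red"
    end = i - 1 if prev in ADVERBS else i          # skip one optional adverb
    start = subject_start(words, end)
    if start is not None and start >= 1:
        aux = words[start - 1].lower()
        if aux in PAST_AUX:
            return "red"
        if aux in PLAIN_AUX and end == i:          # this rule has no adverb slot
            return "reed"
    return None
```

For instance, `guess_read("Has John Smith read that".split(), 3)` takes the
has/have rule, while `guess_read("I read it".split(), 1)` matches nothing
and comes back as `None`.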

Other comments:

When finding end of sentence, disallow things like Dr., Mr. etc.
"Road" (Rd.) and such are problematic, since they may or may not be
at the end of a sentence.
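
As a sketch of that end-of-sentence check (Python, with hypothetical names
and deliberately tiny abbreviation lists): titles never end a sentence,
while "Rd." and its kind are accepted as an end only when a capitalized word
follows, and even then they are reported as uncertain.

```python
import re

TITLES = {"dr", "mr", "mrs", "ms"}    # a period after these never ends a sentence
AMBIGUOUS = {"rd", "st", "ave"}       # may or may not end one

def split_sentences(text):
    """Split at ., ! or ? followed by whitespace or end of text.

    Returns (sentences, uncertain); uncertain lists the character
    offsets of periods after AMBIGUOUS abbreviations that were treated
    as sentence ends only because a capitalized word follows.
    """
    sentences, uncertain, start = [], [], 0
    for m in re.finditer(r"[.!?]+(?=\s|$)", text):
        before = re.search(r"[A-Za-z]+$", text[:m.start()])
        word = before.group(0).lower() if before else ""
        if m.group(0) == ".":
            if word in TITLES:
                continue
            if word in AMBIGUOUS:
                rest = text[m.end():].lstrip()
                if rest and not rest[0].isupper():
                    continue                  # lowercase follows: not an end
                if rest:
                    uncertain.append(m.start())
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences, uncertain
```

The uncertain offsets give a proofreader (or a later program) a short list
of places to double-check instead of the whole text.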

We care about the end of a sentence because of capitalization... we
want to guess whether to use a namer dot. Many words capitalized in
the input will not use a namer dot in output. Could also potentially
use little word lists as needed... "Here are words that are NEVER
proper names even if the input text capitalizes them; here are words
that probably ARE proper names." That might be a good runtime option;
in Stephenson's _Snow Crash_, "Protagonist" is a proper name. If
you're processing _Pilgrim's Progress_, you'll have all sorts of
issues.
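
A small sketch of the namer-dot guess with such word lists, again in Python
and again hypothetical throughout: the list contents, the function names,
and the use of the middle dot (U+00B7) for the namer dot are assumptions,
and the words are left in Latin letters for readability.

```python
NEVER_NAMES = {"i", "the", "a", "an", "it", "he", "she", "we", "they"}
ALWAYS_NAMES = {"protagonist"}    # e.g. supplied per book as a runtime option
NAMER_DOT = "\u00b7"              # middle dot, assumed here for the namer dot

def needs_namer_dot(word, sentence_initial):
    """Return True, False, or None (uncertain: flag for a human)."""
    if not word[0].isupper():
        return False
    lower = word.lower()
    if lower in ALWAYS_NAMES:
        return True
    if lower in NEVER_NAMES:
        return False
    if sentence_initial:
        return None               # the capital may only mark the sentence start
    return True

def mark_names(words):
    """Prefix likely proper names with the namer dot.

    Returns (marked_words, uncertain_indices) for one sentence.
    """
    marked, uncertain = [], []
    for i, word in enumerate(words):
        verdict = needs_namer_dot(word, sentence_initial=(i == 0))
        if verdict is None:
            uncertain.append(i)
        marked.append(NAMER_DOT + word if verdict else word)
    return marked, uncertain
```

Run on the words of "Bob gave Hiro Protagonist the sword", this dots "Hiro"
and "Protagonist" and reports index 0 ("Bob") as uncertain, since a
sentence-initial capital proves nothing by itself.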

But if we made a tool that could handle even 80% of the questionable
cases correctly, that would be a step forward, I think... and it
could still flag the ones it wasn't certain about.

Just a few thoughts.

> ->
> -> 6. Scott implies that Brits pronounce "controversy" differently from
> -> Americans. How so??
>
> CONtroversy vs. conTROversy
>

Now that is interesting. I never knew that.

Hal Fulton




From: Gary Shannon
Date: 2000-12-06 19:35:18 #
Subject: Re: [shavian] Re: Caught up reading...

----- Original Message -----
From: "Hal Fulton" <hal9000@...>
To: <shavian@...>
Sent: Wednesday, December 06, 2000 9:21 AM
Subject: [shavian] Re: Caught up reading...



<snip>

As a retired AI programmer I love looking for spoilers. <hehe>
Some exceptions to your rules:

>
> I have an idea for a relatively simple rule-based system that I
> believe could handle the majority of the common problem words.
>
> It would *not* attempt to be a full-fledged parser, but would do a
> crude context analysis in order to make a guess. There would be a set
> of rules for each problem word.
>
> Example: We find the word "read" and apply these rules:
>
> Previous word is "to" --> "reed"
> (as in: I want to read that.)

Spoiler: "The boy that I read to RED for himself later in the day."

> has/have pronoun/name [adverb] read --> "red"
> (as in: Has she read it? Have they already read it?... recognize
> a "name" as a sequence of one or more capitalized words inside a
> sentence: Has Bob read it? Has John Smith read that?)

Spoiler: "Why don't you have Bob REED it?"

> will/shall/does/do pronoun/name read --> "reed"
> (Will you read it? Shall I read it? Does Bob read it?...)

Spoiler: "Those who will not agree with me have never read the book, while
those that will RED it already."

> having/being/been/etc. read --> "red"
> (Having read it... it was being read... it had been read...)
> Etc.

Spoiler: "Can a human being REED this book?"

>
> Other comments:
>
> When finding end of sentence, disallow things like Dr., Mr. etc.
> "Road" (Rd.) and such are problematic, since they may or may not be
> at the end of a sentence.
>
> We care about the end of a sentence because of capitalization... we
> want to guess whether to use a namer dot. Many words capitalized in
> the input will not use a namer dot in output. Could also potentially
> use little word lists as needed... "Here are words that are NEVER
> proper names even if the input text capitalizes them; here are words
> that probably ARE proper names." That might be a good runtime option;
> in Stephenson's _Snow Crash_, "Protagonist" is a proper name. If
> you're processing _Pilgrim's Progress_, you'll have all sorts of
> issues.

Also from _Snow Crash_ "The Black Sun". Seems like just about any word can
be incorporated into a "proper noun" in the form of a business name or
title. ("National Parks Service", or the rock band "Kiss" just off the top
of my head.) And what about "Da5id", also from _Snow Crash_? Which brings
up the whole issue of made-up words like "Deliverator" and "Kourier" (also
from _Snow Crash_, since you brought it up :)

Also, what is the consensus on names like "McDonald"? Does that take two
namer dots?

>
> But if we made a tool that could handle even 80% of the questionable
> cases correctly, that would be a step forward, I think... and it
> could still flag the ones it wasn't certain about.

I agree. If it could handle 80% of the cases that would be great. What I
worry about are the cases that it *thinks* it handled correctly, but did not.
In any case a complete proof-reading of the text will probably be required.
And the very existence of these problems shows what a sorry state
English spelling is in!

<snip>

--gary




From: Phillip Driscoll
Date: 2000-12-07 01:19:14 #
Subject: Re: [shavian] Re: Caught up reading...

-----Original Message-----
From: Gary Shannon <reboot@...>
To: shavian@... <shavian@...>
Date: Wednesday, December 06, 2000 12:17 PM
Subject: Re: [shavian] Re: Caught up reading...


>----- Original Message -----
>From: "Hal Fulton" <hal9000@...>
>To: <shavian@...>
>Sent: Wednesday, December 06, 2000 8:57 AM
>Subject: [shavian] Re: Caught up reading...
>
><snip>
>
>> > > 4. The word "on."
>>
>> I'd love to know how the real frequencies go. I'll bet "ahhn" is the
>> most common in the cities (which account for most of the US
>> population). Probably I don't hear as much of it since I have never
>> lived in a city of more than 500,000 (and that only once).
>>
>> Hal
>
>Interesting. You may have hit on the answer. I've always lived in the
>city, and usually a very large city. I do have relatives who live "out in
>the sticks" though, and whenever I visit with them I notice the "unusual"
>way they pronounce a number of words. I'm sure the professional linguists
>in the group could shed some light on this urban vs rural pronunciation
>issue.
>
>--gary


I've lived my entire life out in the rural areas of Michigan about fifty
miles west of Detroit. I've never heard "on" pronounced differently than
"ahhn."

--Phil



From: Hal Fulton
Date: 2000-12-08 17:18:07 #
Subject: [shavian] Re: Caught up reading...

--- In shavian@..., "Phillip Driscoll" <phild@c...> wrote:
> -----Original Message-----
> From: Gary Shannon <reboot@r...>
> To: shavian@... <shavian@...>
> Date: Wednesday, December 06, 2000 12:17 PM
> Subject: Re: [shavian] Re: Caught up reading...
>
>
> >----- Original Message -----
> >From: "Hal Fulton" <hal9000@h...>
> >To: <shavian@...>
> >Sent: Wednesday, December 06, 2000 8:57 AM
> >Subject: [shavian] Re: Caught up reading...
> >
> ><snip>
> >
> >> > > 4. The word "on."
> >>
> >> I'd love to know how the real frequencies go. I'll bet "ahhn" is the
> >> most common in the cities (which account for most of the US
> >> population). Probably I don't hear as much of it since I have never
> >> lived in a city of more than 500,000 (and that only once).
> >>
> >> Hal
> >
> >Interesting. You may have hit on the answer. I've always lived in the
> >city, and usually a very large city. I do have relatives who live "out in
> >the sticks" though, and whenever I visit with them I notice the "unusual"
> >way they pronounce a number of words. I'm sure the professional linguists
> >in the group could shed some light on this urban vs rural pronunciation
> >issue.
> >
> >--gary
>
>
> I've lived my entire life out in the rural areas of Michigan about fifty
> miles west of Detroit. I've never heard "on" pronounced differently than
> "ahhn."
>
> --Phil

OK, a refinement of my (likely incorrect) theory.

To be honest, as a lifelong resident of the deep South, I grew up
hearing/using the word "Yankee" (though mostly in a joking way) to
refer to Northerners. (Yes, I know the word has different usages
and different contexts.) But of course, to be a Yankee is not just a
geographic matter. It is a way of speaking, acting, and thinking.
For example, visiting friends in Connecticut at the age of 15 -- this
was the first time that I ever heard the word "tea" defaulting to
hot tea. (In the South, it defaults to iced tea. A climate thing, I
expect; and fewer leftover British influences.)

Now, in the broader sense of the word "Yankee" -- an ill-defined
slang term, I admit once again -- Yankees are not just found in the
north. They tend to inhabit cities as well as certain specific
regions. For whatever reason, it seems that one of the very
southernmost states, Florida, has a high percentage of Yankees.

At any rate, I perceive "ahhn" as a Yankeeism. And a resident of
Michigan, though it were rural Michigan :), might just qualify as
a Yankee. But never having been within 600 miles of Michigan, and
having known very few people originating there, I couldn't say.

And as for tea (far off-topic from Shavian!): If I were offered tea
in Michigan, I wouldn't know whether they meant cold or hot. I do
know that the Southern tradition wins in the long run, because
Jean-Luc Picard in speaking to his replicator has to specify "Tea,
Earl Grey, hot." :)

Cheers,
Hal





