Shavian eGroup Archive Browser
From: A.M.Callaway
Date: 1999-12-19 13:37:01 #
Subject: [shavian] Re: Homographs
At 11:57 AM 12/17/99 -0500, you wrote:
>In a message from Lionel Ghoti <lionel_ghoti@...>
>dated Fri, 17 Dec 1999 02:18:19 +0000, my mailer made me see:
>
>->
>-> How does your automatic translator work? Is it based on the CMU
>-> pronouncing dictionary?
>->
>-> BY THE WAY, DOES ANYONE KNOW IF A BRITISH RP PRONUNCIATION DICTIONARY
>-> EXISTS IN COMPUTER-FILE FORM???
>->
>
> I wish I could download the CMU pronouncing dictionary, but the page will
> not let me at it because of permission problems. Therefore, if someone has
> it can you point me to it (or mail it to me in a compressed form if it is
> small enough).
I have it as part of The Phonetic Translator, which you can find on my web
page at www.ozemail.com.au/~acal and follow the links to downloads, or look
in the egroups vault. It is about 800k in compressed form (zip).
- .+'^'+. A.M.Callaway ----------------- acal@...
- A N D Y Melbourne, Australia --- a.callaway@...
- `+.,.+' www.ozemail.com.au/~acal -------------------------
From: Daniel G. Szczurek
Date: 1999-12-19 21:12:24 #
Subject: [shavian] Re: Homographs
Dear fellow Shavians,
Is the "Curse of the Short O" coming in here? I mean the "o" vowel in "pot." It
was the focus of a paper I wrote. I don't even pronounce such words
consistently. The more slowly and carefully I pronounce such words, the more
"aw" quality there is in them. But in normal speech, I have 2 categories of
words, the majority are clearly "ah" pronunciations, some clearly "aw." I
have yet to be able to find a phonetic rule which determines which will
occur. Dan
----------
>From: Scott Harrison <Scott_Harrison@...>
>To: shavian@...
>Subject: [shavian] Re: Homographs
>Date: Sat, Dec 18, 1999, 8:39 AM
>
>In a message from Lionel Ghoti <lionel_ghoti@...>
>dated Sat, 18 Dec 1999 01:34:40 +0000, my mailer made me see:
>
>->
>-> > I wish I could download the CMU pronouncing dictionary, but the page
>-> > will not let me at it because of permission problems. Therefore, if
>-> > someone has it can you point me to it (or mail it to me in a compressed
>-> > form if it is small enough).
>->
>-> My web browser crashes every time I try to download it from the CMU web
>-> site. So I got it with my FTP program instead. Now I've put it in the Vault.
>->
>-> Go to www.egroups.com/group/shavian
>-> ...then select the "Vault" link, then "datafiles". There you should find
>-> the file cmudict--c06d.zip. It's 852k.
>->
>
>Thanks. I have successfully downloaded it from the Vault. I now have to
>evaluate its contents to see how applicable it is to what I am doing. At
>first blush it seems that the "a" in "ado" is not handled the way I prefer.
> This may cause a problem. (Of course I already have a problem with "ah"
>as it is.)
>
>--Scott
From: Scott Harrison
Date: 1999-12-20 15:51:05 #
Subject: [shavian] Re: Data formats for Shavian
In a message from Lionel Ghoti <lionel_ghoti@...>
dated Sun, 19 Dec 1999 00:26:59 +0000, my mailer made me see:
Lionel,
My comments are inline...
-> I don't really understand Unicode. I think I must have been off sick the
day they taught it at school. I've looked at www.unicode.org, but it seems
to be aimed at professional programmers, of which I am not one. As far as I
understand it, Unicode is a standard by which every alphabetical character
commonly used in the world is assigned its own unique code. I hope that this
would mean that it would be possible to write a document in, say, the
Shavian and Roman alphabets without switching fonts. I'd guess that a font
could theoretically include all of the many thousands of characters in the
Unicode set, but that this would be impractical, resulting in a huge file
size for such a font... So I suppose that for our purposes, a font called,
say ShavianRomanIPA might be created, which would allow us to converse about
our corner of spelling reform using three alphabets, but without continually
switching font.
->
You are correct in what you are saying here.
-> Practically everything I've said above is mere assumption, so I expect to
be at least half wrong about a lot of it. I can't find any web site that
explains the basics clearly.
->
Not wrong. I'll send another note explaining Unicode.
-> I've looked at the Shavian section of your web site, Scott. Of the
Declaration of Independence documents, the one listed as "Shavian" comes out
in my web browser as lots of empty squares separated by commas, full stops
and spaces; the one listed as "UTF-8 Web Page" looks just the same. If I
view the HTML source of the latter in a text-editor, it looks like lots of
"ioe'ioe-ioe", with plenty of diacritics and ligatures thrown in. The web
browser is displaying it in Times New Roman, I think; the text-editor in
Courier New.
->
-> I'm using Windows 95 and Internet Explorer 5. What do I have to do to
view your pages as they should be seen? And how do I write pages using
Unicode? If I'm writing a web page and I want to write the sound at the
beginning of "tot" in both Shavian and Roman alphabets, then I first specify
a Roman font, type "t" on my keyboard, then specify a Shavian font, then
type "t" again. How would I enter the two different symbols in Unicode? I
think I might have got the wrong end of the stick about this Unicode thing,
and I suppose I'm asking the wrong questions; but I'm asking them all the
same, just to let you see how much explaining needs to be done before we
baffled Unicode ignoramuses can jump on the bandwagon.
->
To view the UTF-8 pages on my web site you will have to download a font
provided by Phillip Driscoll on his website http://www2.c4systm.com/~phild
which says "Download Unicode with Arial version". After downloading, add it
to your system fonts. Then go into Internet Explorer and make the font
called "Shaw Sans No. 2 (with Arial)" be the default font for "Universal
Alphabet" or "UTF-8" pages. After you do that reload my UTF-8 pages and
everything should come out in both Latin and Shavian letters at the same
time.
I'll explain more Unicode stuff in the next post.
--Scott
From: Scott Harrison
Date: 1999-12-20 16:26:07 #
Subject: [shavian] Unicode
Hello,
If you do not care about Unicode please delete the post and forgive me for
sending it to such a wide distribution.
Unicode is an attempt to come up with a way where we can provide textual
information in all the world's writing systems without having to switch
fonts and do a whole bunch of other tricks people currently do when storing
text information.
Most characters are stored in a computer currently as 8-bit chunks of data.
This allows 256 unique character values for each character set. Therefore,
if one wants to store more than 256 unique values one needs to use more than
one character set. This is a pain because there are many different ways to
specify how to know what character set one switches to, etc. Unicode solves
this problem by storing each character as a 16-bit chunk of data.
Therefore, Unicode can provide a bit over 65,000 characters in its character
set. For most purposes, this allows us to provide a unique character for
every single character used in the world. This even allows us to write
Klingon, Shavian and Tengwar for those that are interested.
Note that Unicode does not specify anything about the language that is
used, only what characters are stored. Therefore, when writing in Latin
characters the language can be English, French, etc. And when encountering
Cyrillic characters one does not know whether the language is Russian,
Bulgarian, Ukrainian, etc. The problem of language identification is not
really a problem that needs to be addressed at the character level. This is
typically handled by tagging the characters with language information, and
is used for things like what spell checker to use, what grammatical analysis
to use, etc.
Note that since Unicode stores each character as 16 bits instead of 8 bits
that each text file is twice as big in Unicode. There are ways this can be
compressed but that is really unimportant for the amount of data we are
talking about.
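
To make the size claim concrete, here is a minimal Python sketch. It assumes that the fixed 16-bit storage described above corresponds to what Python calls `utf-16-be` (big-endian, no byte-order mark); the sample string is only an illustration.

```python
# Compare the storage size of plain Latin text in an 8-bit
# character set and in 16-bit Unicode (UTF-16, big-endian, no BOM).
text = "peep tot kick"

eight_bit = text.encode("latin-1")      # one byte per character
sixteen_bit = text.encode("utf-16-be")  # two bytes per character here

print(len(text), len(eight_bit), len(sixteen_bit))
```

For text like this, made up only of characters from the basic 16-bit range, the 16-bit form is exactly twice the size of the 8-bit form.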
UTF-8 is a way to store Unicode information so it looks like ASCII, i.e.,
in 8-bit form. This is especially important because the data is actually
readable by most programs since the lower Unicode characters (which match
the ASCII values perfectly) are stored in ASCII, and only the upper
characters are mapped in a funky way. Therefore, if one makes a web page in
UTF-8, all the tags are perfectly readable. Only the "Unicode" values are
encoded. UTF-8 web pages allow people to create web pages that can show all
the world's characters without requiring people to have font-specific
information in the web page. Having font-specific information in a web page
is a bad thing because it causes the person reading the page to need to have
a specific font loaded.
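
A small Python sketch may help show what "looks like ASCII" means in practice. It assumes the Shavian letter peep sits at U+10450, which is the code point Unicode later assigned to it; pages written in 1999 may well have used a different, provisional value.

```python
import unicodedata

# A fragment of a web page: ASCII markup plus one Shavian letter.
# U+10450 is an assumption for this example (the later official value).
page = "<p>peep: \U00010450</p>"

encoded = page.encode("utf-8")

# The markup survives as readable ASCII bytes...
ascii_part = encoded[:9]
# ...while the single Shavian letter becomes a multi-byte sequence.
shavian_part = "\U00010450".encode("utf-8")

print(ascii_part)
print(shavian_part.hex())
print(unicodedata.name("\U00010450"))
```

The tags come through byte for byte, so any program that understands ASCII can still find them; only the Shavian letter is stored in the multi-byte form.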
The world of fonts is a tricky one. Let me preface this with a little
background. Personally, I use MacOS X as my operating system of choice.
Actually, at the moment it is MacOS X Server. MacOS X grew from OPENSTEP
(my second favorite operating system). I use Windows NT every day, but am
not familiar with all the intricacies of it and its font systems. The
OPENSTEP system (and therefore MacOS X) uses an interesting way to handle
fonts and Unicode characters. When it sees a set of Unicode characters it
knows which fonts can render them and then automatically uses those fonts to
render the characters. This is done automatically across all applications
without the need for user intervention. Therefore, I can read Chinese and
then Russian and then Shavian all without having the data specify a font,
and without having me need to switch fonts. It is extraordinarily useful
for linguists. Now, in my brief experimentation it seems that Windows
operates totally differently. It seems that each application is in control
of what fonts are used for what textual data. And it seems that one needs
to have a font that contains ALL the characters one wants to display.
Therefore, for us to be able to show Shavian and Latin at the same time, we
need a font that contains both of these sets of characters at their proper
Unicode points. This is unlike OPENSTEP where one can have two fonts doing
this duty - one to show the Latin characters and one to show Shavian
characters.
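
The per-character font selection described above can be sketched roughly as follows. This is not how OPENSTEP or MacOS X actually implements it; the font names and coverage ranges are hypothetical and chosen only to illustrate the idea of picking, for each character, a font that can draw it.

```python
# Hypothetical fonts and the code point ranges each one covers.
# Names and ranges are illustrative, not taken from any real system.
FONTS = [
    ("LatinFont", range(0x0020, 0x0250)),
    ("CyrillicFont", range(0x0400, 0x0500)),
    ("ShavianFont", range(0x10450, 0x10480)),
]

def pick_font(ch):
    """Return the first font whose coverage includes the character."""
    for name, covered in FONTS:
        if ord(ch) in covered:
            return name
    return None  # no installed font can draw this character

def font_runs(text):
    """Split text into (font, substring) runs of consecutive characters."""
    runs = []
    for ch in text:
        font = pick_font(ch)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs

print(font_runs("tot \U00010451 \u0442"))
```

With a scheme like this, two small fonts can share the work. A system that insists on a single font per run of text needs one combined font covering every range instead.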
Therefore, in the Windows world we need combined fonts. This means the
fonts get larger when one wants to be able to read multiple languages.
Since I like to read Russian, Urdu, Shavian and various languages
represented with Latin script, I need a pretty large font. However, in
reality I have found no one that really writes Urdu data - the information
is usually an image. But still, the font needs to be large. Luckily for
me I reserve all my Chinese and Japanese stuff to MacOS X which makes it a
lot easier for me not to have HUGE fonts on Windows.
For those of you using MacOS X (or who will be in the future), things are much
easier and I can help you in that arena much more.
Now for inputting Unicode data and storing it in either pure Unicode or
UTF-8, I can explain what should happen (and what does happen in OPENSTEP),
but on Windows I do not know. Basically, the requirement is to be able to
create a text file that has Unicode data in it. One needs an "input
manager" that allows one to type in data that is stored as Unicode. One needs
a text editor that handles Unicode data. And one needs a mechanism where
one can actually see that the data entered is proper. In other words, the
text editor needs to be able to show the characters you just typed in the
proper fonts.
On MacOS X the TextEdit program handles Unicode data and can store it in
either Unicode or UTF-8 formats. And the rulebook, font and input manager
on my web page fill in the missing pieces. With the input manager, one
switches the keyboard to generate the Unicode values for peep, tot, kick,
etc. and one can switch it back to Latin to generate a, b, c, etc.
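
As a rough illustration of what such an input manager does, here is a minimal Python sketch. The key layout (p, t, k for peep, tot, kick) and the code points are assumptions made for the example: the code points are the ones Unicode later assigned to Shavian, and the real input manager may map keys differently.

```python
# Hypothetical key layout: the Latin key that produces each Shavian letter.
# The code points are the ones Unicode later assigned to the Shavian block.
SHAVIAN_KEYS = {
    "p": "\U00010450",  # peep
    "t": "\U00010451",  # tot
    "k": "\U00010452",  # kick
}

def type_keys(keys, shavian_mode):
    """Translate keystrokes to characters for the current keyboard mode."""
    if not shavian_mode:
        return keys  # Latin mode: keys produce themselves
    # Shavian mode: mapped keys produce Shavian letters, others pass through
    return "".join(SHAVIAN_KEYS.get(k, k) for k in keys)

latin = type_keys("t", shavian_mode=False)
shavian = type_keys("t", shavian_mode=True)
print(latin, [hex(ord(c)) for c in shavian])
```

The same physical key yields two different characters depending on the mode, and no font change is stored in the text itself.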
I would imagine one would need a text editor for Windows which handles
Unicode data. I am sure this should be available especially from a good
university specializing in language acquisition or linguistics. Assuming it
can be found and configured to use the fonts the user wants, one could load
the proper Shavian+Arial font to use. Then one needs a mechanism to input
the Shavian characters at their Unicode values. A good text editor may
allow you to define a custom input mechanism or perhaps offer extensions
that allow you to write one. If anyone has any suggestions as to what would
be a good editor, I can always help people make modifications to allow
Shavian Unicode to be written.
If we started using UTF-8 pages we would have the benefit of being able to
write both Latin and Shavian in the same page without nasty font switching
and also giving the freedom to the user to use whatever set of fonts the
user wants.
--Scott
From: Philip Newton
Date: 1999-12-20 16:37:02 #
Subject: [shavian] Re: Data formats for Shavian
> To view the UTF-8 pages on my web site [...]
Er, where was your web site again?
Cheers,
Philip
From: Scott Harrison
Date: 1999-12-20 17:15:43 #
Subject: [shavian] Re: Data formats for Shavian
In a message from Philip Newton <philip.newton@...>
dated Mon, 20 Dec 1999 17:36:22 +0100, my mailer made me see:
-> > To view the UTF-8 pages on my web site [...]
->
-> Er, where was your web site again?
->
http://www.mithrandir.com then follow the links to Software and Shavian.
--Scott
From: Scott Harrison
Date: 1999-12-20 17:58:45 #
Subject: [shavian] Re: More homographs
In a message from TomLeGeyt@...
dated Sun, 19 Dec 1999 00:29:39 -0500 (EST), my mailer made me see:
-> F hAv bIgun wxkiN on a /SEvWn t /rOmAn trAnzlitxEtP but hAv run intM pxoblemz
-> wiH hOmOgrAfs. hQ kAn F pRs a sentens suc Az "F alwEz bF mF pXz in pXz."?
->
-> v kPs, HXz a hOl rEnJ v [hOmOgrAfs] wen V stRt TiNkiN abQt it; wxdz HAt R
-> prOnQnsd H sEm but speld difxentlI. HX wut Fm konsxnd wiH. enI sugJeScunz?
->
-> if enIwun iz intxested, F hAv a wE t displE /SEvWn n /rOmAn letxz
-> sFmultEnIuslI in DOS! His mFt bI v Vs t sumwun VziN An Oldx kumpVtx. it
-> kunsists v a siNgul eksekVtabl fFul HAt prOvFdz bOT Alfubets.
->
-> .tom
->
The problem you are running into is not an easy one to solve (assuming you mean determining which English word should be used - pears or pairs). You need to be able to analyze the statement and determine from context what is happening. This type of problem has been solved to a limited extent, but only within specific domains of knowledge. For example, the SYSTRAN program can do a pretty good job as long as you specify a particular field like aeronautical engineering. However, if you are doing something like poetry you are going to run into many more problems.
As for suggestions, there are none really, except to alter English so that no two words sound the same. Actually, if I were really to make a suggestion, I would suggest learning Esperanto and promoting that instead.
One of my fields of interest is computational linguistics. My goal is to create an intermediate language that can be used to translate from one natural language to another. For example, English to Russian would be accomplished by translating the English into my language and then my language into Russian. The same would apply in reverse. Note that translating a single word may be easy in some cases but not in others because of meaning within a phrase, etc. The same context-analysis techniques used for this type of translation apply when you have things written in Shavian. Basically, you already do the same for the word "read." Look at the sentence: "I read the book." In English most people will think this is past tense. But it can be present tense. Hard to tell without more context. If one has a question beforehand: "What did you do yesterday?" or "What do you do each Monday?" you have context and can better determine what "I read the book." means. Therefore, unless you are willing to write a very good context analyzer you are going to hit a wall. Sorry.
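
To give a feel for why context matters, here is a deliberately tiny Python sketch. It is a toy, not a real context analyzer: the cue-word lists are invented for this example, and it only looks at the two words to the left of the ambiguous word.

```python
# Toy cue lists: words that, when they appear just before the ambiguous
# word, hint at one spelling. These lists are invented for this example.
CUES = {
    "pears": {"buy", "eat", "ripe"},
    "pairs": {"in", "of", "two"},
}

def choose_spelling(left_context, window=2):
    """Pick the spelling whose cue words occur most often in the
    last `window` words before the ambiguous word."""
    recent = left_context[-window:]
    scores = {
        spelling: sum(word in cues for word in recent)
        for spelling, cues in CUES.items()
    }
    # On a tie, max() keeps the first spelling listed in CUES.
    return max(scores, key=scores.get)

def resolve(tokens, placeholder="?"):
    """Replace each placeholder using only the words to its left."""
    out = []
    for tok in tokens:
        out.append(choose_spelling(out) if tok == placeholder else tok)
    return " ".join(out)

print(resolve("I always buy my ? in ?".split()))
```

Even this one sentence needs hand-picked cues to come out right, and the approach falls over as soon as the vocabulary or the subject changes, which is the wall mentioned above.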
--Scott
From: Simon Barne
Date: 1999-12-22 02:24:31 #
Subject: [shavian] Ghoti Fingers - the talking typewriter
Has nobody else discovered the joys of this thing? Apart from the obvious
pleasure of getting Lionel Ghoti's voice to say rude words, it makes
transcription much quicker and easier. You hit a key hoping for an "ah",
L.G. says "awe" and you know you've got it wrong.
A merry Christmas, happy New Year and peaceful Idd ul-Fitr to all co-cranks.
By the way, shouldn't we be promoting Shavian as the Millennium Alphabet?
From: Hal Fulton
Date: 1999-12-22 22:09:13 #
Subject: [shavian] Unicode - some comments
Hi all,
I am very much a believer in Unicode, so I am glad to see things
being discussed here. However, there's a bit of a problem in that
virtually no one really uses Unicode yet. (Please, prove me wrong.)
And I've NEVER heard of anyone implementing ALL of Unicode -- all
the existing implementations I've heard of handle some subset of it.
Also, a small point that will hearten some of you.
It's a minor misconception that Unicode can only handle 65,536
characters. If that were so, you can bet there wouldn't be any
conscript stuff in there like Shavian or Klingon!
There's a trick that I forget the details of -- sort of like a
code page shift or something -- where you can switch to a different
"plane" (I think that's the term) and have basically a whole new
character set at your disposal. So the "simple" Unicode characters
number 65,000 or so, but the "very rare" ones can potentially
number -- what, close to a million?
Forgive me for being so vague. You can look up the details. I assure
you the core of what I'm saying is true.
Incidentally, it's done this way because of the immense diversity
of the world's scripts. From what I understand, they "cheated" by
including the common characters of Japanese, Chinese, and Korean
only once rather than three times, and still don't have enough room
for ALL the characters.
And that, of course, is why Unicode is so bloody huge.
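
For anyone who wants the details of the "plane" trick, the mechanism is, as far as I can tell, what the standard calls surrogate pairs: two 16-bit units that together name one character outside the first 65,536. The Python sketch below works through the arithmetic, assuming Shavian peep at U+10450 (the code point it was eventually given; treat it as an example value here).

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need surrogates")
    offset = code_point - 0x10000          # a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

# U+10450 lies in plane 1, just past the first 65,536 code points.
high, low = to_surrogates(0x10450)
print(hex(high), hex(low))

# Cross-check against Python's own UTF-16 encoder.
assert "\U00010450".encode("utf-16-be") == bytes.fromhex("d801dc50")

# 17 planes of 65,536 code points each.
total_code_points = 17 * 0x10000
print(total_code_points)
```

So "close to a million" is about right: the scheme reaches 1,114,112 code points in all, of which 1,048,576 lie beyond the first plane.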
Also, it's perhaps worthwhile to remember that the addition of Shavian
is only "proposed" -- isn't that correct? So while there are code points
assigned, which presumably no one can use for something else, they aren't
"official" yet, are they?
Hal
From: Hal Fulton
Date: 1999-12-22 22:11:47 #
Subject: [shavian] Can anyone think of a three-way homograph?
Ever since the subject of homographs was brought up, I have
been trying to think of a written word that could be pronounced
more than two ways.
There's a tickling in the back of my mind that says I used to
know one. But that could be my imagination.
Can anyone think of one?
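
One way to hunt for candidates would be to scan a pronouncing dictionary for spellings listed with three or more pronunciations. The Python sketch below assumes a CMUdict-like layout (word, optional "(n)" variant marker, then phonemes, with ";;;" comment lines); the sample lines are made up for illustration and are not quoted from the dictionary, and "XYZZY" is a placeholder rather than a real entry.

```python
import re
from collections import defaultdict

# Made-up sample in a CMUdict-like layout: WORD, optional "(n)" variant
# marker, then phonemes. These lines are NOT quoted from the dictionary.
SAMPLE = """\
;;; comment line
READ  R EH1 D
READ(1)  R IY1 D
XYZZY  Z IH1 Z IY0
XYZZY(1)  Z AY1 Z IY0
XYZZY(2)  K S IH1 Z IY0
POT  P AA1 T
"""

ENTRY = re.compile(r"^(?P<word>[^\s(]+)(?:\(\d+\))?\s+(?P<phones>.+)$")

def pronunciations(lines):
    """Group distinct pronunciations by spelling."""
    table = defaultdict(set)
    for line in lines:
        if line.startswith(";;;") or not line.strip():
            continue
        match = ENTRY.match(line)
        if match:
            table[match["word"]].add(match["phones"].strip())
    return table

def multi_way(table, minimum=3):
    """Spellings with at least `minimum` distinct pronunciations."""
    return sorted(w for w, p in table.items() if len(p) >= minimum)

table = pronunciations(SAMPLE.splitlines())
print(multi_way(table))
```

Anything this turns up would still need checking by hand, since several listed pronunciations can simply be variants of one word rather than genuinely different words.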
Hal