
RE: PlusPack, Dictionaries, Phonemes and Russian



> A bug in the dictionary compiler, which was discussed a few days ago,
> appears to have survived the last fix somehow: the dictionary compiler
> still doesn't show up when I start it on Windows 98, even though the
> installation succeeded without hanging this time. For now I have given up
> the idea of compiling dictionaries under Win98 and moved this task to Win2k.

I'm afraid that it's not a bug.  The dictionary compiler only works on
NT-based systems.  This is currently by design...  might be nice to change
in the future, but there are currently many more pressing issues to work on.

> There was no problem, and the compiler started successfully. I must admit
> that you did a great job with the compiler. It's fast, and it compresses
> even better than WinZip. So Michael, when you finish writing a speller, you
> can make a new, extremely competitive archiver :-)

LOL...  at least for text files....  :-)


> I could compile two Russian dictionaries using the word lists, which I had
> told you about in one of my previous messages. They are:
>  * russian.adm -- ~147 000 words (435 Kb)
>    http://addict.ghcube.com/russian.zip
>  * ru_phys.adm -- ~ 16 000 words ( 86 Kb)
>    http://addict.ghcube.com/ru_phys.zip
> The first one contains a general vocabulary and the second is loaded with
> physics and math words.

Glad to see these come on board...  Glenn will see to it that they are
placed on the dictionary site when you think they're ready to release.  :-)


> When I compiled these dictionaries and tried to test how well they work, a
> brand-new bug cropped up. It is clearly shown at
> http://addict.ghcube.com/phonetic_bug.gif. As you can see, the spelling
> dialog only partly appears and doesn't respond to any user action, and
> neither does the host application. The system monitor shows 100% CPU load.
> I assumed that the spell checker is unable to build a suggestion list and
> that some kind of infinite loop is taking place.

Hmmm... interesting.  Based upon past experience in this area, I doubt
Addict has an infinite loop in it, but some operation appears to be taking
an enormous amount of time...
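To give a rough sense of why "an enormous amount of time" is plausible without any infinite loop: if suggestions are produced by generating spelling variants and looking each one up (a generic, Norvig-style edit-candidate scheme, assumed here purely for illustration; this message does not say how Addict generates phonetic suggestions), the number of raw candidates grows with the alphabet size and roughly squares with each extra level of depth. The Russian alphabet has 33 letters versus 26 in English, so every level is larger still. The function names below are hypothetical.

```python
def single_edit_count(word_len: int, alphabet_size: int) -> int:
    """Raw (duplicate-including) number of strings one edit away from a word:
    deletions + adjacent transpositions + substitutions + insertions."""
    deletions = word_len
    transpositions = max(word_len - 1, 0)
    substitutions = alphabet_size * word_len
    insertions = alphabet_size * (word_len + 1)
    return deletions + transpositions + substitutions + insertions


def two_edit_count(word_len: int, alphabet_size: int) -> int:
    """Raw number of strings produced by applying a second edit to every
    one-edit candidate (those candidates have length n-1, n or n+1)."""
    n, a = word_len, alphabet_size
    shorter = n                              # candidates from deletions
    same_len = max(n - 1, 0) + a * n         # transpositions + substitutions
    longer = a * (n + 1)                     # candidates from insertions
    return (shorter * single_edit_count(max(n - 1, 0), a)
            + same_len * single_edit_count(n, a)
            + longer * single_edit_count(n + 1, a))


if __name__ == "__main__":
    # Compare an 8-letter word under an English-sized and a Russian-sized alphabet.
    for alphabet, size in (("English", 26), ("Russian", 33)):
        print(alphabet, single_edit_count(8, size), two_edit_count(8, size))
```

Even before any dictionary lookups, one extra level of depth turns hundreds of candidates into hundreds of thousands, which is enough to pin a CPU for a long time.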

I'd like to try to reproduce your bug if I can...  so what I'd like from you
is:

 * The exact text you're checking... preferably narrowed down to the word
   it locks up on...
 * The dictionaries that are loaded... so I can repro your environment...
 * The values for all of the Phonetic properties (were you using the
   defaults, or does it require some change?)
 * How fast is the machine you're on?


> I enabled and disabled different features one at a time until I turned
> phonetic suggestions off (I had doubts this could work for Russian anyway).
> The problem disappeared. Decreasing the PhoneticDepth, PhoneticDivisor and
> PhoneticMaxDistance settings to 1, 1 and 2 respectively solves the
> problem as well, but the delay before the suggestion list appears is still
> noticeable.
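One reason lowering a "max distance" style setting can help is that a bounded comparison is able to give up early on hopeless candidates. The sketch below uses ordinary Levenshtein edit distance on letters as a stand-in; this message does not describe what PhoneticMaxDistance actually measures inside Addict, so treat that mapping, and the hypothetical name `within_distance`, as assumptions.

```python
def within_distance(a: str, b: str, max_distance: int) -> bool:
    """Return True if the Levenshtein distance between a and b is <= max_distance,
    abandoning the computation as soon as the cap is certainly exceeded."""
    # Lengths alone already force at least |len(a) - len(b)| edits.
    if abs(len(a) - len(b)) > max_distance:
        return False
    prev = list(range(len(b) + 1))           # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        # The row minimum never decreases, so the final distance can't drop back.
        if min(curr) > max_distance:
            return False
        prev = curr
    return prev[-1] <= max_distance
```

With a small cap most dictionary entries are rejected after only a row or two, which is consistent with the settings change making the hang go away while leaving some delay.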

Once you send me the info from above, I'll take that and try to determine
what the best course of action is.  If the code is correct and it's just
taking a very long time to do, I'll likely add a property that limits the
length of time it can spend generating phonetic suggestions.
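A time limit of the kind described could look roughly like this: candidates are consumed lazily, and the loop stops as soon as a deadline passes, returning whatever suggestions were found so far. This is only a Python sketch; `suggestions_within_budget` and its parameters are hypothetical and are not part of Addict's actual API.

```python
import time
from typing import Callable, Iterable, List


def suggestions_within_budget(
    candidates: Iterable[str],
    is_good: Callable[[str], bool],
    budget_seconds: float,
    max_results: int = 10,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Collect up to max_results accepted candidates, but stop once the
    time budget is used up and return the partial list."""
    deadline = clock() + budget_seconds
    results: List[str] = []
    for cand in candidates:
        if clock() >= deadline:
            break                      # out of time: keep what we have
        if is_good(cand) and cand not in results:
            results.append(cand)
            if len(results) >= max_results:
                break                  # enough suggestions
    return results
```

Passing the clock in as a parameter keeps the cutoff easy to test; in real use `time.monotonic` is a sensible default because it is not affected by changes to the system clock.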


> 4. Reliability Testing.
>
> I only tested russian.adm and found that it is able to satisfy average
> spelling needs, although several problems with geographical names, such as
> names of countries or rivers, were obvious. The other main problem concerns
> the notorious word forms: the overwhelming majority of them are present in
> the dictionary, but some forms were not included.

Glenn strives to do routine updates of dictionaries... some are a constant
work in progress...  though, in truth, most languages are works in
progress....

> This leads to situations where a word that is definitely correct is
> flagged as misspelled, and the checker suggests replacing it with one of
> the other forms of the same word. But this problem is not too dramatic,
> since there were not many cases of this kind. Even MS Word tends to have
> this problem, to a lesser extent.

And the better the dictionary gets, the fewer of these problems it should
have over time....


> 5. Authorship.
>
> As I mentioned earlier, a person who "keeps rolling" these dictionaries is
> Lev Melnikovsky. He kindly allowed it to be freely used in any way. In the
> accompanying note he says: "Feel free to copy, use, distribute, abuse or
> delete this dictionary in any way you wish". Thanks, did just that :-) But
> still, the readme.txt file shipped with the dictionary should mention the
> original word list the dictionary is based on. That will help avoid the
> impression that our company developed the whole thing.

Of course...


Best Wishes,

Michael Novak
Addictive Software
address@hidden
http://www.addictive.net



