Sunday, November 06, 2005

Kombination

You may remember an earlier post (http://tsdgeos.blogspot.com/2005/09/some-gaming.html) about two games I was writing for KDE4. Kiriki has not changed, since it can be considered finished, but crossedWords changed its name to Kombination (damn K-names! ;-)) and has progressed a lot, as you can see here, thanks to Pino Toscano helping with it. You can actually play it, except that scoring does not work, a blank tile cannot be changed to a letter, and word checking does not work.

And word checking is our biggest problem right now. By word checking I mean "how to decide whether a word is valid or not".
The first obvious idea is "ask the other users", but as a game designer you have to assume that other players can be "bad people" who always refuse your words, so that solution is open to abuse.
The second obvious idea is "use kspell", but this has problems too: in Spanish, for example, you play with unaccented tiles, so you do not form "balcón" but "balcon", and kspell will tell you that the word is invalid.
The third obvious idea is "use a preprocessed list containing only the acceptable words, even if they are spelled wrong". That solution also has problems: the Italian word list extracted from aspell is 25 MB, the Spanish one is 6.5 MB and the Catalan one 149 MB!!! Loading that into memory for checking needs a lot of memory, and Kombination would be HUGE to distribute.

Does anyone have a fourth idea?

BTW, you can get Kombination from /branches/work/kde4/playground/games/kombination/

12 comments:

  1. Well, I think your word lists would turn out to be quite a bit smaller, actually. I for one have a separate book that came with my dictionary ("Van Dale Groot Woordenboek der Nederlandse Taal", the de facto standard Dutch dictionary) that is made especially for this game. The list of words in that book is the official list of words accepted by the Dutch and Flemish Scrabble League (yes, there is such a thing). This list is WAY shorter than the complete list of words you'd see in a dictionary. You might even be able to get a list of accepted words from your local Scrabble league.

  2. Still use a/ispell, and see whether the suggested words contain letters that can be mapped to valid letters for the word, for example:
    f({ô, ö, ò, ó}) -> o, and then compare again
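
A minimal sketch of that mapping idea in Python, under a few assumptions: `fold_accents`, `matches_a_suggestion` and the `keep` parameter are hypothetical names, and the suggestion list is a plain list of strings standing in for whatever the spell checker returns (no real aspell/ispell call is made here). Letters listed in `keep` are left alone, on the assumption that a language has a dedicated tile for them, as with "ñ" in Spanish.

```python
import unicodedata


def fold_accents(word, keep="ñ"):
    """Map accented letters to their base letter, e.g. 'ó' -> 'o'.

    Letters in `keep` are assumed to have their own tile and stay as they are.
    """
    word = unicodedata.normalize("NFC", word)
    out = []
    for ch in word:
        if ch in keep:
            out.append(ch)
            continue
        # Decompose the letter and drop the combining marks (the accents).
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def matches_a_suggestion(played, suggestions):
    """True if some suggestion equals the played word once accents are folded."""
    target = played.lower()
    return any(fold_accents(s.lower()) == target for s in suggestions)


# Hypothetical suggestions a spell checker might return for "balcon".
print(matches_a_suggestion("balcon", ["balcón", "balón"]))  # True
print(matches_a_suggestion("balcon", ["balón"]))            # False
```

Whether the suggestion list always contains the accented form is up to the spell checker, so this only works as well as its suggestions do.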

  3. Use kspell, but keep a list of the [accented] characters that can be represented by Kombination's tiles. Then check all the combinations (normally very few) with kspell.

    I did something similar for htdig, it's easy for spanish and catalan, there are few combinations.

    The equivalence table (only for those special characters) could be stored in a UTF-8 text file for every language, as part of its i18n package.

    Something like the following (which is valid for catalan and spanish):

    à,á: a
    è,é: e
    í,ï: i
    ò,ó: o
    ü,ú: u

    Of course, you should be careful to accept unique symbols that are two or three characters long, like:

    l·l: ll
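
A rough sketch of how such a table could be used, assuming the "accented variants: tile form" line format shown above. The names `parse_table`, `candidates` and `is_valid` are hypothetical, and `checker` is a stand-in for a real spell checker call (any function that takes a word and returns True or False); no kspell API is used here.

```python
from itertools import product

TABLE_TEXT = """\
à,á: a
è,é: e
í,ï: i
ò,ó: o
ü,ú: u
l·l: ll
"""


def parse_table(text):
    """Return {tile form: [accented variants]} from the table text."""
    variants = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        accented, plain = line.split(":")
        variants[plain.strip()] = [a.strip() for a in accented.split(",")]
    return variants


def candidates(word, variants):
    """List every spelling that the unaccented tile word could stand for."""
    # Longest keys first, so "ll" is matched before single letters.
    keys = sorted(variants, key=len, reverse=True)
    options = []
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                options.append([key] + variants[key])
                i += len(key)
                break
        else:
            options.append([word[i]])
            i += 1
    return ["".join(parts) for parts in product(*options)]


def is_valid(word, variants, checker):
    """Accept the word if the checker accepts any candidate spelling."""
    return any(checker(c) for c in candidates(word, variants))


table = parse_table(TABLE_TEXT)
print(len(candidates("balcon", table)))                # 9
print(is_valid("balcon", table, {"balcón"}.__contains__))  # True
```

The number of candidates multiplies with every letter that has variants, so long words with many vowels would need more checks than short ones.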

  4. To andré:

    Well, the problems with using that kind of word list are:
    - Does it include all the plurals and conjugated verb forms? Remember that you as a person can think of them, but a computer needs to have them all
    - Does it allow free distribution? I doubt it

  5. To anon and Ricardo:

    That may be a good solution, I'll try to investigate it a bit further.

  6. Hey, why not implement the JDuplicate protocol so that you can play online and much more? :)

    http://jduplicate.sourceforge.net/documentation_EN.html

  7. As in some natural language processing systems, the dictionary could be stored as a tree (or, even better, as a graph) and processed like an automaton. Reaccentuation can be done dynamically, as stated in other comments. This way, dictionaries can be compressed very efficiently.
    In fact, NLP tools could be very useful for KDE as a whole... Open source solutions already exist, I think.
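
To make the tree idea concrete, here is a small trie sketch in Python. `TrieNode`, `Trie` and `node_count` are hypothetical names. It only shares common prefixes; the graph variant mentioned above would also share common endings and so be smaller still. A dict-based trie in Python is not memory efficient in itself, so take this as an illustration of the structure rather than of the savings.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False  # True if a word ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def __contains__(self, word):
        # Walk the tree letter by letter, like an automaton reading its input.
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def node_count(self):
        """Count all nodes, including the root."""
        count, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            count += 1
            stack.extend(node.children.values())
        return count


words = ["casa", "casas", "caso", "casos"]
trie = Trie(words)
print("casas" in trie)    # True
print("cas" in trie)      # False: a prefix, not a word
print(trie.node_count())  # 8 nodes for 18 letters of input
```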

  8. Sorry, I should add some more comments: I think that your game only needs lemmas (simple words without conjugations, plurals, etc.). Even if that is not the case, full-form dictionaries can be stored very efficiently, as I stated above. But it's true that they will take up a few MB each and thus should be shared by the whole KDE system.

  9. Why would I not need plurals? Plurals are completely valid words.

    BTW, writing an NLP system for Kombination is VERY much out of the scope of the project ;-)

  10. Why not combine your two methods and use ASpell with the option for the users to waive ASpell's decisions? That way you could have three profiles: ASpell strict scrabble, user moderated scrabble, or ASpell scrabble with the ability for players to confirm non-ASpell words.

    Seth Quarrier
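
The three profiles could be sketched roughly like this in Python. Everything here is hypothetical: the `Profile` names, the `word_accepted` function, and the two callables, where `checker` stands in for the spell checker and `players_accept` for however the other players vote.

```python
from enum import Enum


class Profile(Enum):
    STRICT = "strict"        # only the spell checker decides
    MODERATED = "moderated"  # only the other players decide
    CONFIRM = "confirm"      # checker first, players may confirm rejected words


def word_accepted(word, profile, checker, players_accept):
    """Decide whether a word is valid under the chosen profile."""
    if profile is Profile.STRICT:
        return checker(word)
    if profile is Profile.MODERATED:
        return players_accept(word)
    # CONFIRM: the players are only asked when the checker says no.
    return checker(word) or players_accept(word)


known = {"casa"}.__contains__  # stand-in for the spell checker
print(word_accepted("balcon", Profile.STRICT, known, lambda w: True))   # False
print(word_accepted("balcon", Profile.CONFIRM, known, lambda w: True))  # True
```

The "bad people" problem from the post still applies to the moderated profile; the confirm profile only limits it to words the checker rejects.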

  11. > crossedWords changed its name to Kombination (damn K-names! ;-))

    Doh!

    Should have called it "Qwidgybo"

  12. Regarding duplicate: No, please don't, Duplicate Scrabble is a far inferior game to Scrabble.

    exciting project! I've mailed you aacid with some comments and suggestions.

    cheers
    Jason (11th ranked Scrabble player in North America)
