Solving Toki Pona Hangman

Nov 27, 2021 toki pona hangman proofs

This proof will weakly solve toki pona hangman by showing an winning strategy that allows a guesser to always win from the starting position of the game. This strategy is simple enough that a motivated enough person could memorize it.

We will also prove that an honest game of toki pona hangman cannot be strongly solved, but a game with dishonest players is strongly solved.

For the purpose of this exercise, only the 137 “essential” nimi pu and nimi ku suli are considered. See this word list.

The algorithm presented in this proof was implemented and ran against each word. The results are included in the appendix.

Motivation

The motivation for this is simple. Toki pona is a constructed language with 137 “official” words and only 14 letters, 5 of which are vowels. This means that toki pona words have significant overlap with each other spelling-wise. This makes even naive strategies effective at hangman. It seemed likely from this that there would be a winning strategy which is simple enough that a person could memorize it, so I decided to try to find it.

Also, I was bored after work one day.

Defining hangman

Hangman is a guessing game in which one player (the “executioner”) decides a word in advanced and gives another player (the “guesser”) a number a blank tiles equal to the number of letters in the words. The guesser then picks a letter. The executioner will then reveal all the positions of that letter in the word, or will notify the guesser that the letter is not in the word. If the letter is not in the word, the guess is counted as a mistake. This guessing process repeats until either the guesser has guessed the full word, or the guesser has made some set number of mistakes.

Because hangman is often played where the mistakes are counted as parts of a stick figure being hung from a gallows, the number of mistakes is often 6: 1 for the head, 1 for the torso, 2 for the arms, and 2 for the legs. Thus, we will assume we are playing a game of hangman where the guesser loses on the sixth mistake.

Note that we are assuming the executioner is choosing a single word. In some versions of hangman, entire phrases are used. These versions are outside the scope of this proof.

We are also assuming that no cheating is occurring. We will call these games “honest” games. Games in which one or more player is cheating are considered later and referred to as “dishonest” games.

Weakly solving

We will now weakly solve honest toki pona hangman by providing an algorithm that will allow the guesser to win from the starting position regardless of executioner’s word choice. We will break this up into cases based on the length of the word.

Trivial Cases

There are several word lengths which are trivial to win:

Single character words

If the word is a single character, simply guess “a”, “e”, “o”, and “n” in any non-repeating order. Worst case, the word is guessed after 3 mistakes, as the only single character words are “a” “e” “o” and “n”
7 character words

The only 7 character words in nimi pu and nimi ku suli are “kepeken” “sitelen” and “monsuta.” Simply guess “n” then “e” if it isn’t “monsuta” and the solution is known with 0 mistakes. (note: initial publication of this proof used a less efficient strategy here. thanks to /u/NotchMath for pointing out the improvement)
8 character words

The only 8 character words are “kokosila” and “misikeke.” Pick any letter in either word, and once that letter is filled in, the solution is known. Avoid picking “m” or “a” as the first guess as those letters are not in both words, and this can be solved with 0 mistakes.
kijetesantakalu

“kijetesantakalu” is the only word of its length, so if the word is 15 letters long, the solution is immediately known. Worse case: 0 mistakes.

Non-Trivial Cases

2 character words

The following are all of the 2 character words: “li” “mi” “ni” “pi” “en” “jo” “ko” “la” “ma” “mu” “pu” “ku” “tu”

No matter the word, the following algorithm will reach the solution with at most 4 mistakes:

pick "i":
if i is correct:
    pick "l" "m" "n" then "p" - Worst case: Solution is known after 3 mistakes.

otherwise, pick "u":
if u is correct:
    pick "m" "p" "t" "k" - Worst case: solution is known after 4 mistakes.

otherwise, pick "a":
if a is correct: 
    pick "l" or "m" - Worst case: solution is known after 3 mistakes.

otherwise, pick "o":
if o is correct:
    pick "j" or "k" - Worst case: solution is known after 4 mistakes.

otherwise, solution is "en" - solution is known after 4 mistakes.

3 character words

The following algorithm will reach the solution with at most 4 mistakes:

pick "a":
if a is correct:
    case: a_a: word is "ala" solution known in 0 mistakes
    case: a__: word is "ale" "ali" or "anu" - guess "l" then "i" or "n" and the solution is known in at most 1 mistake
    case: _a_: word is in ["wan" "tan" "pan" "jan"] - guess "w" "t" "p" "j" and the solution is known in at most 3 mistakes.
    case: __a: word is uta or ona - guess "u" or "n" and the solution is known in at most 1 mistake.

otherwise, pick "o":
if o is correct:
    case: o_o: word is oko. Solution is known ith 1 mistake.
    case: _o_: word is "kon" or "lon" - guess "l" or "k" and the solution is known in at most 2 mistakes.
    case: __o: word is "ijo" or "ilo" - guess "l" or "o" and the solution is known in at most 2 mistakes.

otherwise, pick "i":
if i is correct:
    case: i__: word is "ike" - solution known with 2 mistakes.
    case: _i_: word is "kin" or "sin" - guess "k" or "s" and the solution is known in at most 3 mistakes.

otherwise, pick "e":
if e is correct:
    word is either "ken" or "len" - guess "k" or "l" and the solution is known in at most 4 mistakes.

otherwise, solution is "mun" and known with 4 mistakes.

4 character words

The following algorithm will reach the solution with at most 5 mistakes:

pick "a":
if a is correct:
    case: a___: word is "ante" or "awen" - guess "n" and the solution is known in 0 mistakes.
    case: a__a: word is "anpa" - the solution is known in 0 mistakes.
    case: _a__: word is in ["kasi" "lape" "laso" "jaki" "mani" "pali" "taso" "walo" "waso"]
        pick "s":
        if s is correct:  
            pick "o" 
            if o is correct:
                word is in ["laso" "taso" "waso"] - pick "l" "t" and "w" and the solution is known in at most 2 mistakes.
            otherwise, solution is "kasi" and known in 1 mistakes

        otherwise, word is in ["lape" "jaki" "mani" "pali" "walo"]
        pick "i"
        if i is correct:
            word is in ["jaki" "mani" "pali"] - pick "j" "m" "p" and solution is known at most 3 mistakes.

        otherwise, word is in ["lape", "walo"] - pick "l" and the solution is known in 2 mistakes.

    case: ___a: word is in [ "insa" "luka" "lupa" "nena" "noka" "poka" "pona" "sina" "sona" "supa" "unpa" "weka" ]
        pick "n":
        if n is correct:
            case: n__a: word is "noka" - solution is known in 0 mistakes.
            case: _n_a: word is "insa" or "unpa" - guess "i" "u" and solution is known in at most 1 mistake.
            case: __na: word is in ["pona" "sona" "sina"] - guess "p", "o" and the solution is known in at most 2 mistakes.
            case: n_na: word is "nena" - solution is known in 0 mistakes.

        otherwise, the word is in ["luka" "supa" "lupa" "weka" "poka" ] - guess "k" and the solution is known in at most 3 mistakes.

    case: _a_a: 
        word is in [ "kala" "kama" "lawa" "mama" "nasa" "pana" "sama" "tawa" "wawa" ]
        pick "m"
        if m is correct:
            case: mama: solution is known in 0 mistakes.
            case: _ama: word is either "sama" or "kama" - guess "s" "k" and solution is known in at most one mistake.

        otherwise, pick "w"
        if w is correct:
            case: wawa: soluction is known in 1 mistake.
            case: _awa: word is either "lawa" or "tawa" - guess "l" "t" and solution is known in at most 2 mistakes.

        otherwise, word is in ["kala" "nasa" "pana"] - pick "k" "n" "p" - and solution is known in at most 4 mistakes.

otherwise, pick "e":
if e is correct:
    case: e___: word is "esun" - solution is known with 1 mistake.
    case: _e__: word is in ["jelo" "meli" "seli" "selo" "sewi" "telo" "leko" "meso"]
        pick "l"
        if l is correct:
            case: _el_: word is in [ "jelo" "meli" "seli" "selo" "telo" ]
               pick "o"
               if o is correct:
                   word is in "jelo" "selo" "telo" - pick "j" "t" "l" and the solution is known in at worst 3 mistakes.
                otherwise, word is "meli" or "seli" - pick "s" or "m" and the solution is known in at worst 3 mistakes.
            case: le__: word is leko, solution is known with 1 mistake.
        otherwise, word is "sewi" or "meso" - guess "s" and the solution is known with 2 mistakes.
    case: __e_: word is "open" - solution is known with 1 mistake.
    case: ___e: word is in [ "kule" "kute" "loje" "mije" "mute" "sike" "wile" ]
        pick "k"
        if k is correct: 
            case: "k__e": word is  "kule" or "kute" - guess "l" "t" and solution is known in at most 2 mistakes.
            case: "__ke": word is "sike" - solution is known with 1 mistake.
        otherwise, word is in [ "loje" "mije" "mute" "wile" ], pick "l"
        if l is correct:
            case: l__e: word is "loje" - solution is known with 2 mistakes.
            case: __le: word is "wile" - solution is known with 2 mistakes.
        otherwise, word is "mije" or "mute" - pick "j" "t" and the solution is known in at most 4 mistakes.
    case: _e_e:
       word is either lete or seme, pick "l" "s" and the solution is known in at most 2 mistakes.

otherwise, pick "o"
if o is correct:
    case: o___: word is olin, solution is known in 2 mistakes
    case: _o__: word is in ["moku" "moli" "poki" "toki" ]
        pick "k"
        if k is correct:
            word is in "moku" "poki" "toki"
            pick "i"
            if i is correct:
                word is "poki" or "toki" - pick "p" "t" and solution is known in at most 3 mistakes.
            otherwise, word is "moku" - solution known in 3 mistakes.

        otherwise, word is "moli" - solution is known in 3 mistakes.
            
    case: _o_o: word is "tomo" or "soko" - guess "t" "s" and solution is known in at most 3 mistakes.
    case: ___o: word is "suno" - solution is known with 2 mistakes.

otherwise, pick "i"
"i" will be correct, because there are no 4 letter words with no a, e, o, or i.
case: _i__: the word is "lipu" - solution is known with 3 mistakes.
case: ___i: word is in [ "musi" "suli" "suwi" ] 
    pick "s":
        case: __si: word is "musi", solution is found with 3 mistakes.
        case: s__i: word is either "suwi" or "suli" - guess "l" "w" and the solution is known in at most 4 mistakes.
case: _i_i:
   pick "l":
   if l is correct:
       case: _ili: word is "kili" - solution is known in 3 mistakes.
       case: lili: word is "lili" - solution is known in 3 mistakes.

   otherwise, pick "p"
   if p is correct:
       case: pi_i: word is "pini" - solution is known in 4 mistakes.
       case: pipi: word is "pipi" - solution is known in 4 mistakes.

   otherwise, solution is "nimi" - solution is known in 5 mistakes

5 character words

The following algorithm will reach the solution with at most 2 mistakes:

pick "n":
if n is correct:
    case: ____n: word is in ["kiwen" "pilin" "lukin"] - guess "i" and the solution is known in 0 mistakes.
    case: __n__: word is in ["linja" "monsi" "tenpo" "tonsi"]:
        pick "o"
        if o is correct:
            case: _on__: word is "tonsi" or "monsi" - guess "t" or "m" and solution is known in at most 1 mistake.
            case: __n_o: word is "tenpo" - solution is known with 0 mistakes.
        otherwise, solution is "linja" - solution is known with 1 mistake.
    case: n_n__: word is "nanpa" - solution is known with 0 mistakes.
    case: n___n: word is "nasin" - solution is known with 0 mistakes.

otherwise, pick "a"
if a is correct:
    case: a____: word is "akesi" - solution is known with 1 mistake.
    case: a_a_a: word is "alasa" - solution is known with 1 mistake.
    case: __a_a: word is "utala" - solution is known with 1 mistake.

otherwise, word is "epiku" - solution is known with 2 mistakes.

6 character words:

The following algorithm will reach the solution with at most 2 mistakes:

pick "a":
if a is correct:
    case: _____a: word is "pimeja" - solution known in 0 mistakes.
    case: _a_a__: word is "namako" - solution known in 0 mistakes.
    case: _a__a_: word is "lanpan" - solution known in 0 mistakes.
    case: _a___a: word is "palisa" or "jasima" - guess "s" and the solution is known in 0 mistakes.
    case: _a_a_a: word is "pakala" or "kalama" - guess "k" and the solution is known in 0 mistakes.

otherwise, pick "i":
if i is corect
    case: _i____: word is "sijelo" - solution known with 1 mistake.
    case: _____i: word is "soweli" - solution known with 1 mistake.
    case: _i__i_: word is "sinpin" - solution known with 1 mistake.
    case: _i_i_i: word is "kipisi" - solution known with 1 mistake.

otherwise, the word is "kulupu" - solution is known with 2 mistakes.

Proving the honest game cannot be strongly solved

We have proved thus far that we can win any game of toki pona hangman from the starting position as the guesser. This weakly solves the game.

To strongly solve the game, we must prove that an algorithm exists that can determine the outcome of the game from any given position given perfect play from that position onward, including positions in which sub-optimal strategy has lead to mistakes. We will now prove that there is no such algorithm.

Consider the following game state: The guesser has made any 5 mistakes, and there are at least two valid words for which the remaining unknown letters differ. Even assuming the guesser guesses optimally, we cannot know whether or not the guess will be correct. They may get lucky and guess from the correct word and win the game, or they may be unlucky and guess from the incorrect word and lose the game. Neither case can be determined from this position.

As a concrete example, consider the following game state: the executioner chooses “e” for the word. The guesser guesses “w” “l” “p” “n” and “o” (recall that a strongly solved game must be solvable for any possible position - even ones in which one player has played poorly). The guesser has two options available that don’t lead to a guaranteed loss: “a” and “e” as those are the only remaining one character words. Thus, guessing either “a” or “e” is obviously the optimal strategy. This gives the guesser a 50% chance of winning, and prevents us from knowing the outcome of the game from this position.

Thus, the game cannot be strongly solved.

What about with cheaters?

Thus far we have assumed that neither player is cheating. Let us now show that the game can be strongly solved for instances where both players can cheat. We will call these games “dishonest games.”

Let us assume that for dishonest players the following cases exist:

case 1: the guesser is somehow able to learn the entire word at any point in the game
case 2: the executioner can change the word at any point as long as the change doesn’t invalidate previous hints or imply the solution is a nonexistent word.
case 3: both case 1 and 2 are true.

Case 1 is trivially solved because if the guesser knows the word, they simply guess the letters from the word, and thus the outcome is always a win for the guesser.

Case 2 and case 3 are equivalent: even if the guesser learns the word at one point in time, the executioner can simply change it immediately after. Thus the guesser is never truly able to learn which word the executioner is going to claim to be the real word, and the guesser effectively is unable to cheat.

Let us show that for case 2 and case 3, it is weakly solved for the guesser by the same strategy as honest games:

Weakly solving dishonest games

Recall that a game is weakly solved if the outcome from the starting position is known given perfect play from the players. Recall that for a dishonest executioner, we assume the executioner can change the word as many times as they like to whatever they like as long as it does not invalidate previous hints or imply the existence of a non-word.

Because the game is weakly solved for the guesser without cheating, and because the executioner cannot change the word in such a way that the hints imply a non-existent word, the guesser should simply follow the strategy from an honest game and they will always win.

Proving dishonest games can be strongly solved

Let us assume that there exists some perfect strategy for play from any given position (we will prove the existence of this strategy later, but for now let us just show that given some perfect strategy, dishonest games can be strongly solved).

Recall that a game is strongly solved if the outcome from any position is known given perfect play from the players. Recall that for a dishonest executioner, we assume the executioner can change the word as many times as they like to whatever they like as long as it does not invalidate previous hints or imply the existence of a non-word. Recall that for a dishonest guesser, we assume they are able to learn the word at any point in the game.

Consider the following two game states:

The guesser has made 5 mistakes and there are at least two valid possible words remaining that have differing letters for each of the remaining unknown letters.

In the case, the executioner can force the guesser to lose. Regardless of whether or not the guesser cheats, the guesser must either guess a letter which isn’t in the solution, in which case the game is lost as normal, or the guesser guesses a letter which is in the solution, in which case the dishonest executioner simply switches to one of the other valid words and the guesser loses.
The guesser has made 5 mistakes and there is only one possible word remaining.

In this case, the guesser simply guesses letters from the only possible remaining word. They can force a win, as the executioner has no words they can switch to without invalidating previous hints or implying the existence of a non-word.

These two game states are sufficient to prove that the game can be strongly solved: Consider any arbitrary position where the guesser has made less than 5 mistakes. The guesser must simply guess with perfect strategy until they’ve made 5 mistakes, at which point they are in one of the above game states. If the guesser is unable to reach 5 mistakes with perfect strategy from the given position, then it is also impossible to reach 6 mistakes, thus the position is a winning position for the guesser.

Thus far we have assumed a perfect strategy exists for dishonest games. Let us now prove that such a strategy exists.

A perfect strategy for dishonest games

For the guesser to win from any given position, the guesser must be able to guarantee that the executioner cannot change the word in ways that could force more mistakes than remain for the guesser. Otherwise, the executioner can force a win. For example, if the guesser has made 4 mistakes, and there exists no way to guarantee the solution is known within only one mistake, the guesser will lose as even if the guesser guesses correctly, the executioner can switch words to force the guesser to lose.

For example, consider the following position: the word is “mu,” hint is “__,” and the guesser has guessed “e” “n” “j” and “k.” The executioner can force the guesser to lose: if the guesser chooses “m” the executioner switches the word to “pu” and claims a mistake was made. If the guesser then guesses “u” then the executioner switches to “pi” and the guesser loses. If the guesser originally guessed “u” instead of “m”, the executioner changes the word to “ma” and claims a mistake was made. If the guesser then guesses “a” the executioner claims the word is “mi” and the guesser loses. If the guesser guesses a letter other than “m” or “u” then the executioner doesn’t switch the word and the guesser is still forced into a loss.

This can be easily bruteforced by looking at the current position, then trying every possible guessing sequence for the position and recording the maximum mistakes possible by each guessing sequence. If there is a guessing sequence for which the maximum possible mistakes is less than the allowable mistakes remaining, the game is a win for the guesser. Otherwise, the game is a win the executioner.

Thus, dishonest toki pona hangman is strongly solved. Additionally, this finding is generic enough that it is not specific to toki pona and in fact shows that dishonest hangman is strongly solved for any set of words.

Conclusion

In this proof we have done the following:

Weakly solved toki pona hangman by presenting an algorithm that will make at most 5 mistakes when guessing any toki pona word in a game of hangman from the starting position.
Proven that an honest game of toki pona hangman cannot be strongly solved.
Strongly solved dishonest games of toki pona hangman, and in doing so, strongly solved dishonest games of hangman for any language.

That’s pretty cool, I guess.

Future Work

This proof has shown that only 5 mistakes are needed to win hangman in toki pona. However, it may be possible to solve it in only 4 mistakes. Future work could entail finding the solution with minimum mistakes.

The toki pona language will change over time. Future work could entail adapting the solution for new words. It may also be the case that future additions to the language could prevent it from being weakly solved - for example, “ju” “lu” “nu” and “su” are reserved by jan Sonja for future use. If just three of these words are added to the language, the game becomes unsolvable.

The weak solving strategy has programmatically applied to games from the starting position for each of the 137 words (see the appendix). Future work may entail doing the same for dishonest players from every possible position.

Appendix

The weak solving approach has been implemented into a program and used on all 137 words. The results for each word are linked below:

a e o n en jo ko la li ma mi ku mu ni pi pu tu ala ale anu ijo ike ilo jan ken kin kon len lon mun oko ona pan sin tan uta wan anpa ante awen esun insa jaki jelo kala kama kasi kili kule kute lape laso lawa lete lili lipu loje luka lupa mama mani meli mije moku moli musi mute nasa nena nimi noka olin open pali pana pini pipi poka poki pona sama seli selo seme sewi sike sina sona suli suno supa suwi taso tawa telo toki tomo unpa walo waso wawa weka wile leko soko meso akesi alasa kiwen linja lukin monsi nanpa nasin pilin tenpo utala tonsi epiku kalama kulupu namako pakala palisa pimeja sijelo sinpin soweli jasima lanpan kepeken sitelen kipisi monsuta kokosila misikeke kijetesantakalu