SimSearch is a visual search-by-similarity tool for Japanese kanji. Suppose you encounter a kanji you don't know, but it looks very similar to one you do know. In this case, you can enter the one you know as a query, and navigate through the similarity space until you find the kanji you are looking for.
To determine how similar two kanji are to one another, we use the stroke edit distance between them [1]. In other words, we look at the sequence of strokes used to write each character, and count how many changes you'd need to make to turn one kanji's sequence of strokes into the other's. Research so far has shown that this measure best matches human judgements of similarity [2].
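To make the idea concrete, here is a minimal sketch of an edit distance over stroke sequences. It assumes a plain Levenshtein distance with unit costs for inserting, deleting, or substituting a stroke, and it uses made-up stroke labels ("v" for vertical, "c" for corner, "h" for horizontal). The function name, the labels, and the cost scheme are illustrative assumptions, not the encoding or weighting SimSearch itself uses.

```python
def stroke_edit_distance(a, b):
    """Levenshtein distance between two sequences of stroke labels.

    Unit costs are an assumption made for this sketch.
    """
    # prev[j] holds the distance between the strokes of `a` seen so far
    # and the first j strokes of `b`.
    prev = list(range(len(b) + 1))
    for i, stroke_a in enumerate(a, start=1):
        curr = [i]  # turning i strokes into none takes i deletions
        for j, stroke_b in enumerate(b, start=1):
            substitution = 0 if stroke_a == stroke_b else 1
            curr.append(min(
                prev[j] + 1,                 # delete a stroke from a
                curr[j - 1] + 1,             # insert a stroke from b
                prev[j - 1] + substitution,  # keep or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical, simplified stroke sequences for two similar-looking kanji.
hi = ["v", "c", "h", "h"]        # 日
me = ["v", "c", "h", "h", "h"]   # 目
print(stroke_edit_distance(hi, me))  # 1: one extra horizontal stroke
```

Under these assumptions, a small distance means the two kanji are written with nearly the same strokes in nearly the same order, which is why they tend to look alike.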
Search is also adaptive. That is, it adapts to whatever people actually find similar when they search using the system. For this, we use Q-learning [3], a well-known algorithm for learning the best action to take in a state space. Q-learning has been used in a wide variety of search applications, including game players for board games which learn from experience.
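For readers unfamiliar with Q-learning, the sketch below shows the standard tabular update rule. It assumes, purely for illustration, that a state is the kanji currently displayed, an action is the neighbouring kanji a user clicks, and the reward is 1.0 when the click ends the search. The function name, the reward scheme, and the values of `alpha` and `gamma` are hypothetical and are not taken from the SimSearch source.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new estimate.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # Value of the best follow-up action; 0.0 if the search ended here.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: a user viewing 日 clicks 目 and stops searching there.
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
q_update(q, "日", "目", reward=1.0, next_state="目", next_actions=[])
```

With estimates like these, neighbours that users have found helpful from a given kanji could be ranked ahead of those suggested by stroke edit distance alone.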
Fortunately, systems such as this do not need their own custom dictionaries of Japanese. For kanji translations and pronunciation, we use the excellent and free Kanjidic dictionary [4], and for layout we use RaphaelJS [5].
If you're a learner of Japanese, or a native speaker, please give the system a try and send me some feedback. If you're a programmer, note that SimSearch is open source, so feel free to suggest improvements, or even try running your own site.
Visual kanji search (r77:6a5ee62368d6)