A large number of people are now helping us translate the app on Crowdin. Many thanks for that!
However, as some have already noticed, translating the app’s user interface alone is not enough: the search still does not work in all languages. This is because the search texts are independent of the interface texts. We enter the search texts manually, considering which terms a user might use to find a specific function or symbol table. These terms do not have to appear anywhere in the actual function; they can be completely detached from it.
Examples:
- Morse: Samuel Morse, dashes, dots, points, short, long, dah, dit, dat
- Symbol table “Elvish”: The Lord of the Rings, J.R.R. Tolkien, The Hobbit, …
The search strings are therefore stored in a separate file. Of course, the search texts also contain terms that already appear in the normal translation files, such as the title of the function, so there is always some duplication between the files.
Currently, the following search strings are used in the app:
- The search texts of the selected language, if already available
- The search texts of the English version
- The search texts from a file that is the same for all languages. “J.R.R. Tolkien” from the example above belongs there, because this search term is presumably the same in every language and does not have to be repeated in each one.
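The layering of the three sources can be pictured with a minimal Python sketch. It is not the app's actual code: the dictionaries, the tool id `symboltable_elvish`, the German terms and the function name `search_terms` are all made up for illustration.

```python
# Hypothetical data: one mapping per file, tool id -> search string.
COMMON = {"symboltable_elvish": "jrrtolkien"}
ENGLISH = {"symboltable_elvish": "elves elvish lordoftherings hobbit"}
GERMAN = {"symboltable_elvish": "elben elbisch herrderringe"}
LOCALIZED = {"en": ENGLISH, "de": GERMAN}


def search_terms(tool_id: str, language: str) -> list[str]:
    """Collect the terms of the selected language (if any), English, and the common file."""
    sources = [LOCALIZED.get(language, {}), ENGLISH, COMMON]
    terms: list[str] = []
    for source in sources:
        for term in source.get(tool_id, "").split():
            if term not in terms:  # skip duplicates across files
                terms.append(term)
    return terms


# A language without its own search file still gets the English and common terms.
print(search_terms("symboltable_elvish", "fr"))
```

With this layering, a tool stays findable in a language that has no search file yet, which is why English and the common file are always included.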
Content of Search Strings
Apart from the exact spellings, which are covered in more detail below, it is important to think carefully about which search terms should lead the user to the function. Always keep in mind that the user is sitting in front of a Mystery puzzle with only vague information, or none at all. They may be told a story about elves without knowing that the language is called “Tengwar”. So the symbol table “Tengwar” should also be found with the terms “elves” or “elvish”.
- Of course, the correct title of the tool in the selected language must not be missing.
- What is its context? Is it a telegraph code? Then the search text “telegraphs” is a good choice.
- What do the symbols look like? Do they have dashes? Then the search text could be “dashes lines”, maybe even “barcodes”. Or the text consists only of certain characters (“Kenny’s Code”: “mpf mmm fpf”). Search texts can be derived from the appearance in this way.
- Terms can also be derived from the historical or pop-cultural context (“Enigma”: “worldwartwo worldwarii”, “Klingon”: “startrek klingons enterprise”).
Structure of Search Strings
The internal search function gets the input text. This text is then formatted:
- Case is ignored: upper and lower case make no difference.
- Special letters are normalized: Ä to AE, ß to SS, É to E, Ç to C, …
After that, the system checks whether there is a search term that contains the entered text. For example, the search text “abcd” contains the text “BC”. A more realistic example would be the search text “reddwarf”. This search text covers the following inputs in any case:
- red
- dwarf
- reddwarf
- red dwarf
So when the user enters one of these terms, the related function is displayed. What would not work is “Red Dwarves”. For this, a new search text must be added: “reddwarves”.
In some languages, special letters may appear, which are converted before use. The German word “Männer” (in English: “men”) contains the letter Ä. Because of the formatting mentioned above, the search text here would be “maenner”. Note the “ae” instead of “ä”. Thanks to the formatting, this spelling covers both inputs:
- männer
- maenner
So it is not necessary to add the search string “männer” to cover both variants.
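The formatting and the containment check described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the app's real implementation: the replacement table contains just the four letters named in the text, the names `normalize` and `matches` are hypothetical, and dropping spaces and punctuation is inferred from the “red dwarf” example.

```python
# Only the four replacements named in the text; the app's real table is longer.
SPECIAL_LETTERS = {"ä": "ae", "ß": "ss", "é": "e", "ç": "c"}


def normalize(text: str) -> str:
    """Lowercase, replace special letters, keep only letters and digits."""
    lowered = text.lower()
    replaced = "".join(SPECIAL_LETTERS.get(ch, ch) for ch in lowered)
    # Assumption: spaces and punctuation are ignored, so "red dwarf" == "reddwarf".
    return "".join(ch for ch in replaced if ch.isalnum())


def matches(user_input: str, search_string: str) -> bool:
    """True if the normalized input is contained in one of the search terms."""
    needle = normalize(user_input)
    if not needle:
        return False
    return any(needle in term for term in search_string.split())


print(matches("Red Dwarf", "reddwarf"))    # True
print(matches("Red Dwarves", "reddwarf"))  # False, needs "reddwarves"
print(matches("Männer", "maenner"))        # True
```

Because only the input is normalized here, the search terms themselves have to be written in the already formatted spelling, for example “maenner” rather than “männer”.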
It therefore makes sense to construct your search strings so that they cover many variations. In most cases this means:
- Usage of plural form
- For adjectives, the longest possible form (“fastest” covers “fast” and “fastest”)
If several words are needed to find the function, users may search in different ways. Let’s take the “J.R.R. Tolkien” example. Here, several different search texts are possible:
- With or without spaces: “JRR Tolkien” and “JRRTolkien”.
- With or without dots: “J.R.R. Tolkien”, “J. R. R. Tolkien” and “J.R.R.Tolkien”.
- Full name: “John Ronald Reuel Tolkien”.
The first two points are covered by the search string “jrrtolkien”, because the tool ignores special characters and case. So the final search string would be “jrrtolkien johnronaldreueltolkien”. You can also think about different spellings (“grey” vs. “gray”, “color” vs. “colour”, “philipp” vs. “fillip”, “5” vs. “five”, …) or even frequent spelling mistakes to support the user even better. Variants with and without a hyphen (“eastindian east-indian”) or an apostrophe (“kenny’s kennys”) are also possible. Consider adding both abbreviations and their long forms (“ioc” vs. “internationalolympiccommittee”), too.
Finally, all collected terms are put into one string, separated by spaces: “all my found search strings”. Crowdin will often flag spelling errors in the search strings. This is completely normal and can be ignored.
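When assembling the final string by hand, a small helper can join the terms and point out redundant ones. The Python sketch below is only a suggestion and assumes that the search matches by containment, as described above; the function name `build_search_string` is hypothetical and the terms are expected to be in their formatted spelling already.

```python
def build_search_string(terms: list[str]) -> str:
    """Join terms with spaces, dropping duplicates and terms covered by a longer one."""
    unique = list(dict.fromkeys(terms))  # keeps order, removes exact duplicates
    kept = [
        term
        for term in unique
        # A term contained in another term is already covered by it.
        if not any(term != other and term in other for other in unique)
    ]
    return " ".join(kept)


terms = ["jrrtolkien", "tolkien", "johnronaldreueltolkien", "fast", "fastest", "fastest"]
print(build_search_string(terms))
# jrrtolkien johnronaldreueltolkien fastest
```

Dropping covered terms is optional: keeping “fast” next to “fastest” does no harm, it only makes the string longer.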