
Building the dictionary files with DictionaryGeneration

DictionaryGeneration is now a GUI application that is part of DictionaryForMIDs-Creator. For complete documentation on how to create dictionary files with DictionaryGeneration, see the DictionaryForMIDs-Creator documentation.

Note that you can still access the older command-line version of DictionaryGeneration through DictionaryForMIDs-Creator by entering the following on the command line (Linux terminal or Windows Command Prompt):
java -jar DfM-Creator.jar -DictionaryGeneration INPUT_DICTIONARY_FILE OUTPUT_DIRECTORY PROPERTY_DIRECTORY

Here is a sample dictionary file from the IDP: PortugueseNoHeader.txt (38 kB).
This file is a 'comma-separated values' file (CSV file), although any separator character can be used in place of the comma. The separator character is specified by the property dictionaryGenerationSeparatorCharacter (see section Configuring the properties of the file DictionaryForMIDs.properties).
The input dictionary file contains one column per language. Most often you will have two languages (property numberOfAvailableLanguages set to 2) and therefore two columns.
If the dictionary that you want to set up is not yet in CSV format, you first need to convert it to that format using DictdToDictionaryForMIDs.
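
As an illustration of the input format, the following minimal Python sketch splits one line of an input dictionary file into its language columns. It is not part of DictionaryGeneration; the function name split_input_line is hypothetical, and the tab separator is only an assumed value for dictionaryGenerationSeparatorCharacter.

```python
def split_input_line(line, separator="\t", expected_columns=2):
    """Split one input line into one column per language.

    separator stands in for dictionaryGenerationSeparatorCharacter and
    expected_columns for numberOfAvailableLanguages (both assumed values).
    """
    columns = [column.strip() for column in line.rstrip("\r\n").split(separator)]
    if len(columns) != expected_columns:
        raise ValueError(
            "expected %d columns, found %d" % (expected_columns, len(columns))
        )
    return columns


# One English/German line, using the example entry from this page.
columns = split_input_line("monument\tDenkmal (n)\n")
```

A line with a missing or extra column raises ValueError, which can help to spot a wrong separator character before running the generation.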

DictionaryGeneration generates searchfiles, indexfiles and dictionaryfiles. In addition to the generation of these files, DictionaryGeneration copies the file DictionaryForMIDs.properties to the output dictionary directory.

searchfileXXX

For each language one searchfile is generated. XXX is defined by the property languageXFilePostfix.
A searchfile contains one entry per line in the following format:

keyword<searchListFileSeparationCharacter>indexfilenumber

The searchListFileSeparationCharacter-property is typically set to a tab-character.
Each keyword is the first keyword of the indexfile with the given indexfilenumber. For example, the line

monument 18

indicates that the keyword monument is found at the beginning of indexfile 18.
The entries in the searchfiles are sorted alphabetically according to the keyword.
The keywords are normated.
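
The way a searchfile can be used to pick the right indexfile may be sketched as follows. This is a Python illustration only, not the code used by DictionaryForMIDs. It assumes a tab as searchListFileSeparationCharacter, an already normated search word, and a sort order that matches Python's plain string comparison. The entries for "house" and "river" are invented sample data; the names parse_searchfile and find_indexfile are hypothetical.

```python
import bisect


def parse_searchfile(lines, separator="\t"):
    """Return two parallel lists: first keywords and indexfile numbers."""
    keywords, numbers = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore empty lines
        keyword, number = line.split(separator)
        keywords.append(keyword)
        numbers.append(int(number))
    return keywords, numbers


def find_indexfile(keywords, numbers, word):
    """Return the number of the indexfile whose range should contain word.

    That is the last entry whose first keyword is <= word; words that
    sort before every keyword fall back to the first indexfile.
    """
    pos = bisect.bisect_right(keywords, word) - 1
    return numbers[max(pos, 0)]


# Invented sample searchfile; only the "monument" line is from this page.
sample = ["house\t17", "monument\t18", "river\t19"]
keywords, numbers = parse_searchfile(sample)
indexfile_for_mountain = find_indexfile(keywords, numbers, "mountain")
```

Because the searchfile is sorted by keyword, a binary search is sufficient and the searchfile never has to be scanned line by line.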
 

indexfileXXXN

For each language several indexfiles are generated. XXX is defined by the property languageXFilePostfix; N is a sequence number.

An indexfile contains one entry per line in the following format:

keyword<indexFileSeparationCharacter>dictionaryfilenumber-charpos-searchindicator[,...]

The indexFileSeparationCharacter-property is typically set to a tab-character.
For example the line

monument 29-383-B 

indicates that the word monument, together with its translation, is found in dictionaryfile 29 at byte position 383, and that monument occurs at the beginning of the expression. The searchindicator is either B for 'beginning of expression' or S for 'substring in expression'. For example, for the expression "give up", the keyword "give up" has a searchindicator of B and the keyword "up" has a searchindicator of S.
One keyword may have several references to different locations in the dictionary files; the references are separated by commas.
The indexfiles contain one line per word to be translated. The entries in the indexfiles are sorted alphabetically by keyword.
The keywords are normated.
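
The indexfile line format described above can be read with a few lines of Python. The sketch below is illustrative only: the names Reference and parse_index_line are hypothetical, a tab is assumed for indexFileSeparationCharacter, and the second sample line is invented to show several references on one line.

```python
from typing import NamedTuple


class Reference(NamedTuple):
    dictionary_file: int  # dictionaryfilenumber
    position: int         # charpos within that dictionary file
    indicator: str        # "B" (begin of expression) or "S" (substring)


def parse_index_line(line, separator="\t"):
    """Split an indexfile line into its keyword and list of references."""
    keyword, refs = line.rstrip("\r\n").split(separator, 1)
    references = []
    for ref in refs.strip().split(","):
        file_no, pos, indicator = ref.strip().split("-")
        references.append(Reference(int(file_no), int(pos), indicator))
    return keyword, references


# The example line from this page, followed by an invented multi-reference line.
keyword, references = parse_index_line("monument\t29-383-B ")
_, several = parse_index_line("up\t12-40-S,29-383-B")
```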
 

directoryXXXN

For each language several dictionary files are generated. XXX is defined by the property languageXFilePostfix; N is a sequence number.
Note: the number of dictionary files is not the same as the number of indexfiles (typically there are more dictionary files).

A dictionary file contains one entry per line in the following format:

expression-from<dictionaryFileSeparationCharacter>expression-to

The dictionaryFileSeparationCharacter-property is typically set to a tab-character.
For example the line

monument Denkmal (n)  

translates the English "monument" to the German "Denkmal (n)".
The dictionary files do not need to be sorted. If the input dictionary file is sorted, the generated dictionary files are sorted as well, but this is not required.
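
To show how a position taken from an indexfile leads to a dictionary entry, here is a minimal Python sketch. It assumes that the position is a byte offset to the start of a line, that the file is UTF-8 encoded, and that dictionaryFileSeparationCharacter is a tab; none of these is confirmed here, so adjust them to the actual property values. The function name entry_at is hypothetical and the "house" entry is invented sample data.

```python
def entry_at(data, position, separator="\t", encoding="utf-8"):
    """Return (expression_from, expression_to) for the line at position.

    data is the full content of one dictionary file as bytes and
    position is assumed to be the byte offset where the entry starts.
    """
    end = data.find(b"\n", position)
    if end == -1:
        end = len(data)  # last line without a trailing newline
    line = data[position:end].decode(encoding).rstrip("\r")
    expression_from, expression_to = line.split(separator, 1)
    return expression_from.strip(), expression_to.strip()


# Invented two-line dictionary file; the second entry is from this page.
data = "house\tHaus (n)\nmonument\tDenkmal (n)  \n".encode("utf-8")
position = data.index(b"monument")
entry = entry_at(data, position)
```

Since each reference points directly at an entry, the lookup does not depend on the dictionary file being sorted.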