Building the dictionary files with DictionaryGeneration
DictionaryGeneration is now a GUI application that is part of DictionaryForMIDs-Creator. For complete documentation on how to create dictionary files with DictionaryGeneration, see the DictionaryForMIDs-Creator documentation.
Note that you can still access the older command-line version of DictionaryGeneration
through DictionaryForMIDs-Creator by entering the following on the command line (Linux terminal
or Windows Command Prompt):
java -jar DfM-Creator.jar -DictionaryGeneration INPUT_DICTIONARY_FILE OUTPUT_DIRECTORY PROPERTY_DIRECTORY
Here is a sample dictionary file from the IDP:
PortugueseNoHeader.txt (38 kB).
This file is a 'Comma Separated Values' file (CSV file), although instead
of a comma you can use any separator character. The separator
character is specified by the property
dictionaryGenerationSeparatorCharacter (see section
Configuring the properties of the file
DictionaryForMIDs.properties).
The input dictionary file contains one column for each language. Most
often you will have two languages (property numberOfAvailableLanguages
set to 2) and therefore two columns.
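As a minimal sketch of the column layout described above, the following Python snippet splits one line of an input dictionary file into one column per language. The function name split_entry and the sample line are hypothetical; the tab separator is only an assumption standing in for whatever dictionaryGenerationSeparatorCharacter is set to.

```python
def split_entry(line, separator="\t", number_of_languages=2):
    """Split one input dictionary line into one column per language."""
    columns = line.rstrip("\r\n").split(separator)
    if len(columns) != number_of_languages:
        raise ValueError(
            f"expected {number_of_languages} columns, got {len(columns)}: {line!r}"
        )
    return columns


# Hypothetical sample line: two languages separated by a tab (assumed separator).
print(split_entry("monument\tDenkmal (n)\n"))
```

A check like this can help to spot malformed lines before running DictionaryGeneration on a large file.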
If the dictionary that you want to set up is not yet in a CSV format,
you first need to convert it to such a format using DictdToDictionaryForMIDs.
DictionaryGeneration generates searchfiles, indexfiles and dictionaryfiles. In addition to the generation of these files, DictionaryGeneration copies the file DictionaryForMIDs.properties to the output dictionary directory.
searchfileXXX
For each language one searchfile is generated. XXX is defined by the
property languageXFilePostfix.
A searchfile contains one entry per line in the following format:
-
keyword<searchListFileSeparationCharacter>indexfilenumber
The searchListFileSeparationCharacter-property is typically set to a
tab-character.
Each keyword is the first keyword of the indexfile with the given
indexfilenumber. For example, the line
-
monument 18
indicates that the keyword monument is found at the beginning of
indexfile 18.
The entries in the searchfiles are sorted alphabetically by
keyword.
The keywords are normalized.
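The way a searchfile narrows a lookup down to one indexfile can be sketched as follows. This is an illustrative Python example, not the DictionaryForMIDs implementation: the function names are hypothetical, the separator is assumed to be a tab, and Python's default string ordering is assumed to match the sort order of the searchfile. Only the entry "monument 18" comes from the text above; the other sample entries are made up.

```python
import bisect


def parse_searchfile(lines, separator="\t"):
    """Return parallel lists of keywords and indexfile numbers."""
    keywords, numbers = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        keyword, number = line.rsplit(separator, 1)
        keywords.append(keyword)
        numbers.append(int(number))
    return keywords, numbers


def find_indexfile(keywords, numbers, word):
    """Return the number of the indexfile that may contain `word`."""
    # Find the last entry whose first keyword sorts at or before `word`.
    position = bisect.bisect_right(keywords, word) - 1
    if position < 0:
        return None  # `word` sorts before the first keyword of any indexfile
    return numbers[position]


# Hypothetical searchfile content (tab-separated, sorted by keyword).
sample = ["abandon\t1\n", "monument\t18\n", "north\t19\n"]
keywords, numbers = parse_searchfile(sample)
print(find_indexfile(keywords, numbers, "mountain"))
```

Because the entries are sorted, a binary search over the first keywords is enough to pick the single indexfile that needs to be read.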
indexfileXXXN
For each language several indexfiles are generated. XXX is defined by the property languageXFilePostfix; N is a sequence number.
An indexfile contains one entry per line in the following format:
-
keyword<indexFileSeparationCharacter>dictionaryfilenumber-charpos-searchindicator[,...]
The indexFileSeparationCharacter-property is typically set to a
tab-character.
For example the line
-
monument 29-383-B
indicates that the word monument together with its translation is
found in the dictionaryfile 29 at the byte position 383 and that
monument occurs at the beginning of the expression. The searchindicator is
either B for 'begin of expression' or S for 'substring in expression'.
For example for the expression "give up": "give up" will have a
searchindicator of B and "up" will have a searchindicator of S.
A keyword may have several references to different locations
in the dictionaryfiles; these references are separated by commas.
The indexfiles contain one line per word to be translated. The entries
in the indexfiles are sorted alphabetically by keyword.
The keywords are normalized.
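The indexfile entry format can be illustrated with a small parser. This is a sketch under stated assumptions: the names Reference and parse_index_line are hypothetical, the separator is assumed to be a tab, and the second reference in the sample line is invented to show the comma-separated list.

```python
from typing import NamedTuple


class Reference(NamedTuple):
    dictionary_file: int  # number of the dictionary file
    position: int         # position of the entry inside that file
    indicator: str        # "B" = begin of expression, "S" = substring


def parse_index_line(line, separator="\t"):
    """Split an indexfile line into its keyword and list of references."""
    keyword, references = line.rstrip("\r\n").split(separator, 1)
    parsed = []
    for reference in references.split(","):
        file_number, position, indicator = reference.split("-")
        if indicator not in ("B", "S"):
            raise ValueError(f"unknown searchindicator: {indicator!r}")
        parsed.append(Reference(int(file_number), int(position), indicator))
    return keyword, parsed


# "29-383-B" is the example from the text; "31-12-S" is a made-up second reference.
keyword, references = parse_index_line("monument\t29-383-B,31-12-S\n")
print(keyword, references)
```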
directoryXXXN
For each language several dictionary files are generated. XXX is
defined by the property languageXFilePostfix; N is a sequence number.
Note: the number of dictionary files is not the same as the number of
indexfiles (typically there are more dictionary files).
A dictionary file contains one entry per line in the following format:
-
expression-from<dictionaryFileSeparationCharacter>expression-to
The dictionaryFileSeparationCharacter-property is typically set to a
tab-character.
For example the line
-
monument Denkmal (n)
translates the English "monument" to the German "Denkmal (n)".
The dictionary files are not necessarily sorted: if the input dictionary file is
sorted, the generated dictionary files are sorted too, but there is
no need for the dictionary files to be sorted.
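Since the dictionary files are not sorted, an entry is located through the position stored in an indexfile reference. The following Python sketch shows the idea on in-memory data. It assumes that the position is a byte offset to the start of a line, that the file is UTF-8 encoded, and that the separator is a tab; the function name read_entry and the sample data are hypothetical.

```python
def read_entry(data, position, separator="\t", encoding="utf-8"):
    """Read the dictionary entry that starts at byte offset `position`."""
    end = data.find(b"\n", position)
    if end == -1:
        end = len(data)  # last line without a trailing newline
    line = data[position:end].decode(encoding).rstrip("\r")
    expression_from, expression_to = line.split(separator, 1)
    return expression_from, expression_to


# Hypothetical dictionary file content held in memory.
sample = "house\tHaus (n)\nmonument\tDenkmal (n)\n".encode("utf-8")
position = sample.index(b"monument")  # stands in for the position from an indexfile
print(read_entry(sample, position))
```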