
Setting up a new dictionary for DictionaryForMIDs


See important change notes from past releases here!


Setting up a dictionary is purely a matter of configuration; no programming knowledge or development environment is needed. If you run into any problem while setting up a dictionary for DictionaryForMIDs, just contact us and we will assist you.

Setting up a dictionary for DictionaryForMIDs involves the following 3 steps:

  1. Configuring the properties of the file DictionaryForMIDs.properties
  2. Generating the files for DictionaryForMIDs
  3. Putting the generated files in DictionaryForMIDs.jar

 

1. Configuring the properties of the file DictionaryForMIDs.properties

DictionaryForMIDs is customized via properties in DictionaryForMIDs.properties. Each of the properties must be provided unless noted otherwise.

Here is the list of properties:

infoText

Text that is shown at the top of the info dialog. Provide information about the dictionary here. Mandatory: include contact information for someone who can be contacted about the dictionary. That may be you (the person who set up this dictionary for DictionaryForMIDs) and/or the maintainer of the dictionary itself. Please include an email address and/or a homepage.
Also mandatory: a copyright notice for the dictionary.
 

dictionaryAbbreviation

Short abbreviation identifying the origin of the dictionary: the name of the organization or project the dictionary comes from, e.g. freedict for the dictionaries from freedict.org. Preferably only a few characters long. The JarCreator tool uses this property to form the application name.
 

numberOfAvailableLanguages

Defines how many languages are in the dictionary. For many dictionaries this is 2. For each language the languageX properties described below need to be defined (X is a number from 1 to numberOfAvailableLanguages).
 

languageXDisplayText

Text that is used in the user interface to identify the language. X is replaced with the number of the column for the language. For example:
language1DisplayText: English
language2DisplayText: Portuguese
 

languageXFilePostfix

Text that is used in file names to identify the searchfile and index files for a language. Typically a 3-letter code, such as Eng for English, as defined in the ISO 639 3-letter codes at http://etext.lib.virginia.edu/tei/iso639.html.
 

languageXIsSearchable

A boolean property with either the value true or the value false. Set to true when searching for translations from that language is allowed. Normally this property is true for bidirectional translation dictionaries. For lookup dictionaries, e.g. an acronym dictionary where it is only possible to search from the acronym to the explanation, this value is set to false for the explanation language/column. For an example see the elements dictionary in the download section.
Also for a unidirectional dictionary, which for example only translates English to Portuguese (but not Portuguese to English), you have to set languageXIsSearchable to false for Portuguese.
This property is optional; the default value is true.
 

A boolean property with either the value true or the value false. Tells DictionaryGeneration whether to generate an index for this language.
This property is optional, the default value is true.
Normally this property has the same value as languageXIsSearchable.
 

A boolean property with either the value true or the value false. Set to true when there is a separate dictionaryXXX.csv file for this language (for an explanation of the files, see section Files generated by the DictionaryGeneration tool). Normally all languages share the same dictionaryXXX.csv files; this is the case for dictionaries where expression ABC translates to XYZ and XYZ therefore translates back to ABC. For dictionaries where ABC translates to XYZ but XYZ translates to DEF, this property is set to true. For an example, see the German-French freedict dictionary in the download section.
For documentation, see here.
This property is optional, the default value is false.
 

dictionaryGenerationSeparatorCharacter

Separation character for the input dictionary file that is read by DictionaryGeneration. The character must be enclosed in apostrophes, e.g. ','. It can also be '\t' (backslash plus t) for a tab character.
This property is optional; the default value is \t (tab character).
 

indexFileSeparationCharacter, searchListFileSeparationCharacter, dictionaryFileSeparationCharacter

Separation characters for the output csv files that are generated by DictionaryGeneration. Each character must be enclosed in apostrophes, e.g. ','. It can also be '\t' (backslash plus t) for a tab character. Typically these properties are set to the same value as dictionaryGenerationSeparatorCharacter.
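The quoting rule for the separator properties can be illustrated with a small Java sketch. SeparatorParser is a hypothetical helper, not part of DictionaryGeneration, and it assumes it receives the raw property text (such as ',' or '\t') exactly as written in the file.

```java
/** Hypothetical helper (not DictionaryGeneration code): turns a quoted separator such as ',' or '\t' into a char. */
final class SeparatorParser {
    private SeparatorParser() {}

    static char parse(String rawValue) {
        String v = rawValue.trim();
        // The value must be enclosed in apostrophes (single quotes).
        if (v.length() < 3 || v.charAt(0) != '\'' || v.charAt(v.length() - 1) != '\'') {
            throw new IllegalArgumentException("Separator must be quoted, e.g. ',': " + rawValue);
        }
        String inner = v.substring(1, v.length() - 1);
        if (inner.equals("\\t")) {
            return '\t';            // backslash plus t stands for a tab character
        }
        if (inner.length() == 1) {
            return inner.charAt(0); // any other single character, e.g. ','
        }
        throw new IllegalArgumentException("Unsupported separator: " + rawValue);
    }

    /** Input separator: the default is a tab when the property is missing. */
    static char parseOrDefault(String rawValue) {
        return rawValue == null ? '\t' : parse(rawValue);
    }
}
```

An unquoted value such as , is rejected in this sketch rather than guessed at.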
 

dictionaryGenerationLanguageXExpressionSplitString

Used by DictionaryGeneration: when an expression read for a language actually contains several expressions, set this property to the string that separates them.
Example: the expression "to choose, to select, to pick" contains not one but three expressions: (1) "to choose", (2) "to select" and (3) "to pick". By setting dictionaryGenerationLanguageXExpressionSplitString to , (a comma) for this language, DictionaryGeneration extracts these 3 expressions. For language2 this is done with the following line:
dictionaryGenerationLanguage2ExpressionSplitString=,
This property is optional; by default it is not set.
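The effect of the split string can be sketched in Java. ExpressionSplitter is a hypothetical class, not the DictionaryGeneration implementation; trimming the blanks around each part is an assumption based on the example above, where " to select" becomes "to select".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: split one column value into several expressions. */
final class ExpressionSplitter {
    private ExpressionSplitter() {}

    static List<String> split(String columnValue, String splitString) {
        List<String> expressions = new ArrayList<>();
        if (splitString == null || splitString.isEmpty()) {
            expressions.add(columnValue.trim()); // property not set: the value stays one expression
            return expressions;
        }
        // Pattern.quote makes the split string literal, even if it contains regex characters.
        for (String part : columnValue.split(Pattern.quote(splitString))) {
            String expression = part.trim(); // assumption: surrounding blanks are dropped
            if (!expression.isEmpty()) {
                expressions.add(expression);
            }
        }
        return expressions;
    }
}
```

With the split string "," the value "to choose, to select, to pick" yields the three expressions "to choose", "to select" and "to pick".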
 

dictionaryGenerationInputCharEncoding

Character set encoding for the input dictionary file that is read by DictionaryGeneration.

Supported character set encodings are:
UTF-8
ISO-8859-1
US-ASCII

This property is optional, the default value is ISO-8859-1.
 

searchListCharEncoding, indexCharEncoding, dictionaryCharEncoding

These 3 properties define the character set encoding that is used for the output searchlist file/index files/dictionary files.

Supported character set encodings are:
UTF-8
ISO-8859-1
US-ASCII

Note: very old mobile phones/PDAs may not support UTF-8. We would expect all recently released models to support UTF-8.
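Why the choice of encoding matters can be seen with plain Java: the same word needs a different number of bytes, or cannot be stored at all, depending on the encoding. This illustration uses only java.nio.charset.StandardCharsets; EncodingDemo is a hypothetical name and not DictionaryForMIDs code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustration only: how one word is stored in the three supported encodings. */
final class EncodingDemo {
    private EncodingDemo() {}

    /** German "Maedchen" spelled with a real a-umlaut (Unicode escape, so the source encoding does not matter). */
    static final String WORD = "M\u00e4dchen";

    static int byteLength(String word, Charset charset) {
        return word.getBytes(charset).length;
    }

    /** Encodes and decodes again; characters the charset cannot represent come back as '?'. */
    static String roundTrip(String word, Charset charset) {
        return new String(word.getBytes(charset), charset);
    }
}
```

For WORD, byteLength gives 7 with ISO_8859_1 and 8 with UTF_8 (the umlaut takes two bytes), and roundTrip with US_ASCII gives "M?dchen". So US-ASCII only works for dictionaries without accented characters.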
 

languageXDictionaryUpdateClassName

This property defines the "DictionaryUpdate" class that the DictionaryGeneration tool uses for a language. A DictionaryUpdate class changes the way entries are stored when the tool converts an input dictionary file into the files generated for DictionaryForMIDs. For details on the files created when a dictionary is generated, see Generating the files for DictionaryForMIDs.

For example, DictionaryUpdateEngDef removes unneeded words such as "the", "a" and "at" from the indexes. These words are not needed in the indexes and only add to the file size. They are still displayed in the definition when a user performs a search, however.

The property languageXDictionaryUpdateClassName is optional. Use it only if you really need it; otherwise remove any languageXDictionaryUpdateClassName lines from the property file!
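The idea behind DictionaryUpdateEngDef can be sketched as a stop-word filter for index keywords. This is not the real DictionaryUpdate API (its class and method signatures are not described on this page); StopWordFilter and its word list are hypothetical, taken only from the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: decide which words of an expression get an index entry. */
final class StopWordFilter {
    private StopWordFilter() {}

    // Example stop words from the description of DictionaryUpdateEngDef.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "at"));

    /** Returns the words of an expression that should be indexed; the dictionary entry itself is unchanged. */
    static List<String> indexableWords(String expression) {
        List<String> result = new ArrayList<>();
        for (String word : expression.trim().split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word.toLowerCase(Locale.ROOT))) {
                result.add(word);
            }
        }
        return result;
    }
}
```

For "the cat at the door" only "cat" and "door" would be indexed, while the full expression is still stored in the dictionary file and shown to the user.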
 

languageXNormationClassName

This is the name of a Java class that is used to 'normate' words. Whereas a DictionaryUpdate class changes dictionary files only when the dictionary is generated, a Normation class also affects the words that the user enters when searching.

For example: NormationGer parses the nonNormatedWord for the German 'Umlauts' (ä, ö, ü) and returns the word with the Umlaut-paraphrasing (ae, oe, ue). So the user can search for "Mädchen" or "Maedchen" and the translation will be found in both cases.

These changes are done in 2 steps. First the DictionaryGeneration tool calls the Normation class to change the indexes to incorporate the phonetic changes (ä is changed to ae, for example). Then, when the user searches, the Normation class is called again to apply the same changes to the search word, so that it matches the indexes built earlier by the DictionaryGeneration tool.

Via Normation classes it is possible to provide language-specific search features and phonetic search. A lot of power lies in these Normation classes!
For documentation of NormationClass, see here.
The property languageXNormationClassName is optional.
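The Umlaut paraphrasing described for NormationGer can be sketched in Java. GermanNormationSketch is a hypothetical class, not the real NormationGer, and how such a class plugs into DictionaryForMIDs is not shown; lowercasing the word first is an assumption. Because the same function runs when the indexes are generated and when the user searches, both spellings end up as the same key.

```java
import java.util.Locale;

/** Hypothetical sketch of a German normation (not the real NormationGer). */
final class GermanNormationSketch {
    private GermanNormationSketch() {}

    static String normateWord(String nonNormatedWord) {
        // Assumption: case is ignored, so upper-case umlauts are lowered first.
        String w = nonNormatedWord.toLowerCase(Locale.GERMAN);
        // Umlaut paraphrasing; the same rules must run at generation time and at search time.
        return w.replace("\u00e4", "ae")   // a-umlaut -> ae
                .replace("\u00f6", "oe")   // o-umlaut -> oe
                .replace("\u00fc", "ue");  // u-umlaut -> ue
    }
}
```

Both "Mädchen" and "Maedchen" are normated to "maedchen" by this sketch, so either spelling finds the same index entry.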
 

languageXContentNN

With the languageXContentNN properties you can specify the content of your dictionary. For example you can specify that there is a pronunciation part, an explanation part, etc.
For more information on the languageXContentNN-properties, see here.
The languageXContentNN properties are optional.
 

searchListFileMaxSize, indexFileMaxSize, dictionaryFileMaxSize

Define the size in bytes of the biggest searchlist file/index file/dictionary file generated by DictionaryGeneration. DictionaryGeneration determines and sets these properties automatically, so normally there is no need to set them manually.
However, these properties must be defined manually when a dictionary is merged from two (or more) different source dictionaries and DictionaryGeneration is run once for each source dictionary (the values from the first run would be overwritten by the second run). When these properties are already set manually, DictionaryGeneration does not generate them. For manual values you must ensure that no searchlist file/index file/dictionary file is bigger than the property value, otherwise some translations are not found.
A value bigger than the actual maximum file size causes no problem. For example, if you set dictionaryFileMaxSize to 50000 even though the biggest dictionary file is only 35000 bytes, everything will work correctly. However, DictionaryForMIDs will then allocate 50000 bytes of heap memory, and heap memory is scarce, specifically on older devices.
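For merged dictionaries, where these values must be set by hand, a small Java sketch can find the largest generated file of one kind in the final outputdirectory. MaxFileSize is a hypothetical helper; pass the file-name prefix your generated files actually use (the test below uses made-up index file names).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: size in bytes of the largest file whose name starts with a prefix. */
final class MaxFileSize {
    private MaxFileSize() {}

    static long largest(Path outputDirectory, String fileNamePrefix) throws IOException {
        long max = 0;
        // The glob "prefix*" matches every file name that starts with the prefix.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(outputDirectory, fileNamePrefix + "*")) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    max = Math.max(max, Files.size(file));
                }
            }
        }
        return max;
    }
}
```

Set each MaxSize property to at least the returned value; a larger value also works but costs heap memory on the device.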
 

dictionaryGenerationMinNumberOfEntriesPerDictionaryFile, dictionaryGenerationMinNumberOfEntriesPerIndexFile

Define for DictionaryGeneration the number of entries (= lines) per dictionary file and per index file.
These properties are optional, the default value for dictionaryGenerationMinNumberOfEntriesPerDictionaryFile is 200 and the default value for dictionaryGenerationMinNumberOfEntriesPerIndexFile is 500.
As a general hint, try to set these values so that neither a single dictionary file nor a single index file exceeds 100 kB (sizes as given by the properties searchListFileMaxSize/indexFileMaxSize/dictionaryFileMaxSize).
If you set up a small dictionary that should support very old devices with very little heap memory, set these values low enough, for example so that the biggest file does not exceed 10 kB.
 

languageXIndexNumberOfSourceEntries

This property is automatically generated by DictionaryGeneration. Its value is the number of 'begin of expression' index entries, i.e. the number of words/expressions that the dictionary contains for languageX. The values of languageXIndexNumberOfSourceEntries are shown in the Info dialog (will be implemented in a future version).
Note that when you merge a dictionary from two (or more) source dictionaries and DictionaryGeneration is run once for each source dictionary, you need to manually copy the entries for languageXIndexNumberOfSourceEntries into the final DictionaryForMIDs.properties file.
 

logLevel

Note: the logLevel is set in the file DictionaryForMIDs.jad (not in DictionaryForMIDs.properties).
Switches on debugging output. A logLevel of 0 switches debugging output off, a logLevel of 3 switches all debugging output on, and the levels 1 and 2 switch on some debugging output. A higher logLevel means more debugging output.
 

Here is a sample DictionaryForMIDs.properties file:

		infoText: English-Portuguese dictionary from IDP: http://www.june29.com/IDP
		dictionaryAbbreviation: IDP
		numberOfAvailableLanguages: 2
		language1DisplayText: English
		language2DisplayText: Portuguese
		language1FilePostfix: Eng
		language2FilePostfix: Por
		dictionaryGenerationSeparatorCharacter: '\t'
		indexFileSeparationCharacter: '\t'
		searchListFileSeparationCharacter: '\t'
		dictionaryFileSeparationCharacter: '\t'
		dictionaryGenerationInputCharEncoding: ISO-8859-1
		indexCharEncoding: ISO-8859-1
		searchListCharEncoding: ISO-8859-1
		dictionaryCharEncoding: ISO-8859-1
		language1DictionaryUpdateClassName: de.kugihan.dictionaryformids.dictgen.dictionaryupdate.DictionaryUpdateIDP
		language2DictionaryUpdateClassName: de.kugihan.dictionaryformids.dictgen.dictionaryupdate.DictionaryUpdateIDPSpa
		language1NormationClassName: de.kugihan.dictionaryformids.translation.normation.NormationEng
		language2NormationClassName: de.kugihan.dictionaryformids.translation.normation.NormationLat
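Before running DictionaryGeneration, a quick check of a few mandatory properties can save a round trip. This Java sketch reads the file with java.util.Properties, which accepts the "key: value" syntax used in the sample; whether DictionaryForMIDs itself parses the file this way is an assumption, PropertiesCheck is a hypothetical name, and only a subset of the mandatory properties is checked.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check for some mandatory properties (not part of DictionaryGeneration). */
final class PropertiesCheck {
    private PropertiesCheck() {}

    private static final String[] GLOBAL_KEYS = {"infoText", "dictionaryAbbreviation", "numberOfAvailableLanguages"};
    private static final String[] LANGUAGE_SUFFIXES = {"DisplayText", "FilePostfix"};

    /** Returns the names of missing properties; an empty list means the check passed. */
    static List<String> missingProperties(Reader propertiesFile) throws IOException {
        Properties props = new Properties();
        // Note: Properties.load also interprets backslash escapes such as \t.
        props.load(propertiesFile);
        List<String> missing = new ArrayList<>();
        for (String key : GLOBAL_KEYS) {
            if (props.getProperty(key) == null) {
                missing.add(key);
            }
        }
        int languages = Integer.parseInt(props.getProperty("numberOfAvailableLanguages", "0").trim());
        for (int x = 1; x <= languages; x++) {       // X runs from 1 to numberOfAvailableLanguages
            for (String suffix : LANGUAGE_SUFFIXES) {
                String key = "language" + x + suffix;
                if (props.getProperty(key) == null) {
                    missing.add(key);
                }
            }
        }
        return missing;
    }
}
```

Run against the sample above, the list would be empty; leaving out language2FilePostfix would make it report that key.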
		

2. Generating the files for DictionaryForMIDs

Downloading

Download the latest version of the DictionaryGeneration tool:
DictionaryForMIDs_DictionaryGeneration_3.1.0.zip (373 kB) (this is the right version for DictionaryForMIDs >= 2.5.0)

DictionaryGeneration requires a J2SE runtime on your PC. If you are not sure whether a J2SE runtime is installed, check whether the command "java" exists at the command prompt. If no J2SE runtime is installed, you can download one from http://java.com/en/download/download_the_latest.jsp (the download is larger than 10 MB).
 

Using the DictionaryGeneration tool

DictionaryGeneration is a command line tool. You start the DictionaryGeneration tool as follows:

java -jar DictionaryGeneration.jar inputdictionaryfile outputdirectory propertydirectory
		inputdictionaryfile: 	file from which the dictionary is read
		outputdirectory: 		pathname where the generated dictionary files are written (must end with "dictionary"!)
		propertydirectory: 	directory where the file DictionaryForMIDs.properties is located

		
inputdictionaryfile:

The first parameter is the dictionary file that you want to set up for DictionaryForMIDs. Here is a sample dictionary file from the IDP: PortugueseNoHeader.txt (38 kB).
This file is a 'Comma Separated Value' file (CSV file), except that instead of a comma you can use any separation character. The separation character is specified by the property dictionaryGenerationSeparatorCharacter (see section Configuring the properties of the file DictionaryForMIDs.properties).
The inputdictionaryfile has one column for each language. Most often you will have two languages (property numberOfAvailableLanguages set to 2) and two columns.
If the dictionary that you want to set up is not yet in CSV format, you need to convert it to that format first.
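Reading the inputdictionaryfile amounts to decoding it with the input encoding and splitting each line at the separator character, one column per language. This Java sketch is hypothetical, not the DictionaryGeneration source: it assumes no quoting or escaping inside columns, skips blank lines, and rejects lines with the wrong number of columns, all of which are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical reader for an inputdictionaryfile (no quoting, one column per language). */
final class InputDictionaryReader {
    private InputDictionaryReader() {}

    static List<String[]> read(InputStream in, Charset inputCharEncoding, char separator,
                               int numberOfAvailableLanguages) throws IOException {
        List<String[]> rows = new ArrayList<>();
        String splitPattern = Pattern.quote(String.valueOf(separator));
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, inputCharEncoding))) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (line.isEmpty()) {
                    continue; // assumption: blank lines are ignored
                }
                // limit -1 keeps trailing empty columns, so a missing translation is still detected
                String[] columns = line.split(splitPattern, -1);
                if (columns.length != numberOfAvailableLanguages) {
                    throw new IOException("Line " + lineNumber + ": expected "
                            + numberOfAvailableLanguages + " columns, found " + columns.length);
                }
                rows.add(columns);
            }
        }
        return rows;
    }
}
```

Checking the column count up front makes conversion mistakes (for example a stray separator inside an expression) visible before any files are generated.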

outputdirectory:

The second parameter specifies the directory path to which the generated files are written. This directory path must end in "dictionary"! DictionaryGeneration writes the following files to this directory: searchlistXXX.csv, indexXXXN.csv and the dictionary files described in section Files generated by the DictionaryGeneration tool (XXX is a placeholder for the value of the property languageXFilePostfix and N is a sequence number).

propertydirectory:

The third parameter is the directory path where the configured file DictionaryForMIDs.properties is found.
 

Here is an example for starting DictionaryGeneration:

java -jar DictionaryGeneration.jar dictionaries\IDP\Por\PortugueseNoHeader.txt output\dictionary dictionaries\IDP\Por 


Customization of DictionaryGeneration with DictionaryUpdate classes

The DictionaryGeneration tool can be customized by DictionaryUpdate classes. Read here for a description of DictionaryUpdate classes.
 

Files generated by the DictionaryGeneration tool

DictionaryGeneration generates searchfiles, indexfiles and dictionaryfiles. In addition to the generation of these files, DictionaryGeneration copies the file DictionaryForMIDs.properties to the outputdirectory.

searchfileXXX

For each language one searchfile is generated. XXX is defined by the property languageXFilePostfix.
A searchfile contains one entry per line in the following format:

keyword<searchListFileSeparationCharacter>indexfilenumber

The searchListFileSeparationCharacter-property is typically set to a tab-character.
Each keyword is the first keyword of the indexfile with the given indexfilenumber. So, for example, the line

monument 18

indicates that the keyword monument is found at the beginning of indexfile 18.
The entries in the searchfiles are sorted alphabetically according to the keyword.
The keywords are normated.
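Because each searchfile keyword is the first keyword of its indexfile, the indexfile for a normated search word is the entry with the greatest keyword that is not greater than the word. A Java sketch with TreeMap.floorEntry; the class name SearchList is hypothetical, and it assumes the searchfile is already in memory and that its alphabetical order matches Java's String.compareTo.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory searchfile: first keyword of each indexfile -> indexfile number. */
final class SearchList {
    private final TreeMap<String, Integer> firstKeywords = new TreeMap<>();

    /** Adds one searchfile line, e.g. "monument\t18". */
    void addLine(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            throw new IllegalArgumentException("No separator in line: " + line);
        }
        firstKeywords.put(line.substring(0, pos), Integer.parseInt(line.substring(pos + 1).trim()));
    }

    /** Returns the indexfile that would contain the normated word, or -1 if it sorts before every entry. */
    int indexFileFor(String normatedWord) {
        Map.Entry<String, Integer> entry = firstKeywords.floorEntry(normatedWord);
        return entry == null ? -1 : entry.getValue();
    }
}
```

With the entries abandon 1, monument 18 and mountain 19, both "monument" and "monumental" lead to indexfile 18, and "mountains" to indexfile 19.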
 

indexfileXXXN

For each language several indexfiles are generated. XXX is defined by the property languageXFilePostfix; N is a sequence number.

An indexfile contains one entry per line in the following format:

keyword<indexFileSeparationCharacter>dictionaryfilenumber-charpos-searchindicator[,...]

The indexFileSeparationCharacter-property is typically set to a tab-character.
For example the line

monument 29-383-B 

indicates that the word monument together with its translation is found in dictionaryfile 29 at byte position 383, and that monument occurs at the beginning of the expression. The searchindicator is either B for 'begin of expression' or S for 'substring in expression'. For example, for the expression "give up", the keyword "give up" has the searchindicator B and the keyword "up" has the searchindicator S.
One keyword may have several references to different locations in the dictionaryfiles; these references are separated by commas.
The indexfiles contain one line per word to be translated. The entries in the indexfiles are sorted alphabetically by keyword.
The keywords are normated.
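The reference part of an indexfile line can be decoded with a few lines of Java. IndexReference is a hypothetical class, not DictionaryForMIDs code; the second reference in the usage example ("30-12-S") is made up to show the comma-separated form.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parsed form of one reference "dictionaryfilenumber-charpos-searchindicator". */
final class IndexReference {
    final int dictionaryFileNumber;
    final int charPos;                // byte position of the entry in the dictionaryfile
    final boolean beginOfExpression;  // true for 'B' (begin of expression), false for 'S' (substring)

    IndexReference(int dictionaryFileNumber, int charPos, boolean beginOfExpression) {
        this.dictionaryFileNumber = dictionaryFileNumber;
        this.charPos = charPos;
        this.beginOfExpression = beginOfExpression;
    }

    /** Parses the text after the separator of an indexfile line, e.g. "29-383-B,30-12-S". */
    static List<IndexReference> parseAll(String references) {
        List<IndexReference> result = new ArrayList<>();
        for (String ref : references.trim().split(",")) {
            String[] parts = ref.trim().split("-");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Malformed reference: " + ref);
            }
            String indicator = parts[2];
            if (!indicator.equals("B") && !indicator.equals("S")) {
                throw new IllegalArgumentException("Unknown searchindicator: " + indicator);
            }
            result.add(new IndexReference(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]),
                    indicator.equals("B")));
        }
        return result;
    }
}
```

For "29-383-B,30-12-S" the sketch returns two references: dictionaryfile 29 at byte 383 (begin of expression) and dictionaryfile 30 at byte 12 (substring).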
 

directoryXXXN

For each language several dictionaryfiles are generated. XXX is defined by the property languageXFilePostfix; N is a sequence number.
Note: the number of dictionaryfiles is not the same as the number of indexfiles (typically there are more dictionaryfiles).

A dictionaryfile contains one entry per line in the following format:

expression-from<dictionaryFileSeparationCharacter>expression-to

The dictionaryFileSeparationCharacter-property is typically set to a tab-character.
For example the line

monument Denkmal (n)  

translates the English "monument" to the German "Denkmal (n)".
The dictionaryfiles are not sorted (if the inputdictionaryfile is sorted, the generated dictionaryfiles are sorted as well, but there is no need for the dictionaryfiles to be sorted).
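Since charpos in an indexfile is a byte position, reading the referenced entry means jumping to that byte offset in the dictionaryfile and decoding the line with the dictionary encoding. A hypothetical Java sketch working on the file contents as a byte array; it assumes charpos points to the start of a line and that lines end with '\n' (a preceding '\r' is stripped).

```java
import java.nio.charset.Charset;

/** Hypothetical lookup of one dictionaryfile entry by byte position. */
final class DictionaryEntryReader {
    private DictionaryEntryReader() {}

    /** Returns {expression-from, expression-to} for the line starting at byte charPos. */
    static String[] entryAt(byte[] dictionaryFile, int charPos, Charset dictionaryCharEncoding, char separator) {
        int end = charPos;
        // '\n' is a single byte in ISO-8859-1, US-ASCII and UTF-8, so a byte scan is safe.
        while (end < dictionaryFile.length && dictionaryFile[end] != '\n') {
            end++;
        }
        String line = new String(dictionaryFile, charPos, end - charPos, dictionaryCharEncoding);
        if (line.endsWith("\r")) {
            line = line.substring(0, line.length() - 1);
        }
        int pos = line.indexOf(separator);
        if (pos < 0) {
            throw new IllegalArgumentException("No separator at byte position " + charPos);
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }
}
```

For a UTF-8 file containing "girl\tMädchen\n" followed by "monument\tDenkmal (n)\n", the second entry starts at byte 14; in ISO-8859-1 it would start at byte 13. This is why the encoding used for reading must match the one used for generating.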
 

Creating a bitmap font

If you wish to include a bitmap font with the dictionary, please see the instructions for help creating one. The bitmap font generator will create the file 'font.bmf' which contains all of the font data.

 

3. Putting the generated files in DictionaryForMIDs.jar

Download the empty DictionaryForMIDs ('empty' means that there is no dictionary included):
 DictionaryForMIDs_3.2.0_empty.zip (227 kB).
You need to extract the files DictionaryForMIDs.jar and DictionaryForMIDs.jad.

Next you need to include the dictionary-files that were generated with DictionaryGeneration in the JAR-file. This can be done conveniently with the JarCreator tool: DictionaryForMIDs_JarCreator_3.1.2.zip (352 kB)  (this is the right version for DictionaryForMIDs >= 3.1.2)
 

Updating DictionaryForMIDs.jar with JarCreator

Here is how to use JarCreator:

java -jar JarCreator.jar dictionarydirectory emptyjar outputdirectory

dictionarydirectory:

The first parameter is the directory where the generated files are located. This is the same path as outputdirectory for DictionaryGeneration, so this directory path must also end in "dictionary". If you are also using a bitmap font with this dictionary, font.bmf must be located in this directory. JarCreator will automatically move the font file from this directory to its correct location in the JAR package.

emptyjar:

The second parameter is the directory where the 'empty' files DictionaryForMIDs.jar and DictionaryForMIDs.jad are found.

outputdirectory:

The third parameter is the directory where JarCreator stores the completed DictionaryForMIDs_xxx.jar and DictionaryForMIDs_xxx.jad (JarCreator fills in xxx from languageXFilePostfix and dictionaryAbbreviation). The JAR file contains the dictionary files from the dictionarydirectory.
 

Updating DictionaryForMIDs.jar manually (if you are not using JarCreator)

(If you are using JarCreator, continue with Sample DictionaryForMIDs.jar.)
As an alternative to JarCreator, you can also update the file DictionaryForMIDs.jar manually with a ZIP utility, such as the free Info-ZIP (Windows version or command line version) or WinZip.

In the file DictionaryForMIDs.jar, add the directory "dictionary" including all the generated files. Important: in the JAR file the generated files _must_ be in the directory dictionary, otherwise you will receive an error message when you translate with DictionaryForMIDs. Depending on the ZIP utility that you are using, adding the files in the directory dictionary can be a little bit tricky.

If you are including a bitmap font with the dictionary, font.bmf must be moved out of the 'dictionary' directory, and into a new directory called 'fonts'. Bitmap font support will not be available in DictionaryForMIDs unless the font file is in the correct directory. The 'fonts' folder should be at the same level as the 'dictionary' directory in the directory tree.

After adding the dictionary files, you need to adjust the property MIDlet-Jar-Size in the file DictionaryForMIDs.jad. Set it to the exact size in bytes of the file DictionaryForMIDs.jar, as in this example:

MIDlet-Jar-Size: 58037
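If you update the JAR by hand often, the JAD adjustment can be scripted. JadUpdater is a hypothetical Java helper that replaces the MIDlet-Jar-Size line (or appends one if it is missing) and leaves all other lines untouched.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: set MIDlet-Jar-Size in the lines of a .jad file. */
final class JadUpdater {
    private JadUpdater() {}

    static List<String> withJarSize(List<String> jadLines, long jarSizeInBytes) {
        List<String> result = new ArrayList<>();
        boolean found = false;
        for (String line : jadLines) {
            if (line.startsWith("MIDlet-Jar-Size:")) {
                result.add("MIDlet-Jar-Size: " + jarSizeInBytes); // exact size of the JAR in bytes
                found = true;
            } else {
                result.add(line);
            }
        }
        if (!found) {
            result.add("MIDlet-Jar-Size: " + jarSizeInBytes);
        }
        return result;
    }
}
```

Combined with java.nio.file, the size can be taken directly from the JAR, e.g. Files.write(jad, JadUpdater.withJarSize(Files.readAllLines(jad, StandardCharsets.UTF_8), Files.size(jar)), StandardCharsets.UTF_8), where jad and jar are the paths of your two files.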
		

Sample DictionaryForMIDs.jar

Here is an example for DictionaryForMIDs set up with a dictionary. You can see there that the generated files are in the directory "dictionary":
DictionaryForMIDs_2.4.0_EngPor_IDP_dev.zip (66 kB).
 

Packaging into a ZIP file

For packaging, put the 4 files (1) DictionaryForMIDs_xxx.jar, (2) DictionaryForMIDs_xxx.jad, (3) README and (4) COPYING into a ZIP file. Use this file naming convention:

DictionaryForMIDs_VVVVV_XXXYYY_ZZZ.zip
VVVVV: version of DictionaryForMIDs, for example "3.0.0"
XXX: language1FilePostfix, for example "Eng"
YYY: language2FilePostfix, for example "Por"
ZZZ: info on the origin of the dictionary (can be longer than 3 characters), for example "IDP" or "freedict"; should be the same as defined in the property dictionaryAbbreviation.
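The naming convention fits in a one-line Java helper, which makes it easy to build the name from the same values as in DictionaryForMIDs.properties. PackageName is a hypothetical name.

```java
/** Hypothetical helper that builds the ZIP file name from the naming convention. */
final class PackageName {
    private PackageName() {}

    static String zipName(String version, String language1FilePostfix,
                          String language2FilePostfix, String dictionaryAbbreviation) {
        // DictionaryForMIDs_VVVVV_XXXYYY_ZZZ.zip
        return "DictionaryForMIDs_" + version + "_" + language1FilePostfix + language2FilePostfix
                + "_" + dictionaryAbbreviation + ".zip";
    }
}
```

For example, zipName("3.0.0", "Eng", "Por", "IDP") returns "DictionaryForMIDs_3.0.0_EngPor_IDP.zip".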
 

If you have any problem setting up a new dictionary, just contact us and we will try to help you!