PDIC Tips and PDICCONV:
EPWING to PDIC Conversion Script


About PDIC (Personal Dictionary)

PDIC is the original freely downloadable dictionary reader for Eijiro (also accessible on the web), a large dictionary compiled by Japanese translation researchers. It is independent of Eijiro's authors, however, and has some advanced features that make it attractive as a general-purpose dictionary reader. While the PDIC reader itself runs only on Windows and Windows CE, the PDIC format can be read by a variety of readers on other platforms.

PDIC feature highlights

Caveats

If you are OK with these, I suggest you give it and Eijiro a try ;)

EPWING to PDIC conversion - PDICCONV

Based on various methods found on Japanese websites, I have created a general-purpose Perl script that can help convert epwing format dictionaries to PDIC format. It differs from some others in that it takes an index file along with the contents to get more reliable and flexible results. It will split up entries which exceed the PDIC 32/unicode format maximum so that no entries are dropped. Although written for and tested with specific dictionaries, it shouldn't have much trouble with other formats (please feel free to let me know if you want something supported).

Supported Dictionaries:

Download

PDICCONV v0.4 3/3/2008

Instructions:

Windows will be required for some operations (namely the DDWIN 2.30 epwing -> text conversion). Unzip file into a directory. Please see the readme file included in the package - pasted at the end of this page.

Where to find the Kenkyusha and similar dictionaries

Please see my Learning Japanese with Computers page for some tips on finding these - be sure to buy 電子ブック or epwing formatted ones.

More PDIC conversion links


Closing thoughts..

My standard dictionary group is made up of shinwaeidai, shineiwadai, eijiro (and waeijiro etc), edict, kanjidic, Kojien and Jim Breen's examples (listed in that order). I can't imagine that this combination is really beatable (I don't know whether Kojien is still considered any good, but...). It's much easier to view multiple dictionaries in PDIC than in ebpocket etc, so for now I grit my teeth and make use of the Windows and WinCE tax levied on my devices. While I encourage the original PDIC author to look to open source for the future, versions for Linux also exist and I plan to have a deeper look at them later. Having this dictionary combination in a mobile phone (with a thumbboard) must be some definition of true confidence walking around Japan... and when using it on the desktop I feel like a bona fide Japanese language researcher. PDIC is my number one advanced Japanese learning tool. If I can't view a passage on a machine with PDIC on it, I already know how much less I will get out of it compared with sticking to texts I can copy and paste. Keep practicing your touch typing and copy/paste skills and you'll see what I mean.

I could go on about how computers have changed the world (and have a duty to continue doing so). Separate 'consumer-targeted' packages of silicon with crappy keyboards and no means to talk to other silicon lifeforms are not included in this role. There is no way I could have learnt this much Japanese without computers; they comprise a fundamental change to the nature of research in daily and academic life. I would like to encourage the ebook industry to understand this and dump DRM in favour of the ability to copy text into translators. Until classrooms and educational institutions acknowledge this fully and make the needed changes everywhere, I will continue to write pages like these.

PDICCONV README

PDICCONV v0.4

Perl converter for text + index dictionary to PDIC 1-line text format
http://diamondsky.org/other/japanese/pdic.php
pdicconv.pl free for use under GPLv3

Use for Kenkyusha dictionaries, Hannes Löffler's dictionaries (http://www.hloeffler.info/epwing/) and easily adaptable to any epwing dictionary.

PERLの辞書内容+見出しテキスト形式からPDICの1列テキスト形式への変換スクリプトです
http://diamondsky.org/other/japanese/computerlearning.php
研究社、Hannes Löfflerの (http://www.hloeffler.info/epwing/)、多分いろいろなEPWING辞書に対応できます


Instructions:

There are 4 main steps for converting an epwing dictionary to PDIC (the best dictionary viewer I know of, homepage3.nifty.com/TaN/).
Skip any that you do not need to do.

1) Use DDWIN 2.30 (http://homepage2.nifty.com/ddwin/) to output raw text dictionary content file and index file

2) Use gaijirep.pl and gaijirep-idx.pl to filter out the GAIJI (extra character) codes from the text files

3) Use pdicconv.pl to convert content and index file into PDIC 1-line text file

4) Use PDIC to convert 1-line text file into PDIC .dic dictionary file


0. INSTALLATION

Unzip pdicconv.zip contents into a directory.


1. DDWIN

Download and install DDWIN 2.30 (http://homepage2.nifty.com/ddwin/) and copy your epwing CATALOGS file and dictionary directory to the root of some drive (eg. C:\). You can delete them after this step is finished.

2.30 works better with dumping epwing files than the newer versions, with some exceptions. For the Kenkyusha Shineiwa Dai Jiten, you will have to use a later version like 2.66 to process the index (see below).

Run DDWIN, go to the ファイル (File) menu, 辞書をサーチする (search for dictionaries)
Press OK or use the option to specify the drive you copied your epwing dictionary to
Click on the 全文 (full text) tab, close whatever dialog comes up, and with nothing in the search field press enter. This will look up the contents of the entire dictionary (takes some time).
Go to the 編集 (Edit) menu, エディター起動 (launch editor)
- First, choose 該当項目すべて (all matching entries), select a content filename (we use dic-gaiji.txt) and wait for the contents to be dumped to a text file. This will take time (use ctrl-alt-del to kill notepad when it automatically opens afterwards)
- Second, go back to this menu option again and select 該当項目の見出し (headwords of matching entries), select an index filename (we use dic-idx-gaiji.txt) and again kill notepad when it opens

You can use a better text viewer like notepad++ to check the contents of the file and guess whether it has worked correctly. Note that for shineiwadai the index dumping step has to be used with a later version than 2.30 (such as 2.66), which will need to be filtered below in a slightly different way. Stick to 2.30 for the contents.


2. Filter gaiji character codes

Copy the two text files into the pdicconv directory and make sure they are named dic-gaiji.txt and dic-idx-gaiji.txt for the content and index files respectively. Skip this step if your dictionary has no gaiji in it.

- Run the appropriate gaijirep-.bat file (eg. gaijirep-kenkyusha.bat) to filter dic-gaiji.txt.
- Then run the gaijirep-idx.bat file (eg. gaijirep-idx-kenkyusha.bat) to filter the dic-idx-gaiji.txt file

These steps will take some time to run and produce clean output text files called dic.txt and dic-idx.txt. Note that if you used a version of DDWIN later than 2.30 (eg. for the shineiwadai index file), please use the gaijirep-idx-shineiwadai260.bat file to perform the gaiji filtering for the index file only.


3. Convert to 1-line text format

Run the appropriate pdicconv-.bat file (eg. pdicconv-shinwaeidai.bat) to output a 1-line text format "pdic.txt" from the filtered files dic.txt and dic-idx.txt.


4. Convert to PDIC dictionary .dic file

Run PDIC, go to the Tools menu, select 辞書の変換 (dictionary conversion)
- In the first field, select 1列テキスト形式 (1-line text format), and select the pdic.txt file you made
- In the second field, leave it as PDIC形式 (PDIC format) and choose a filename you want for your PDIC .dic file
All the other default options should be ok.
Click ok, untick the box in the next dialog if you want to skip any compression PDIC tries to apply to the file, click ok etc and it should be done.

You can now add your new .dic file to your dictionary group. When you are finished, you can delete all the .txt files created in the process, as well as the pdicconv package.

Yes that's the end of the instructions.


5. Notes on other dictionaries and customisation

If your dictionary is not listed try something which sounds close
i.e.     shinwaeidai for Japanese-English dictionaries
         shineiwadai for English-Japanese
Or just use the generic pdicconv.bat

Please feel free to send me any requests; it is really very easy to include other formats.


Epwing to Text
----------------

The index creation from DDWIN is not perfect for pdic use. There are gaiji/sublookup/mismatch problems.
An alternative way of creating the index is to try to open an epwing catalogs file directly in PDIC (note that I've never been able to get it to generate anything other than an index of word lookups), convert it to text format and edit from there. With this approach, however, there is a high chance that the lookups will not be in the original order, which they need to be for this program to work reliably.


Gaiji
-----

The gaiji replacement data for each dictionary type are listed in the corresponding gaiji-.txt file. Note that they are neither complete nor perfect. Each is merged with alphakan.txt (which converts full-width Latin characters to half-width) to create a gaiji.txt file during the gaijirep-.bat process.
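
As a rough illustration of what a merged replacement table does, here is a small Python sketch (not the package's Perl code, and not taken from gaijirep.pl). The full-width to half-width mapping relies on the fixed Unicode offset between U+FF01..U+FF5E and their ASCII counterparts. The table layout (a dict keyed by a bracketed code), the sample gaiji codes and all function names are assumptions made up for the example; the real gaiji-.txt and alphakan.txt files may be laid out differently.

```python
# Hypothetical sketch: merge a gaiji table with a full-width -> half-width
# Latin mapping, then apply the merged table to a line of text.

FULLWIDTH_START = 0xFF01  # full-width '!'
FULLWIDTH_END = 0xFF5E    # full-width '~'
OFFSET = 0xFEE0           # distance from a full-width form to its ASCII twin


def alphakan_table():
    """Map every full-width ASCII variant to its half-width character."""
    return {chr(cp): chr(cp - OFFSET)
            for cp in range(FULLWIDTH_START, FULLWIDTH_END + 1)}


def merge_tables(gaiji, alphakan):
    """Combine both tables; gaiji entries win if a key appears in both."""
    merged = dict(alphakan)
    merged.update(gaiji)
    return merged


def apply_table(text, table):
    """Replace every key found in text, longest keys first."""
    for key in sorted(table, key=len, reverse=True):
        text = text.replace(key, table[key])
    return text


# Made-up gaiji codes, purely for demonstration.
sample_gaiji = {"[A121]": "é", "[A122]": "ü"}
merged = merge_tables(sample_gaiji, alphakan_table())
print(apply_table("ｃａｆ[A121] ＡＢＣ", merged))
```

Replacing the longest keys first means multi-character gaiji codes are handled before any single-character substitutions.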

You can call gaijirep.pl and gaijirep-idx.pl etc directly to take input from gaiji.txt. All the perl scripts must be called with jperl (included in the package or downloadable at http://homepage2.nifty.com/kipp/perl/jperl/). You can add/edit whatever gaiji replacements you like. Use the generic gaijirep.bat files to process gaiji.txt directly. A few gaiji replacement tables found in various places on the net are included in the package for your reference.

You can delete all gaiji codes without going through the gaijirep process with an option in pdicconv.pl shown below.
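
Deleting the codes outright amounts to a single pattern substitution. The Python sketch below is only an illustration of the idea, not the code in pdicconv.pl. The tag shape is an assumption: square brackets around exactly five alphanumeric characters, mirroring the "[xxxxx]" placeholder used in this readme. Check it against your own DDWIN output before relying on it.

```python
import re

# Assumed tag shape: "[" + five alphanumerics + "]", as in "[xxxxx]".
# Shorter bracketed markers such as "[U]" are deliberately left alone.
GAIJI_TAG = re.compile(r"\[[0-9A-Za-z]{5}\]")


def delete_gaiji(line):
    """Remove every assumed gaiji tag from a line of dictionary text."""
    return GAIJI_TAG.sub("", line)


print(delete_gaiji("caf[zA121] au lait [U]"))
```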


pdicconv.pl
-----------

You can call pdicconv.pl directly to merge dic.txt and dic-idx.txt into an output file.

Usage:
jperl pdicconv.pl -t <dic type> -gaijidel (optional) -o <output file> (optional)

<dic type>
   shinwaeidai-kana        (j-e kenkyusha kana index)
   shinwaeidai-kanji        (j-e kenkyusha kanji index)
   shineiwadai        (e-j kenkyusha)               
   examples        (edict examples dic)
   kanjidic-fpw        (edict kanjidic freepwing)
   nofilter            (attempt generic conversion)

-gaijidel
Will delete all gaiji [xxxxx] tags

<output file>
Name of output file you want

There are a few flags with comments at the start of this file which you can edit also.
For Japanese-English dictionaries you will need to run this twice, once for a kana-lookup output and once for a kanji-lookup. You can then merge the two output files together for conversion in PDIC (see the pdicconv-shinwaeidai.bat file for an example).
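
The two runs and the merge could be scripted along these lines. This is a Python sketch that only builds the command lines from the usage text above and concatenates the results. The output filenames are made up for the example, and treating "merge" as plain concatenation is an assumption; check pdicconv-shinwaeidai.bat for what the package actually does.

```python
# Dictionary types copied from the usage text above.
DIC_TYPES = {
    "shinwaeidai-kana", "shinwaeidai-kanji", "shineiwadai",
    "examples", "kanjidic-fpw", "nofilter",
}


def build_command(dic_type, output=None, gaijidel=False):
    """Return the jperl argument list for one pdicconv.pl run."""
    if dic_type not in DIC_TYPES:
        raise ValueError(f"unknown dictionary type: {dic_type}")
    cmd = ["jperl", "pdicconv.pl", "-t", dic_type]
    if gaijidel:
        cmd.append("-gaijidel")
    if output:
        cmd += ["-o", output]
    return cmd


def merge_outputs(paths, merged_path):
    """Concatenate output files byte for byte (no encoding is assumed)."""
    with open(merged_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                data = src.read()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")  # keep one entry per line across files


# Hypothetical filenames for the kana and kanji passes.
for dic_type, name in [("shinwaeidai-kana", "pdic-kana.txt"),
                       ("shinwaeidai-kanji", "pdic-kanji.txt")]:
    print(" ".join(build_command(dic_type, output=name)))
```

Each list can be passed to subprocess.run from the pdicconv directory, after which merge_outputs(["pdic-kana.txt", "pdic-kanji.txt"], "pdic.txt") would produce the single file for PDIC.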

There is a maximum entry size in classic PDIC versions of ~12KB for unicode and perhaps less for PDICW32. PDICCONV will split up entries which exceed ~8KB by appending #2, #3 etc to the end of word look-ups so that no entries are dropped.
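
The splitting idea can be sketched as follows. This is a Python illustration, not the logic in pdicconv.pl. It assumes the size is measured in UTF-8 bytes, that 8000 bytes stands in for "~8KB", that the first part keeps the bare headword, and that the suffix is attached without a space; all four are guesses made for the example.

```python
def split_entry(headword, body, limit=8000, encoding="utf-8"):
    """Return (headword, chunk) pairs with each chunk at most `limit` bytes.

    Chunks are cut between characters so a multi-byte character is never
    split. Later chunks get "#2", "#3", ... appended to the headword.
    """
    chunks, current, size = [], [], 0
    for ch in body:
        n = len(ch.encode(encoding))
        if current and size + n > limit:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current or not chunks:
        chunks.append("".join(current))
    return [(headword if i == 0 else f"{headword}#{i + 1}", chunk)
            for i, chunk in enumerate(chunks)]


# Tiny limit so the effect is visible: each kana is 3 bytes in UTF-8.
for head, chunk in split_entry("犬", "あ" * 10, limit=9):
    print(head, len(chunk.encode("utf-8")))
```

A real converter would probably prefer to cut at a line or sentence boundary rather than at an arbitrary character.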

All the string replacement done is very simple and can be customised by inspection of the code.

Please search the web for documentation on the pdic 1-line text file format; it is very simple. Lots of conversion references here: http://kazuo.fc2web.com/dic/ddwin2.htm
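
For orientation, here is a Python sketch of pairing an index with the dumped content to produce 1-line text. It is not how pdicconv.pl parses DDWIN output. Every layout detail is an assumption: one headword per index line, content entries separated by blank lines and in the same order as the index, the "headword /// body" line layout, and " \ " standing in for line breaks inside a body. Please check each of these against PDIC's own documentation and your actual files.

```python
def read_entries(content_text):
    """Split dumped content into entry bodies (assumed blank-line separated)."""
    blocks = [block.strip() for block in content_text.split("\n\n")]
    return [block for block in blocks if block]


def to_one_line(headwords, bodies):
    """Pair headwords with bodies in order and emit one line per entry."""
    if len(headwords) != len(bodies):
        # The index and the content must line up one to one, in order.
        raise ValueError("index and content have different entry counts")
    lines = []
    for head, body in zip(headwords, bodies):
        flat = " \\ ".join(part.strip() for part in body.splitlines())
        lines.append(f"{head.strip()} /// {flat}")
    return lines


index_text = "犬\n猫"
content_text = "いぬ【犬】\ndog\n\nねこ【猫】\ncat"
for line in to_one_line(index_text.splitlines(), read_entries(content_text)):
    print(line)
```

The count check reflects the point made earlier in this readme: the lookups need to be in the original order for the conversion to work reliably.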

----

Thanks to everyone who has shared information on the format on the web. Part of the gaiji conversion code was adapted from Kissui Shimotsuki's kojien converter (http://www.ikushimo.com/).

Thanks very much for trying out my software. Feel free to ask any questions you like or leave a comment on my Japanese computing site.
日本語でももちろんOKだからですね^^;
http://diamondsky.org/other/japanese

-islisis
( evaunit1 (at) diamondsky.org )

Comments

ewfewf: sugoidesu!
(05.03.2008, 00:54)

Dudley Moon: Hi,

Question: Can your EPWING to EDICT programs convert Peter Rivard's EPWING version of Eijiro with yomigana to EDICT format?

Background:
I come to Japan each year for my college and I am trying to 1) learn more Japanese and 2) convert my Palm Tx to a really useful denshi jisho. (I have a WordTank 80 but it is still beyond my level to use efficiently).

I have been getting help from Peter Rivard (he has been a terrific help) and he has a new EPWING version of Eijiro with yomigana added. I have downloaded EDICT and am very impressed with it. On the other hand, the EPWING reader for Palm that Peter listed (Buckingham Reader) does not seem to work well for me (i.e., I cannot get it to use newly downloaded dictionaries). Which leads me to a second question:

Question 2: Is there a better EPWING reader available for Palm OS version 5.4.9 ?

My system is Handheld: Palm Tx, Palm OS version 5.4.9 and I have CJKOS loaded as a Japanese language program. My laptop is a Tablet PC using Windows XP Pro with Japanese language enabled.

Thank you in advance for any and all help you can provide.
(28.07.2008, 11:40)

islisis: Q1:
Sure, (assuming you mean pdic not edict) I don't see why not. It looks like it's in the same format as the J-E New Collegiate. However, as he is deciding to charge more than even the major Japanese publishers do and for a dictionary he did not even write, I can't attempt to do so directly myself. Go through the readme - you can skip the gaiji replacement step as i don't think there are any special characters in eijiro. Just create the dic.txt and dic-idx.txt files directly from the ddwin output. Then try running pdicconv-shinwaeidai.bat to format to 1-line text, and finally the converter in pdic's tools menu.
If it doesn't work, mail me whatever details you can of what went wrong. Good luck!

Q2:
Sorry I'm not actually familiar at all with Palm software!

On a side note, his dictionary looks like a great idea, but it is not his work, being most likely machine translated through freely available software, and something so basic should be made available through the Eijiro homepage for global maintenance. I use pdic's shortcut keys to switch between kana and kanji entries (see tips above) to avoid doubling the size of the dictionary file, but if people wanted it I might create some tools to convert Eijiro to a yomigana-style dictionary for free.
(28.07.2008, 12:55)

JimR: This may be a strange question, but would you have any idea on how to make the OPPOSITE conversion? I have some PDICT files that I want to convert to Epwing; I have a Zaurus PDA and there don't seem to be any PDICT viewers for the Zaurus anymore.

Unless you know of one?
Thanks!
(21.08.2008, 22:29)

islisis: Sorry, how does a viewer disappear from platform? There are always mirror sites, or did you replace the os... at any rate, you could always compile from source! zpdview was a famous one, wasn't it?
http://zaurus.biojapan.de/zpdview.html

As for your original question, it's not strange at all, it's easy! There are tons more solutions for converting pdic to epwing, and much easier. ebstudio for instance
http://www31.ocn.ne.jp/~h_ishida/EBStudio.html
plenty of other ones in the last pdic conversion link given above
(yes they are in japanese, isn't that why you are here? :P i'm sure if you search you can find english guides...)
(21.08.2008, 23:07)
