PDIC Tips and PDICCONV: EPWING to PDIC Conversion Script
About PDIC (Personal Dictionary)
PDIC is the original freely downloadable dictionary reader for Eijiro
(accessible on the web here),
a large dictionary compiled by Japanese translation researchers. PDIC is
independent of the writers of Eijiro, however, and has some advanced
features that make it attractive as a general-purpose dictionary
reader. While the PDIC reader itself only runs on Windows
and Windows CE, the PDIC format can be read by a variety of readers on
other platforms.
PDIC feature highlights
- Combine multiple dictionaries (up to 6 in the WinCE version) into a single lookup. As far as I know, ebpocket and ebview always force you to view each dictionary lookup separately; PDIC lets you just keep scrolling down in the content window to see the next dictionary's definition. NOTE: to view long definitions, either click on the scrollbar arrows or hold down shift and press up or down
- Automatic clipboard lookup - with the default statusbar view, click the 自動検索 (automatic search) button to toggle this feature. Any word copied to the clipboard from another application that has a dictionary entry will be displayed in a popup window, very handy. Can be assigned to a global shortcut key (first setting in the options menu -> ショートカットキーの設定 (shortcut key settings) window)
- History and text file viewer - pressing 'esc' clears the current lookup and stores it in a history file, accessible for the current session from the look-up bar (shortcut: alt-down) and across all sessions from a viewer window in the tools menu (second-last command). Good for collecting words to study later. In the history viewer, PDIC automatically displays a pop-up for highlighted words, or for the word coming after a whitespace when you press 'ctrl-p'. A text file viewer with the same function can also be opened from the tools menu (third command).
- Workable jump feature - hold down 'ctrl' and any word surrounded by whitespace will be underlined - click once to look it up in a pop-up window, double-click to jump to its entry. Alt-left goes back to the previous word. Words which don't get underlined can be highlighted and looked up with 単語検索 (word search) - note however that this method unfortunately won't work with the alt-left back function :/ Copy-paste-undo works well enough, however
- Useful shortcut keys, including "ctrl-j" for copying the
definition - paste this back into the lookup bar to quickly jump
between kanji/kana for J-E dictionaries
- Clean layout with hidable toolbars, selectable font and
background settings
- Works on any Windows/WinCE machine which supports Unicode, which means you don't have to make any special setting changes to your English PC/handheld device
- Fast! Even on old WinCE devices like the Jornada 720, and even when looking up among some 1 million index entries
Caveats
- Shareware, NOT open source, unlike many epwing readers
- The PDIC format is not natively supported by many other dictionary readers, unlike the epwing format
- The non-Unicode version requires the OS to use a Japanese language locale
- Conversion can be a pain (which is why this page is here...)
If you are OK with this, I suggest you give it and Eijiro a try ;)
EPWING to PDIC conversion - PDICCONV
Based on various methods described on Japanese websites, I have created a
general-purpose perl script that can help convert epwing format
dictionaries to PDIC format. It differs from some
other scripts in that it takes an index file along with the contents to
get more reliable and flexible results. It splits up entries which
exceed the PDIC 32/Unicode format maximum entry size, so no entries are dropped.
Although written for and tested
with specific dictionaries, it shouldn't have many problems with other
formats (please feel free to let me know if you want something
supported).
Supported Dictionaries:
- Kenkyusha
  - Shinwaei Dai Jiten (新和英大辞典, released in English as "New College" or "New Collegiate", the big one..)
  - Shineiwa Dai Jiten (新英和大辞典, released 2006 and much better than Readers)
- Hannes Löffler's EDICT conversions
Download
PDICCONV v0.4 3/3/2008
Instructions:
Windows is required for some operations (namely the DDWIN 2.30
epwing -> text conversion). Unzip the file into a directory, then see the readme file included in the package (also pasted at the
end of this page).
Where to find the Kenkyusha and similar dictionaries
Please see my Learning Japanese with Computers page for some tips on finding these -
be sure to buy 電子ブック (Electronic Book) or epwing formatted ones.
More PDIC conversion links
- DICO is another perl script to convert native edict formats - edict, enamdict, and another for Kojien - to PDIC format.
- Peter Rivard has also pre-converted edict dictionaries to PDIC format, directly downloadable on this page. (WDIC is a PDIC reader for Palm)
- Most other PDIC conversion files have been created by Japanese authors, many found here. These include scripts for Kojien and Readers, which may or may not work properly - let me know if you have trouble.
Closing thoughts..
My standard dictionary group is made up of shinwaeidai, shineiwadai,
eijiro (and waeijiro etc), edict, kanjidic, Kojien and Jim Breen's
examples (listed in that order). I can't imagine that this combination
is really beatable (I don't know whether Kojien is still considered any
good, but...). It's much easier to view multiple dictionaries in PDIC
than in ebpocket etc, so for now I grit my teeth and put up with the
Windows and WinCE tax on my devices. While I encourage the
original PDIC author to look to open source for the future, versions
for Linux also exist and I plan to have a deeper look at them later.
Having this dictionary combination in a mobile phone (with a
thumbboard) must be some definition of true confidence walking around
Japan... and when using it on the desktop I feel like a bona fide
Japanese language researcher. PDIC is my number one advanced
Japanese learning tool. If I can't view a passage on a machine with
PDIC on it, I already know how much I am losing compared with
sticking to texts I can copy and paste. Keep practicing your touch typing
and copy/paste skills and you'll see what I mean.
I could go on about how computers have changed the world (and have a duty
to continue to). Separate 'consumer targeted' packages of silicon
with crappy keyboards and no means to talk to other silicon lifeforms
do not share in this role.
There is no way I could have learnt this much Japanese without
computers; they represent a fundamental change to the nature of research
in daily and academic life. I would like to encourage the ebook
industry to understand this and dump DRM in favour of the ability to
copy text into translators. Until classrooms and educational
institutions acknowledge this fully and make the needed changes
everywhere, I will continue to write pages like these.
PDICCONV README
PDICCONV v0.4
Perl converter for text + index dictionary to PDIC 1-line text format
http://diamondsky.org/other/japanese/pdic.php
pdicconv.pl free for use under GPLv3
Use for Kenkyusha dictionaries, Hannes Löffler's dictionaries
(http://www.hloeffler.info/epwing/) and easily adaptable to any epwing
dictionary.
PERLの辞書内容+見出しテキスト形式からPDICの1列テキスト形式への変換スクリプトです
http://diamondsky.org/other/japanese/computerlearning.php
研究社、Hannes Löfflerの
(http://www.hloeffler.info/epwing/)、多分いろいろなEPWING辞書に対応できます
Instructions:
There are 4 main steps for converting an epwing dictionary to PDIC (the
best dictionary viewer I know of, homepage3.nifty.com/TaN/).
Skip any that you do not need to do.
1) Use DDWIN 2.30 (http://homepage2.nifty.com/ddwin/) to output raw
text dictionary content file and index file
2) Use gaijirep.pl and gaijirep-idx.pl to filter out the GAIJI (extra
character) codes from the text files
3) Use pdicconv.pl to convert content and index file into PDIC 1-line
text file
4) Use PDIC to convert 1-line text file into PDIC .dic dictionary file
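Since any of the four steps can be skipped, it helps to know which intermediate files are already sitting in the working directory. The following is a minimal Python sketch (not part of the package) that guesses the next step from the file names used in this readme. The names STEPS, FINAL and next_step are hypothetical and only exist for this illustration.

```python
# Output files named in this readme for steps 1-3. Step 4 produces a .dic
# file whose name the user chooses, so it is not checked here.
STEPS = [
    ("1: dump with DDWIN", {"dic-gaiji.txt", "dic-idx-gaiji.txt"}),
    ("2: filter gaiji codes", {"dic.txt", "dic-idx.txt"}),
    ("3: run pdicconv.pl", {"pdic.txt"}),
]
FINAL = "4: convert pdic.txt inside PDIC"


def next_step(existing_names):
    """Return the label of the step that follows the latest finished one."""
    existing = set(existing_names)
    latest_done = 0
    for number, (_label, outputs) in enumerate(STEPS, start=1):
        # A step counts as finished when all of its output files are present.
        if outputs <= existing:
            latest_done = number
    if latest_done == len(STEPS):
        return FINAL
    return STEPS[latest_done][0]
```

For example, passing the result of os.listdir(".") for the pdicconv directory reports where to pick up again after an interrupted run.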
0. INSTALLATION
Unzip pdicconv.zip contents into a directory.
1. DDWIN
Download and install DDWIN 2.30 (http://homepage2.nifty.com/ddwin/) and
copy your epwing CATALOGS file and dictionary directory to the root of
some drive (eg. C:\). You can delete them after this step is finished.
2.30 works better with dumping epwing files than the newer versions,
with some exceptions. For the Kenkyusha Shineiwa Dai Jiten, you will
have to use a later version like 2.66 to process the index (see below).
Run DDWIN, go to the ファイル (File) menu, 辞書をサーチする (search for dictionaries)
Press OK, or use the option to specify the drive you copied your epwing
dictionary to
Click on the 全文 (full text) tab, close whatever dialog comes up, and with nothing
in the search field press enter. This will look up the contents of the
entire dictionary (takes some time).
Go to the 編集 (Edit) menu, エディター起動 (launch editor)
- First, choose 該当項目すべて (all matching items), select a content filename (we use
dic-gaiji.txt) and wait for the contents to be dumped to a text file.
This will take time (use ctrl-alt-del to kill notepad when it
automatically opens afterwards)
- Second, go back to this menu option again and select 該当項目の見出し (headings of matching items), select
an index filename (we use dic-idx-gaiji.txt) and again kill notepad
when it opens
You can use a better text viewer like notepad++ to check the contents
of the file and guess whether it has worked correctly. Note that for
shineiwadai the index dumping step has to be done with a later version
than 2.30 (such as 2.66), and its output needs to be filtered below in a
slightly different way. Stick to 2.30 for the contents.
2. Filter gaiji character codes
Copy the two text files into the pdicconv directory and make sure they
are named dic-gaiji.txt and dic-idx-gaiji.txt for the content and index
files respectively. Skip this step if your dictionary has no gaiji in
it.
- Run the appropriate gaijirep-.bat file (eg. gaijirep-kenkyusha.bat)
to filter dic-gaiji.txt.
- Then run the gaijirep-idx.bat file (eg. gaijirep-idx-kenkyusha.bat)
to filter the dic-idx-gaiji.txt file
These steps will take some time to run and produce clean output text
files called dic.txt and dic-idx.txt. Note that if you used a version
of DDWIN later than 2.30 (e.g. for the index file in the case of
shineiwadai), please use the gaijirep-idx-shineiwadai260.bat file to
perform the gaiji filtering for the index file only.
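The idea behind the gaiji filtering can be sketched in a few lines of Python. This is not the code in gaijirep.pl. The tag shape (an alphanumeric code in square brackets, like the [xxxxx] tags mentioned further down in this readme), the two made-up table entries and the function name replace_gaiji are all assumptions for illustration; the real gaiji tables in the package may be laid out differently.

```python
import re

# Hypothetical replacement table: gaiji code -> replacement text.
# These two codes are invented for the example.
GAIJI_TABLE = {
    "A0001": "é",
    "A0002": "ü",
}

# Assumed tag shape: square brackets around an alphanumeric code.
GAIJI_TAG = re.compile(r"\[([0-9A-Za-z]+)\]")


def replace_gaiji(text, table):
    """Replace tags found in the table; leave unknown tags untouched."""
    def substitute(match):
        return table.get(match.group(1), match.group(0))
    return GAIJI_TAG.sub(substitute, text)
```

Leaving unknown tags in place (rather than deleting them) makes it easy to search the output for codes still missing from the table.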
3. Convert to 1-line text format
Run the appropriate pdicconv-.bat file (eg. pdicconv-shinwaeidai.bat)
to output a 1-line text format "pdic.txt" from the filtered files
dic.txt and dic-idx.txt.
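Conceptually, this step pairs each headword from the index file with the entry at the same position in the content file, which is why both files need to list entries in the same order. A minimal Python sketch of that pairing follows; it is not pdicconv.pl itself. It assumes both files have already been parsed into lists, and that a 1-line record is the headword, the separator " /// ", then the definition (this is how the format is commonly described, so verify it against the PDIC documentation). to_one_line is a hypothetical name.

```python
def to_one_line(headwords, bodies, separator=" /// "):
    """Pair headwords and bodies by position into one-line records."""
    if len(headwords) != len(bodies):
        # A count mismatch means the index and content are out of step,
        # so positional pairing would attach definitions to wrong words.
        raise ValueError(
            "index has %d entries but content has %d"
            % (len(headwords), len(bodies))
        )
    records = []
    for headword, body in zip(headwords, bodies):
        # Collapse line breaks and repeated spaces so each record stays
        # on a single line.
        flat_body = " ".join(body.split())
        records.append(headword.strip() + separator + flat_body)
    return records
```

Failing loudly on a count mismatch is a deliberate choice here: a silent off-by-one would shift every later definition onto the wrong headword.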
4. Convert to PDIC dictionary .dic file
Run PDIC, go to the Tools menu, select 辞書の変換 (dictionary conversion)
- In the first field, select 1列テキスト形式 (1-line text format), and select the pdic.txt file you
made
- In the second field, leave it as PDIC形式 (PDIC format) and choose a filename you
want for your PDIC .dic file
All the other default options should be OK.
Click OK, untick the box in the next dialog if you want to skip any
compression PDIC tries to apply to the file, click OK etc. and it should
be done.
You can now add your new .dic file to your dictionary group. You can
delete all the .txt files created in the process when you are finished,
as well as the pdicconv package.
Yes that's the end of the instructions.
5. Notes on other dictionaries and customisation
If your dictionary is not listed, try something which sounds close,
i.e.
    shinwaeidai for Japanese-English dictionaries
    shineiwadai for English-Japanese
Or just use the generic pdicconv.bat
Please feel free to make any requests to me; it is really very easy to
include other formats.
Epwing to Text
----------------
The index creation from DDWIN is not perfect for pdic use. There are
gaiji/sublookup/mismatch problems.
An alternative way of creating the index is to try to open an epwing
catalogs file directly in PDIC (note that I've never been able to get
it to generate anything other than an index of word lookups), convert
it to text format and edit from there. However, this way there is a high
chance of the lookups not being in the original order, which they need
to be for this program to work reliably.
Gaiji
-----
The gaiji replacement data are listed for each dictionary type
respectively in the gaiji-.txt files. Note that they are not complete
or perfect. They are merged with alphakan.txt (convert full-width latin
to half-width) to create a gaiji.txt file during the gaijirep-.bat
process.
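For readers unfamiliar with the full-width to half-width conversion, here is a minimal Python sketch of the idea. It is not derived from alphakan.txt, whose exact coverage is not described here; the sketch assumes only full-width digits and Latin letters are converted and leaves punctuation, spaces and Japanese text alone. The function name is hypothetical.

```python
def fullwidth_latin_to_halfwidth(text):
    """Map full-width digits and Latin letters to their ASCII forms."""
    table = {}
    # Full-width digits, upper-case and lower-case letters sit at a fixed
    # offset of 0xFEE0 above their ASCII counterparts.
    for first, last in ((0xFF10, 0xFF19), (0xFF21, 0xFF3A), (0xFF41, 0xFF5A)):
        for code in range(first, last + 1):
            table[code] = code - 0xFEE0
    return text.translate(table)
```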
You can call gaijirep.pl and gaijirep-idx.pl etc directly to take input
from gaiji.txt. All the perl scripts must be called with jperl
(included in the package or downloadable at
http://homepage2.nifty.com/kipp/perl/jperl/). You can add/edit whatever
gaiji replacements you like. Use the generic gaijirep.bat files to
process gaiji.txt directly. A few gaiji replacement tables found in
various places on the net are included in the package for your
reference.
You can delete all gaiji codes without going through the gaijirep
process with an option in pdicconv.pl shown below.
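Deleting the tags outright amounts to a single regular-expression substitution. The Python sketch below illustrates the idea only and is not taken from pdicconv.pl. It assumes every gaiji tag is exactly five alphanumeric characters in square brackets, based on the [xxxxx] notation in this readme; if your dump uses a different shape, adjust the pattern. The function name is hypothetical.

```python
import re

# Assumed tag shape: exactly five alphanumeric characters in brackets.
# Keeping the pattern narrow avoids deleting ordinary bracketed text
# such as grammatical labels.
GAIJI_TAG = re.compile(r"\[[0-9A-Za-z]{5}\]")


def delete_gaiji_tags(text):
    """Remove every gaiji tag and keep the rest of the text unchanged."""
    return GAIJI_TAG.sub("", text)
```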
pdicconv.pl
-----------
You can call pdicconv.pl directly to merge dic.txt and dic-idx.txt into
an output file.
Usage:
jperl pdicconv.pl -t <dic type> -gaijidel (optional) -o <output file> (optional)
<dic type>
    shinwaeidai-kana     (j-e kenkyusha kana index)
    shinwaeidai-kanji    (j-e kenkyusha kanji index)
    shineiwadai          (e-j kenkyusha)
    examples             (edict examples dic)
    kanjidic-fpw         (edict kanjidic freepwing)
    nofilter             (attempt generic conversion)
-gaijidel
    Will delete all gaiji [xxxxx] tags
<output file>
    Name of output file you want
There are also a few commented flags at the start of pdicconv.pl which
you can edit.
For Japanese-English dictionaries you will need to run this twice, once
for a kana-lookup output and once for a kanji-lookup. You can then
merge the two output files together for conversion in PDIC (see the
pdicconv-shinwaeidai.bat file for an example).
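Merging the two outputs is essentially concatenation. The Python sketch below is one possible way to do it and is not what the .bat file does internally (that is not shown here). Dropping exact duplicate lines is an addition of this sketch, on the assumption that a word written only in kana could come out identically in both runs. The function name and the sample records (which assume a " /// " separator) are hypothetical.

```python
def merge_outputs(kana_lines, kanji_lines):
    """Concatenate two lists of records, skipping blanks and exact repeats."""
    merged = []
    seen = set()
    for line in list(kana_lines) + list(kanji_lines):
        record = line.rstrip("\r\n")
        if record and record not in seen:
            seen.add(record)
            merged.append(record)
    return merged
```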
There is a maximum entry size in classic PDIC versions of ~12KB
for Unicode and perhaps less for PDICW32. PDICCONV will split up
entries which exceed ~8KB by appending #2, #3 etc to the end of word
look-ups so that no entries are dropped.
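The splitting rule can be illustrated with a short Python sketch. It is not the pdicconv.pl code and makes several assumptions: size is measured in UTF-8 bytes (the encoding PDIC actually measures in is not stated here), the first piece keeps the plain headword, and later pieces get #2, #3 and so on. split_entry and its parameters are hypothetical names.

```python
def split_entry(headword, body, limit=8000, encoding="utf-8"):
    """Split body into pieces of at most `limit` encoded bytes each."""
    chunks = []
    current = []
    current_size = 0
    for char in body:
        # Measure per character so a multi-byte character is never cut.
        char_size = len(char.encode(encoding))
        if current and current_size + char_size > limit:
            chunks.append("".join(current))
            current = []
            current_size = 0
        current.append(char)
        current_size += char_size
    if current or not chunks:
        chunks.append("".join(current))
    # First piece keeps the headword; later ones get #2, #3, ...
    return [
        (headword if number == 1 else "%s#%d" % (headword, number), chunk)
        for number, chunk in enumerate(chunks, start=1)
    ]
```

With the defaults, an entry under the limit comes back as a single (headword, body) pair, so the function can be applied to every entry unconditionally.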
All the string replacement done is very simple and can be customised by
inspection of the code.
Please search the web for documentation on the pdic 1-line text file
format; it is very simple. Lots of conversion references here:
http://kazuo.fc2web.com/dic/ddwin2.htm
----
Thanks to everyone who has shared information on the format on the web.
Part of the gaiji conversion code was adapted from Kissui Shimotsuki's
kojien converter (http://www.ikushimo.com/).
Thanks very much for trying out my software. Feel free to ask any
questions you like or leave a comment on my Japanese computing site.
Japanese is of course fine too ^^;
http://diamondsky.org/other/japanese
-islisis
( evaunit1
(at) diamondsky.org )
ewfewf: sugoidesu!
(05.03.2008, 00:54)
Dudley Moon: Hi,
Question: Can your EPWING to EDICT programs convert Peter Rivard's EPWING version of Eijiro with yomigana to EDICT format?
Background:
I come to Japan each year for my college and I am trying to 1) learn more Japanese and 2) convert my Palm Tx to a really useful denshi jisho. (I have a WordTank 80 but it is still beyond my level to use efficiently).
I have been getting help from Peter Rivard (he has been a terrific help) and he has a new EPWING version of Eijiro with yomigana added. I have downloaded EDICT and am very impressed with it. On the other hand, the EPWING reader for Palm that Peter listed (Buckingham Reader) does not seem to work well for me (i.e., I cannot get it to use newly downloaded dictionaries). Which leads me to a second question:
Question 2: Is there a better EPWING reader available for Palm OS version 5.4.9 ?
My system is Handheld: Palm Tx, Palm OS version 5.4.9 and I have CJKOS loaded as a Japanese language program. My laptop is a Tablet PC using Windows XP Pro with Japanese language enabled.
Thank you in advance for any and all help you can provide.
(28.07.2008, 11:40)
islisis: Q1:
Sure, (assuming you mean pdic not edict) I don't see why not. It looks like it's in the same format as the J-E New Collegiate. However, as he is deciding to charge more than even the major Japanese publishers do and for a dictionary he did not even write, I can't attempt to do so directly myself. Go through the readme - you can skip the gaiji replacement step as i don't think there are any special characters in eijiro. Just create the dic.txt and dic-idx.txt files directly from the ddwin output. Then try running pdicconv-shinwaeidai.bat to format to 1-line text, and finally the converter in pdic's tools menu.
If it doesn't work, mail me whatever details you can of what went wrong. Good luck!
Q2:
Sorry I'm not actually familiar at all with Palm software!
On a side note, his dictionary looks like a great idea, but it is not his work, being most likely machine translated through freely available software, and something so basic should be made available through the Eijiro homepage for global maintenance. I use pdic's shortcut keys to switch between kana and kanji entries (see tips above) to avoid doubling the size of the dictionary file, but if people wanted it I might create some tools to convert Eijiro to a yomigana-style dictionary for free.
(28.07.2008, 12:55)
JimR: This may be a strange question, but would you have any idea on how to make the OPPOSITE conversion? I have some PDICT files that I want to convert to Epwing; I have a Zaurus PDA and there don't seem to be any PDICT viewers for the Zaurus anymore.
Unless you know of one?
Thanks!
(21.08.2008, 22:29)
islisis: Sorry, how does a viewer disappear from platform? There are always mirror sites, or did you replace the os... at any rate, you could always compile from source! zpdview was a famous one, wasn't it?
http://zaurus.biojapan.de/zpdview.html
As for your original question, it's not strange at all, it's easy! There are tons more solutions for converting pdic to epwing, and much easier. ebstudio for instance
http://www31.ocn.ne.jp/~h_ishida/EBStudio.html
plenty of other ones in the last pdic conversion link given above
(yes they are in japanese, isn't that why you are here? :P i'm sure if you search you can find english guides...)
(21.08.2008, 23:07)