Shinyubin

New Japanese ZIP code dictionary for Kotoeri

2009.02.23
That's all there is to it.

What is Shinyubin?

Shinyubin is a Kotoeri dictionary that converts Japanese ZIP codes into their addresses. For instance, if you type "2120023," you will get "神奈川県川崎市幸区戸手本町," which is exactly the area that ZIP code designates according to the Japanese Post Office. If you need to enter a lot of Japanese addresses, this dictionary could be very helpful to you.

What is Kotoeri? Kotoeri is the standard Japanese input method for Mac OS X. This dictionary will not work on Windows or with other input methods such as ATOK.

Random trivia: The "ZIP" in "ZIP code" stands for "Zone Improvement Plan" (source: Oxford American Dictionary). It's an acronym, so it should be capitalized!


Introduction

When I recently updated a friend's PC with Windows Update, a ZIP code dictionary for the Windows Japanese IME showed up among the various patches. I recalled that there used to be something similar for Mac OS.

A quick search revealed that there is indeed one, made by zeke. However, its data dates from H11 (Heisei 11, i.e. 1999), so it is quite out of date at this point. Even my current address isn't listed!

Searching further, I came across the Japanese Post Office's ZIP code data. When I looked at the contents of the file, I found that it wasn't too complex, so I decided to take on the challenge of creating my own ZIP code dictionary.

After much hard work, I came up with a decent dictionary and a toolset for converting the Japanese Post Office's data files. I'd like to share these with you.

Usage

Dictionary only

  1. Download the disk image shinyubin.dmg below and open it
  2. From the disk that appears on your desktop, copy the file "新郵便番号辞書" to ~/Library/Dictionaries (Home → Library → Dictionaries)
  3. In the Kotoeri input menu, select "Kotoeri Preferences". Then, in the "辞書" (Dictionary) section, highlight "新郵便番号辞書" and click the "開く" (Open) button.

Datafile conversion tool Zipcodic

  1. Download Zipcodic.sh below
  2. Download and expand a datafile. It can be either "oogaki" or "kogaki." You can use the prefecture-specific ones, but you probably want the nationwide "全国一括" file. (Note: Zipcodic 1.1 and later also supports the large office data.)
  3. Open a Terminal and cd to the folder containing the script
  4. Give yourself execute permissions on the script with chmod u+x Zipcodic.sh
  5. If you've put the script and the (nationwide) datafile in the same folder, run the script like this: ./Zipcodic.sh KEN_ALL.CSV
  6. When the processing has completed, the number of lines (1 line = 1 address) will be reported. Take note of this number.
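The conversion step can be pictured roughly as follows. This is not Zipcodic's actual code. It is a minimal Python sketch built on two assumptions. First, it assumes the KEN_ALL.CSV column layout published by the Japanese Post Office: the 7-digit ZIP code is in the third column, and the prefecture, city and town area are in the seventh to ninth columns. Second, it uses a hypothetical tab-separated output line of ZIP code, address and part of speech; Kotoeri's real import format may differ. The cleanup described under Caveats is left out here.

```python
import csv
import io

# Assumed 0-based column positions in KEN_ALL.CSV:
# 7-digit ZIP code, prefecture, city, town area (町域).
ZIP, PREF, CITY, TOWN = 2, 6, 7, 8

# Hypothetical part-of-speech label; Kotoeri's real import format may differ.
POS = "名詞"


def convert(csv_text: str) -> list[str]:
    """Turn KEN_ALL.CSV text into tab-separated registration lines."""
    lines = []
    seen = set()
    for row in csv.reader(io.StringIO(csv_text)):
        # The reading is the ZIP code; the word is the joined address.
        address = row[PREF] + row[CITY] + row[TOWN]
        entry = f"{row[ZIP]}\t{address}\t{POS}"
        if entry not in seen:  # skip exact duplicate entries
            seen.add(entry)
            lines.append(entry)
    return lines
```

To use it on the real file, read KEN_ALL.CSV with `encoding="cp932"`, since the Post Office data is Shift_JIS. Then write out the returned lines. Their count, `len(result)`, plays the role of the number reported in step 6.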

Take the resulting file KEN_ALL.txt and import it into the Kotoeri Word Register application "ことえり単語登録."

  1. Choose "Register Words" in the Kotoeri input menu
  2. In the "辞書" (Dictionary) menu, choose "新規ユーザ辞書の作成" (Create New User Dictionary)...
  3. ...and in the new dictionary you just created, import the converted data by choosing "テキストや辞書から取り込む…" (Import from Text or Dictionary…) in the "辞書" menu and then selecting KEN_ALL.txt
  4. Confirm that the number shown next to "登録されている語数" (number of registered words) matches the number reported in step 6 of the conversion steps above

If the data was imported properly, you're done. For some reason, though, the import often fails partway through for me. To make recovering from this easier, I created a helper tool, ZipcodicHelper, which you can download below.

  1. Export your partially imported dictionary by choosing "テキストに書き出す…" (Export to Text…) from the "辞書" menu, and save it in UTF-16 format as, for instance, partial.txt.
  2. As with Zipcodic, download ZipcodicHelper.sh below and place it in the same folder as partial.txt and KEN_ALL.txt.
  3. Navigate to this folder with cd, give yourself execute permissions with chmod u+x ZipcodicHelper.sh, and then run the following command: ./ZipcodicHelper.sh KEN_ALL.txt partial.txt
  4. This will create a file called missing.txt, which you should now import into ことえり単語登録.
  5. Repeat steps 1 through 4 as necessary. Optionally, you can append --split to the end of the command in step 3; this splits the remaining data into several 250-line files.
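The two jobs described for ZipcodicHelper can be sketched as below. The first is finding the entries that did not make it into the dictionary. The second is optionally splitting them into 250-line pieces. This is a hypothetical Python sketch, not the helper's real code. It assumes that each exported line of partial.txt is identical to the matching line of KEN_ALL.txt once both files are decoded. That may not hold if Kotoeri's export adds headers or reformats entries.

```python
def missing_entries(full: list[str], partial: list[str]) -> list[str]:
    """Entries of the full list that are absent from the partial export, in order."""
    present = set(partial)  # set lookup keeps this linear in the input size
    return [line for line in full if line not in present]


def split_chunks(lines: list[str], size: int = 250) -> list[list[str]]:
    """Split lines into consecutive chunks of at most `size` lines (like --split)."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]
```

The partial export would be read with `encoding="utf-16"` to match step 1. The result of `missing_entries` corresponds to missing.txt. Each chunk from `split_chunks` would be written to its own file.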

Caveats

In the original data, there are lines like this (unnecessary fields removed):

"8996601","鹿児島県","姶良郡牧園町","万膳(1400〜1477番)"
"0287901","岩手県","九戸郡種市町","第8地割〜第14地割(玉川、戸類家)"
"0882686","北海道","標津郡中標津町","2253−24、2253−35、2253−63)"
"7614103","香川県","小豆郡土庄町","甲、乙(大木戸)"
"4780000","愛知県","知多市","以下に掲載がない場合"
"8710099","大分県","中津市","中津市の次に番地がくる場合"

These are corrected as below:

"8996601","鹿児島県","姶良郡牧園町","万膳"
"0287901","岩手県","九戸郡種市町"
"0882686","北海道","標津郡中標津町"
"7614103","香川県","小豆郡土庄町"
"4780000","愛知県","知多市"
"8710099","大分県","中津市"

In other words, parenthetical details are dropped from the 町域 (town area, the last field), and if the 町域 still does not identify a single unique area, it is thrown out. I figured this approach would result in the most useful dictionary. However, if you disagree with this method, or if you find any mistakes, please send me an email.
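The cleanup rules illustrated above can be expressed as a small normalization function for the 町域 field. The rules below are a hypothetical heuristic chosen to reproduce the six examples, not Zipcodic's actual logic. The function does three things. It drops the Post Office's placeholder phrases. It strips parenthetical details. It discards fields that still contain a range (〜), a list separator (、) or an unmatched bracket, which is left over when one record is split across several CSV lines.

```python
import re

# Placeholder phrases used instead of a real town area (assumed list).
PLACEHOLDER_SUFFIXES = ("以下に掲載がない場合", "の次に番地がくる場合")

# Characters suggesting a range or a list of areas rather than one place
# (hypothetical heuristic; covers both wave dash and full-width tilde).
AMBIGUOUS_CHARS = ("〜", "～", "、")

# One non-nested parenthetical, in ASCII or full-width brackets.
PAREN_RE = re.compile(r"[(（][^()（）]*[)）]")


def normalize_town(town: str) -> str:
    """Return a cleaned 町域 field, or "" if it should be dropped."""
    if town.endswith(PLACEHOLDER_SUFFIXES):
        return ""
    # Remove parenthetical details such as "(1400〜1477番)".
    cleaned = PAREN_RE.sub("", town)
    # A leftover bracket means the record was split across CSV lines.
    if any(ch in cleaned for ch in "()（）"):
        return ""
    if any(ch in cleaned for ch in AMBIGUOUS_CHARS):
        return ""
    return cleaned
```

An empty return value means only the prefecture and city are kept, as in the corrected 九戸郡種市町 line.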

Shinyubin also includes the large office data. For this data, I drop the "大口事業所等名" (large office name) field and use only the address.
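Dropping the office name amounts to building the address from the remaining columns. The sketch assumes that the large office file (assumed to be JIGYOSYO.CSV) uses the column layout below; verify it against the documentation distributed with the data. Under that layout, the office name in kanji is in the third column. It is followed by the prefecture, city, town area and block/lot detail, and then the 7-digit ZIP code. Whether Shinyubin includes the block/lot detail column in the address is also an assumption here.

```python
import csv
import io

# Assumed 0-based column layout of the large office file; NAME is listed
# only to document which column is being dropped.
NAME, PREF, CITY, TOWN, DETAIL, ZIP = 2, 3, 4, 5, 6, 7


def office_entries(csv_text: str) -> list[tuple[str, str]]:
    """Return (zip, address) pairs, deliberately ignoring the office name."""
    entries = []
    for row in csv.reader(io.StringIO(csv_text)):
        address = row[PREF] + row[CITY] + row[TOWN] + row[DETAIL]
        entries.append((row[ZIP], address))  # row[NAME] is not used
    return entries
```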

Note: This dictionary comes with no warranty or guarantee of any kind. Use it at your own risk.

Zipcodic and ZipcodicHelper are distributed under the GNU General Public License. For more details, please see "License" below. As for the dictionary, its source data is distributed by the Japanese Post Office, which states: "About usage, redistribution, porting, and improvements: The Japanese Post Office claims no copyright [on this data]. You may distribute it freely."


Download


Current version of Shinyubin (H20-05-30, i.e. 2008-05-30):

Datafile conversion tools

Change log


Acknowledgments