# Database preparation process

Download the main database, which is linked from:
  http://www17408ui.sakura.ne.jp/tatsum/database.html
The file itself is:
  http://www17408ui.sakura.ne.jp/tatsum/database/VDRJ_Ver1_1_Research_Top60894.xlsx

Then, from the actual database sheet (sheet 5), take these columns, in this order:
  lexeme, orthography, reading, part-of-speech (currently unused), "corrected frequency"

Put the result in a CSV file (say "database.csv") with 5 columns. You can
verify that the selected columns of the spreadsheet contain no commas, so a
plain comma-separated export is safe.
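
One way to double-check the no-commas claim on the exported file is to split
every line on commas and confirm that exactly 5 fields come out. The sketch
below is only an illustration and not part of this repository; the names
`find_bad_lines` and `check_file` are hypothetical, and it assumes the CSV is
UTF-8 encoded.

```python
EXPECTED_FIELDS = 5  # lexeme, orthography, reading, part-of-speech, corrected frequency


def find_bad_lines(lines, expected_fields=EXPECTED_FIELDS):
    """Return (line_number, field_count) for each line whose naive comma
    split does not yield exactly expected_fields fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # Strip only the line terminator, then split without any quote handling.
        fields = line.rstrip("\r\n").split(",")
        if len(fields) != expected_fields:
            bad.append((number, len(fields)))
    return bad


def check_file(path, expected_fields=EXPECTED_FIELDS):
    """Run find_bad_lines over a file; assumes UTF-8 (an assumption, adjust if needed)."""
    with open(path, encoding="utf-8") as handle:
        return find_bad_lines(handle, expected_fields)
```

Calling `check_file("database.csv")` should return an empty list if the export
is clean. Any reported line has a stray comma, a missing column, or is blank.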

Then run:
  $ ./process-database.hs database.csv database.bin
This creates the indexed database file ("database.bin") that is read by the vim script.