# Database preparation process

Download the main database from this page:
  http://www17408ui.sakura.ne.jp/tatsum/database.html
The file to download is:
  http://www17408ui.sakura.ne.jp/tatsum/database/VDRJ_Ver1_1_Research_Top60894.xlsx

Then from the actual database sheet (sheet 5), take the columns:
  lexeme, orthography, reading, part-of-speech (currently unused), "corrected frequency"

Save the result as a CSV file (say "database.csv") with these 5 columns. You
can verify that the selected columns of the spreadsheet contain no commas, so
the CSV conversion is safe.
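
One way to check the no-comma claim on the exported file is to count the
comma-separated fields on every line. The following is only a sketch: the
helper name check_columns is made up for this README and is not part of the
repository, and it assumes a POSIX shell with awk available. A field that
contained a comma would be quoted by the spreadsheet export and would split
into more than 5 fields here, so it would be reported.

```shell
# check_columns FILE [N]: report lines of FILE that do not split into
# exactly N comma-separated fields (N defaults to 5).
# Returns non-zero if any such line is found.
check_columns() {
  awk -F, -v n="${2:-5}" '
    NF != n { printf "%s:%d: %d fields\n", FILENAME, NR, NF; bad = 1 }
    END     { exit bad + 0 }
  ' "$1"
}
```

After defining the function, run it on the exported file:
  $ check_columns database.csv
No output and a zero exit status mean that every line has exactly 5 fields.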

Then run the processing script on the CSV:
  $ cabal run process-database.hs -- database.csv