From 4b500bd4c69b481a611a61e72795c450120a6a7c Mon Sep 17 00:00:00 2001
From: Tom Smeding
Date: Sat, 6 Jul 2024 23:12:16 +0200
Subject: More stuff

---
 README.txt | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
 create mode 100644 README.txt

diff --git a/README.txt b/README.txt
new file mode 100644
index 0000000..c93cbe1
--- /dev/null
+++ b/README.txt
@@ -0,0 +1,16 @@
+# Database preparation process
+
+Download the main database from:
+  http://www17408ui.sakura.ne.jp/tatsum/database.html
+which is this file:
+  http://www17408ui.sakura.ne.jp/tatsum/database/VDRJ_Ver1_1_Research_Top60894.xlsx
+
+Then from the actual database sheet (sheet 5), take the columns:
+  lexeme, orthography, reading, part-of-speech (currently unused), "corrected frequency"
+
+Put the result in a CSV (say "database.csv") with 5 columns. It can be
+ascertained that the data from the spreadsheet does not contain commas in the
+selected columns, so the CSV conversion is safe.
+
+Then
+  $ cabal run process-database.hs -- database.csv
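
Note on the CSV step: the README's claim that the selected columns contain no
commas can be checked mechanically before running process-database.hs. The
following is a minimal sketch, not part of this commit. The function names
find_bad_rows and check_file are hypothetical, the UTF-8 encoding is an
assumption about how the CSV was exported, and splitting on plain commas
without quote handling is an assumption about how the file is consumed.

```python
EXPECTED_COLUMNS = 5  # lexeme, orthography, reading, part-of-speech, corrected frequency


def find_bad_rows(lines):
    """Return (line_number, field_count) for each row that is not 5 fields wide.

    Rows are split on plain commas with no quote handling, which matches the
    README's premise that the selected columns contain no commas.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n")
        if not row:
            continue  # skip blank lines, such as one left by a trailing newline
        count = len(row.split(","))
        if count != EXPECTED_COLUMNS:
            bad.append((number, count))
    return bad


def check_file(path):
    """Run find_bad_rows over a CSV file (assumed to be UTF-8 encoded)."""
    with open(path, encoding="utf-8", newline="") as handle:
        return find_bad_rows(handle)
```

Calling check_file("database.csv") should return an empty list if every row
has exactly 5 fields. Any entry in the result points at a row where a stray
comma (or a missing column) would make the conversion unsafe.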