- This repository contains a Korean coreference resolution model.
- The model is based on Kenton Lee's English coreference resolution model, which we adapted to Korean coreference resolution with reference to Shin et al.
- Install the Python 3 requirements: `pip3 install -r requirements.txt`
- Build custom kernels by running `setup_all.sh`.
  - There are 3 platform-dependent ways to build custom TensorFlow kernels. Please comment/uncomment the appropriate lines in the script.
- Download `word2vec.txt` and save it here.
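
Before training, it can help to confirm that the downloaded `word2vec.txt` is well formed. The sketch below is not part of this repository: it assumes the common text layout of one word per line followed by space-separated floats, with an optional `<vocab size> <dimension>` header on the first line. The function name `parse_embedding_lines` is hypothetical.

```python
def parse_embedding_lines(lines):
    """Parse 'word v1 ... vd' text lines into {word: [floats]} and check row sizes."""
    vectors = {}
    dim = None  # vector size, taken from the first data row
    for line_no, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if line_no == 1 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # assumed optional "<vocab size> <dimension>" header
        word, values = parts[0], parts[1:]
        if not values:
            raise ValueError(f"line {line_no}: no vector values after {word!r}")
        if dim is None:
            dim = len(values)
        if len(values) != dim:
            raise ValueError(f"line {line_no}: expected {dim} values, found {len(values)}")
        vectors[word] = [float(value) for value in values]  # ValueError on non-numeric input
    return vectors
```

To check the real file, pass an open file handle, for example `parse_embedding_lines(open("word2vec.txt", encoding="utf-8"))`. If the actual file uses a different layout, the parsing rules above need to be adjusted.
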

`./train_coref.sh` preprocesses the data before training.
- Experiment configurations are found in `experiments.conf`.
- Choose an experiment. Change the data paths, the word embedding path, and any other parameters as needed.
- Training: `python3 train.py <experiment>`
- Results are stored in the `logs` directory and can be viewed via TensorBoard.
- Evaluation: `python3 evaluate.py <experiment>`
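
The training and evaluation commands above can be chained so that evaluation only starts once training has finished successfully. This is a minimal sketch, not a script shipped with the repository: the helper names `build_commands` and `run_experiment` are hypothetical, and it assumes it is run from the repository root, where `train.py` and `evaluate.py` live.

```python
import subprocess


def build_commands(experiment):
    """Return the training and evaluation commands for one experiment name."""
    return [
        ["python3", "train.py", experiment],
        ["python3", "evaluate.py", experiment],
    ]


def run_experiment(experiment):
    """Run training, then evaluation, stopping at the first command that fails."""
    for command in build_commands(experiment):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit status,
        # so evaluation is skipped if training fails.
        subprocess.run(command, check=True)
```

Calling `run_experiment("<experiment>")` with an experiment name defined in `experiments.conf` runs both steps in order.
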

The `logs` directory contains a pretrained model, `MTA02-test`, which was trained on the crowdsourced data set.
- If you want to use pretrained ELMo embeddings, download them into the input directory.
- The training terminates automatically at 30k steps. The model generally converges at about 25k steps.
- If errors occur when evaluating on the development set, the `v4_gold_conll` file may contain errors. In that case, change the train and dev set paths in `verify_conll.py` and run it to locate the errors, then fix them.
  - Most errors of this kind are caused by ETRI morphological analysis.
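
To illustrate the kind of problem such a verification pass can catch, the sketch below checks that coreference brackets are balanced. It is not the repository's `verify_conll.py`. It assumes the CoNLL-2012 convention in which the last whitespace-separated column of each token line holds `-` or `|`-joined tags such as `(3`, `3)`, or `(3)`, that sentences are separated by blank lines, and that mentions do not cross sentence boundaries. The function name `check_coref_column` is hypothetical.

```python
import re
from collections import defaultdict

# One coreference tag: "(12" opens, "12)" closes, "(12)" is a one-token mention.
PIECE = re.compile(r"^(\()?(\d+)(\))?$")


def check_coref_column(lines):
    """Return (line_number, message) pairs for unbalanced tags in a list of lines."""
    problems = []
    open_counts = defaultdict(int)  # cluster id -> mentions currently open

    def flush(line_no):
        # At a sentence or document boundary, anything still open is an error.
        for cluster_id, count in sorted(open_counts.items()):
            if count > 0:
                problems.append((line_no, f"cluster {cluster_id} opened but never closed"))
        open_counts.clear()

    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            flush(line_no)
            continue
        coref = line.split()[-1]
        if coref == "-":
            continue
        for piece in coref.split("|"):
            match = PIECE.match(piece)
            if match is None or not (match.group(1) or match.group(3)):
                problems.append((line_no, f"malformed coreference tag {piece!r}"))
                continue
            opens, cluster_id, closes = match.groups()
            if opens and closes:
                continue  # one-token mention, nothing to track
            if opens:
                open_counts[cluster_id] += 1
            elif open_counts[cluster_id] > 0:
                open_counts[cluster_id] -= 1
            else:
                problems.append((line_no, f"cluster {cluster_id} closed but never opened"))
    flush(len(lines) + 1)
    return problems
```

To check a file, read it into a list of lines first, for example with `open(path, encoding="utf-8").read().splitlines()`. An empty result means no bracket mismatch was found under the assumptions above.
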
- Kenton Lee et al., Higher-order Coreference Resolution with Coarse-to-fine Inference in NAACL 2018
- Shin et al., Korean Co-reference Resolution End-to-End Learning using Bi-LSTM with Mention Features in HCLT 2018
- CC BY-NC-SA (Attribution-NonCommercial-ShareAlike)
- If you want to commercialize this resource, please contact us.
Machine Reading Lab @ KAIST
This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2013-0-00109, WiseKB: Big data based self-evolving knowledge base and reasoning platform).