Want to be able to run evaluation (e.g., MRR, mean reciprocal rank) as a standalone step, so that the same metric code can score experiment output as well as TF-IDF baseline output.
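A minimal sketch of what such a standalone evaluator could look like, assuming each system's output can be reduced to a mapping from query ID to a ranked list of document IDs. The function name `mean_reciprocal_rank`, the dictionary layout, and the sample IDs are all hypothetical and not taken from the existing code; the point is only that the metric depends on the ranking format, not on which system produced it.

```python
from typing import Dict, List, Set


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
) -> float:
    """Compute MRR over all queries that have relevance judgments.

    rankings: query ID -> document IDs ordered best first (assumed format).
    relevant: query ID -> set of relevant document IDs (assumed format).
    Queries with no relevant document in their ranking contribute 0.
    """
    if not relevant:
        return 0.0
    total = 0.0
    for query_id, relevant_docs in relevant.items():
        ranked_docs = rankings.get(query_id, [])
        # Ranks are 1-based; only the first relevant hit counts.
        for rank, doc_id in enumerate(ranked_docs, start=1):
            if doc_id in relevant_docs:
                total += 1.0 / rank
                break
    return total / len(relevant)


# Hypothetical judgments and two runs in the same format.
relevant = {"q1": {"d2"}, "q2": {"d5"}}
tfidf_run = {"q1": ["d1", "d2", "d3"], "q2": ["d5", "d4"]}
experiment_run = {"q1": ["d2", "d1"], "q2": ["d4", "d6"]}

print(mean_reciprocal_rank(tfidf_run, relevant))       # 0.75
print(mean_reciprocal_rank(experiment_run, relevant))  # 0.5
```

Because the evaluator only sees the ranking dictionaries, adding another system would mean writing a small loader that converts its output into this format, with no change to the metric itself.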