4 changes: 3 additions & 1 deletion docs/make.jl
@@ -3,7 +3,9 @@ using Documenter

makedocs(; modules=[Lighthouse], sitename="Lighthouse",
authors="Beacon Biosignals and other contributors",
pages=["API Documentation" => "index.md"])
pages=["API Documentation" => "index.md",
"Terminology" => "terminology.md",
"Evaluation metrics" => "evaluation_metrics.md"])

deploydocs(repo="github.com/beacon-biosignals/Lighthouse.jl.git",
devbranch="main")
28 changes: 28 additions & 0 deletions docs/src/evaluation_metrics.md
@@ -0,0 +1,28 @@
# Evaluation metrics
Member Author:
I wonder if these should be docstrings? The reason I didn't do so from the start is that I meant this page to be more pedagogical (and hopefully to have references), instead of just documenting the programmatic API.


Lighthouse automatically generates a suite of evaluation metrics, which we briefly describe here.
This page uses terms defined in [Terminology](@ref); see that page for any unfamiliar terms.

## Confusion matrices

Lighthouse plots confusion matrices: simple tables showing the empirical
distribution of the predicted class (rows) versus the elected class (columns).
These come in two variants:

* _row-normalized_: each row has been normalized to sum to 1, so the row-normalized confusion matrix shows the empirical distribution of elected classes for a given predicted class. For example, the first row shows the empirical probabilities of the elected classes for samples predicted to be in the first class.
* _column-normalized_: each column has been normalized to sum to 1, so the column-normalized confusion matrix shows the empirical distribution of predicted classes for a given elected class. For example, the first column shows the empirical probabilities of the predicted classes for samples elected to be in the first class.

[insert example plot]
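The two normalizations can be sketched directly. This is a minimal illustration, not Lighthouse's implementation; the function names below are hypothetical:

```julia
# Build a confusion matrix from predicted and elected hard labels.
# Rows index the predicted class, columns the elected class.
function confusion_matrix(predicted, elected, class_count)
    C = zeros(Int, class_count, class_count)
    for (p, e) in zip(predicted, elected)
        C[p, e] += 1
    end
    return C
end

row_normalize(C) = C ./ sum(C; dims=2)  # each row sums to 1
col_normalize(C) = C ./ sum(C; dims=1)  # each column sums to 1

predicted = [1, 1, 2, 2, 2, 1]
elected   = [1, 2, 2, 2, 1, 1]
C = confusion_matrix(predicted, elected, 2)  # [2 1; 1 2]
```

Here `row_normalize(C)[1, :]` gives the empirical distribution of elected classes among the samples that were predicted to be in the first class.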

## Inter-rater reliability

## ROC curves

## PR curves

## PR-gain curves

## Prediction-reliability calibration

9 changes: 9 additions & 0 deletions docs/src/terminology.md
@@ -0,0 +1,9 @@
# Terminology

* _sample_: a piece of data to be classified by the model, or a labelled piece of training/test/validation data.
* _classes_: the set of possible class labels which the model attempts to predict.
* _voters_: the individual sources of labelled data, such as human labellers. Each voter may supply a "vote" for a class label for a sample.
* _votes_: the matrix of votes for a set of data: rows correspond to samples, columns correspond to voters, and values are the indices of class labels (i.e. numbers in `1:length(classes)`). For example, if 2 voters have voted on ten samples, then `votes` is a 10-by-2 matrix of integers. If a voter has not voted on a particular sample, any value outside `1:length(classes)` may be supplied to indicate this.
* _elected class_: the class elected by the voters. By default in [`learn!`](@ref),
the elected class is chosen by a simple majority of the votes, with ties broken randomly.
* _predicted class_: the class predicted by the model for a given input.
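Under the definitions above, a simple majority election over a `votes` matrix can be sketched as follows. This is an illustration only, not Lighthouse's implementation; `elect_classes` is a hypothetical helper:

```julia
# Elect a class for each sample (row of `votes`) by simple majority,
# breaking ties randomly. Values outside 1:class_count are treated
# as "voter did not vote on this sample".
function elect_classes(votes::AbstractMatrix{<:Integer}, class_count::Integer)
    return map(eachrow(votes)) do sample_votes
        counts = zeros(Int, class_count)
        for v in sample_votes
            1 <= v <= class_count && (counts[v] += 1)
        end
        winners = findall(==(maximum(counts)), counts)
        rand(winners)  # random tie-break among tied classes
    end
end

votes = [1 1;   # both voters chose class 1
         2 2;   # both voters chose class 2
         1 0]   # second voter did not vote (0 is not a valid class)
elect_classes(votes, 2)  # → [1, 2, 1]
```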
15 changes: 6 additions & 9 deletions src/learn.jl
@@ -460,11 +460,9 @@ for class `i`.
Where...

- `predicted_hard_labels` is a vector of hard labels where the `i`th element
is the hard label predicted by the model for sample `i` in the evaluation set.

is the hard label predicted by the model for sample `i` in the evaluation set.
- `elected_hard_labels` is a vector of hard labels where the `i`th element
is the hard label elected as "ground truth" for sample `i` in the evaluation set.

- `class_count` is the number of possible classes.

"""
@@ -559,10 +557,9 @@ Only valid for binary classification problems (i.e., `length(classes) == 2`)
Where...

- `predicted_soft_labels` is a matrix of soft labels whose columns correspond to
the two classes and whose rows correspond to the samples in the test set that have been
classified. For a given sample, the two class column values must sum to 1 (i.e.,
softmax has been applied to the classification output).

the two classes and whose rows correspond to the samples in the test set that have been
classified. For a given sample, the two class column values must sum to 1 (i.e.,
softmax has been applied to the classification output).
- `votes` is a matrix of hard labels whose columns correspond to voters and whose
rows correspond to the samples in the test set that have been voted on. If
`votes[sample, voter]` is not a valid hard label for `model`, then `voter` will
@@ -881,8 +878,8 @@ of logged values, `\$resource` takes the values of the field names of
Where...

- `get_train_batches` is a zero-argument function that returns an iterable of
training set batches. Internally, `learn!` uses this function when it calls
`train!(model, get_train_batches(), logger)`.
training set batches. Internally, `learn!` uses this function when it calls
`train!(model, get_train_batches(), logger)`.

- `get_test_batches` is a zero-argument function that returns an iterable
of test set batches used during the current epoch's test phase. Each element of