Project for Squeak 5.3 (and above) to explore and learn about the history of Squeak and Squeak-related projects.
Get a Squeak image (5.3 or newer).
Load the Squeak History Project:
Metacello new baseline: 'SqueakHistory'; repository: 'github://hpi-swa/squeak-history/packages'; load.
After the code has been loaded, the mailing list archives are downloaded. See BaselineOfSqueakHistory >> #loadData for details.
The SqhMailmanAggregator can be used to enumerate all mail messages. Example queries can be found in the queries protocol. The message cache holds the meta-data for each message in the image, that is, instances of SqhMailWrapper. Accessing a message body requires reading the archive's file contents from disk, which is slower. Here is an example query that requires disk access:
countMessageLines
	| count |
	count := 0.
	self messagesCachedDo: [:wrapper |
		count := count + wrapper mailMessage bodyText lineCount].
	^ count

Note that you should also derive some rules for author-key normalization to further improve the overall quality of query results. Just run this:
SqhMailmanAggregator new
	showProgress: true; "optional"
	deriveRulesForAuthorKeyNormalization. "ignore the warning"

For more information on normalization, see below.
There are rules to normalize different kinds of information: author names, timestamps, and mail addresses. The goal is to identify contributors and, eventually, relevant discussions. Hand-selected rules can be found in SqhMailmanAggregator >> #rulesForAuthorKeyNormalization and #rulesForAuthorKeyClarification. Here is an excerpt:
"rulesForAuthorKeyNormalization"
'alankay' -> 'alancurtiskay'.
'hhirzel' -> 'hanneshirzel'.
"rulesForAuthorKeyClarification"
'[email protected]' -> ('squeakdev' -> 'alanreider').
'[email protected]' -> ('squeak' -> 'nedkonz').

Then, there is a simple algorithm that derives more normalization rules by using e-mail addresses as identifiers. If two messages came from the same address, their author names are taken to refer to the same contributor. There is also a check to avoid mapping cycles. See SqhMailmanAggregator >> #deriveRulesForAuthorKeyNormalization for details. Generic addresses are filtered out.
Note that there is also a (hand-crafted) list of generic author names (or keys) in #genericAuthorKeys, which includes github and travisci.
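
The derivation idea described above can be illustrated with a small workspace sketch. It does not use the project's API and is not the code in #deriveRulesForAuthorKeyNormalization, which may work differently. All variable names and the sample addresses are made up for illustration, and choosing the longest key as the canonical one is an assumption made only for this example. Cycles are avoided here by always mapping from one fully resolved key to another.

```smalltalk
"Workspace sketch. All names and sample data are hypothetical."
| messages genericAddresses genericKeys keysByAddress rules resolve |
messages := {
	'alan@example.org' -> 'alankay'.
	'alan@example.org' -> 'alancurtiskay'.
	'hannes@example.org' -> 'hhirzel'.
	'hannes@example.org' -> 'hanneshirzel'.
	'noreply@example.org' -> 'github'.
	'noreply@example.org' -> 'travisci' }.
genericAddresses := Set with: 'noreply@example.org'.
genericKeys := Set withAll: #('github' 'travisci').

"Step 1: Collect the author keys seen per address, skipping generic ones."
keysByAddress := Dictionary new.
messages do: [:each |
	((genericAddresses includes: each key)
		or: [genericKeys includes: each value])
			ifFalse: [(keysByAddress at: each key ifAbsentPut: [Set new])
				add: each value]].

"Helper: Follow existing rules until a key has no further mapping."
rules := Dictionary new.
resolve := [:key | | current |
	current := key.
	[rules includesKey: current] whileTrue: [current := rules at: current].
	current].

"Step 2: Per address, map every other key to one canonical key.
 Assumption: the longest key wins; ties are broken alphabetically.
 Both sides of a new rule are fully resolved keys, so no cycle can form."
keysByAddress do: [:keys | | sorted canonical |
	sorted := keys asSortedCollection: [:a :b |
		a size > b size or: [a size = b size and: [a <= b]]].
	canonical := resolve value: sorted first.
	sorted do: [:key | | root |
		root := resolve value: key.
		root = canonical ifFalse: [rules at: root put: canonical]]].

rules "inspect or print this"
```

With the sample data, the resulting dictionary maps 'alankay' to 'alancurtiskay' and 'hhirzel' to 'hanneshirzel', while the generic address contributes no rule.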