This is a simple implementation of a Map Reduce framework for counting words in multiple files.
It is currently set up for easy use on a single computer,
but can be extended to work across multiple computers with some tweaks.
To run the Map Reduce, do not forget to:
- Change the file paths in the 'main.go' file to the files you want to process.
- Tweak the number of nodes (N1) on the first layer of the Map Reduce in 'main.go'.
- Tweak the number of nodes (N2) on the second layer of the Map Reduce in 'main.go'.
This implementation communicates over TCP sockets using the following message format:
{sender_id} {message_type} {optional_message}
- If the message_type is "0", the message is a list of "{word1} {count1} {word2} {count2} ...".
  This is used by the clients to send words and their counts to the first-layer nodes.
- If the message_type is "1", there is no message.
  This is used by the clients to mark the end of their communications.
- If the message_type is "2", the message is a tuple "{min} {max}".
  This is used by the nodes to send their min and max values to the other nodes.
- If the message_type is "3", the message is a list of "{word1} {count1} {word2} {count2} ...".
  This is used by the first-layer nodes to send words and their counts to the other nodes.
- If the message_type is "4", there is no message.
  This is used by the first-layer nodes to mark the end of their communications.
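
As an illustration of the message format above, here is a minimal sketch of how one message could be parsed in Go. It is not taken from this repository: the `Message` type, the constant names, and the `ParseMessage` and `WordCounts` functions are hypothetical, and it assumes that fields are separated by whitespace, that words contain no whitespace, and that the sender id can be treated as an opaque string. The payload of a type "2" message is left as raw tokens, since the type of the min and max values is not specified here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Message types from the protocol description. The constant names are
// hypothetical; only the numeric values come from the README.
const (
	TypeClientCounts = 0 // client: word/count pairs for the first-layer nodes
	TypeClientDone   = 1 // client: end of communications
	TypeMinMax       = 2 // node: "{min} {max}" for the other nodes
	TypeNodeCounts   = 3 // first-layer node: word/count pairs for the other nodes
	TypeNodeDone     = 4 // first-layer node: end of communications
)

// Message is a hypothetical in-memory form of one protocol message.
type Message struct {
	SenderID string   // kept as an opaque string (assumption)
	Type     int      // one of the Type* constants
	Fields   []string // tokens of the optional message, possibly empty
}

// ParseMessage splits "{sender_id} {message_type} {optional_message}"
// on whitespace and validates the message type.
func ParseMessage(line string) (Message, error) {
	tokens := strings.Fields(line)
	if len(tokens) < 2 {
		return Message{}, fmt.Errorf("malformed message %q: need sender id and type", line)
	}
	msgType, err := strconv.Atoi(tokens[1])
	if err != nil {
		return Message{}, fmt.Errorf("invalid message type %q: %w", tokens[1], err)
	}
	if msgType < TypeClientCounts || msgType > TypeNodeDone {
		return Message{}, fmt.Errorf("unknown message type %d", msgType)
	}
	return Message{SenderID: tokens[0], Type: msgType, Fields: tokens[2:]}, nil
}

// WordCounts interprets the payload of a type 0 or type 3 message as
// "{word1} {count1} {word2} {count2} ..." pairs.
func (m Message) WordCounts() (map[string]int, error) {
	if m.Type != TypeClientCounts && m.Type != TypeNodeCounts {
		return nil, fmt.Errorf("message type %d carries no word counts", m.Type)
	}
	if len(m.Fields)%2 != 0 {
		return nil, fmt.Errorf("odd number of payload tokens: %d", len(m.Fields))
	}
	counts := make(map[string]int, len(m.Fields)/2)
	for i := 0; i < len(m.Fields); i += 2 {
		n, err := strconv.Atoi(m.Fields[i+1])
		if err != nil {
			return nil, fmt.Errorf("invalid count for %q: %w", m.Fields[i], err)
		}
		counts[m.Fields[i]] += n // sum repeated words within one message
	}
	return counts, nil
}

func main() {
	msg, err := ParseMessage("client1 0 hello 2 world 5")
	if err != nil {
		panic(err)
	}
	counts, err := msg.WordCounts()
	if err != nil {
		panic(err)
	}
	// prints: client1 0 2 5
	fmt.Println(msg.SenderID, msg.Type, counts["hello"], counts["world"])
}
```

Keeping the payload as raw tokens until the message type is known lets the same parser handle all five message types, including the empty payloads of types "1" and "4".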