Utilizing in-network coding to trade extra computations for more communication bandwidth in a distributed sorting algorithm (TeraSort). The paper for this project is available at https://arxiv.org/abs/1702.04850.
- C/C++ compiler (g++)
- OpenMPI library
A file containing data to be sorted must be placed in the input directory. Note that the format of the data points follows standard TeraSort input data. Each record contains a 10-byte key and a 90-byte value. An input file can be generated by TeraSort Example.
Specify in Configuration.h:
numReducer: number of distributed computing nodesinputPath: a path to the input file
Run make to compile TeraSort.
Run ./Splitter to split the input data points.
Run mpirun -np 4 ./TeraSort.
The above execution creates 3 computing processes and 1 master process to sorts data according to TeraSort algorithm. All processes are local.
Specify in CodedConfiguration.h:
numReducer: number of distributed computing nodesload: number of nodes on which each data point is processed (computation load)numInputis set to be equal to (numReducerchooseload)inputPath: a path to the input file
Run make to compile CodedTeraSort.
Run ./Splitter code to split the input data points.
Run mpirun -np 4 ./CodedTeraSort.
The above execution creates 3 computing processes and 1 master process to sorts data according to CodedTeraSort algorithm. All processes are local.