This guide provides instructions on how to set up and use AWS instances to run a MapReduce framework and submit jobs.
The setup steps are already complete, and I have a cluster of EC2 instances running.
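The word count job uses the following paths on the cluster:
/mnt/efs/mapreduce/input (input directory)
/mnt/efs/mapreduce/wc_map.sh (Word Count Mapper)
/mnt/efs/mapreduce/wc_reduce.sh (Word Count Reducer)
Use the scp command to securely copy the input files to the input directory on the instance: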
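scp -r -i {YOUR_SSH_KEY} tests/testdata/input ubuntu@{PUBLIC_DNS_ADDRESS}:/mnt/efs/mapreduce/
(The destination is the parent directory. If /mnt/efs/mapreduce/input already exists, copying to that path with scp -r would nest the files under /mnt/efs/mapreduce/input/input.)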
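To script the copy instead of typing it each time, a minimal Python sketch could build the same scp argument list and run it with subprocess. The function names build_scp_command and copy_input are hypothetical helpers, and the key path and DNS name are placeholders you must supply. The destination defaults to the parent directory /mnt/efs/mapreduce/ so that scp creates or fills input/ there instead of nesting it one level deeper.

```python
import subprocess
from typing import List


def build_scp_command(ssh_key: str, public_dns: str, local_dir: str,
                      remote_parent: str = "/mnt/efs/mapreduce/") -> List[str]:
    """Return the argv for recursively copying local_dir to the instance.

    Hypothetical helper. The destination is the *parent* directory, so scp
    creates (or merges into) remote_parent/<basename of local_dir>.
    """
    return [
        "scp", "-r",              # copy the directory recursively
        "-i", ssh_key,            # private key for the EC2 key pair
        local_dir,
        f"ubuntu@{public_dns}:{remote_parent}",
    ]


def copy_input(ssh_key: str, public_dns: str, local_dir: str) -> None:
    """Run scp; check=True raises CalledProcessError if scp exits non-zero."""
    subprocess.run(build_scp_command(ssh_key, public_dns, local_dir), check=True)
```

For example, copy_input("my-key.pem", "{PUBLIC_DNS_ADDRESS}", "tests/testdata/input") runs the same command as the one shown above, with your own values substituted for the placeholders.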
Run the mapreduce-submit command to execute a job. Below is the general syntax:
mapreduce-submit \
--host {MANAGER_PUBLIC_IP_ADDRESS} \
--port 6000 \
--input /mnt/efs/mapreduce/input \
--output /mnt/efs/mapreduce/output \
--mapper /mnt/efs/mapreduce/wc_map.sh \
--reducer /mnt/efs/mapreduce/wc_reduce.sh
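The contents of wc_map.sh and wc_reduce.sh are not shown here. As an illustration only, the following Python sketch runs the usual word count logic in a single process: the mapper emits (word, 1) for each word, the pairs are sorted by key (standing in for the framework's shuffle), and the reducer sums the counts for each word. It assumes a Hadoop Streaming-style contract in which the reducer sees all pairs for a key contiguously; the function names are hypothetical and are not the project's actual scripts.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def wc_map(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def wc_reduce(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reducer: sum the counts for each word. Input must be sorted by word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_word_count(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map, sort by key (the local stand-in for the shuffle), then reduce."""
    mapped = sorted(wc_map(lines))  # sorting places identical words together
    return list(wc_reduce(mapped))
```

For example, run_word_count(["hello world", "hello mapreduce"]) returns [("hello", 2), ("mapreduce", 1), ("world", 1)]. In the distributed version, the framework performs the sort and grouping between the map and reduce stages, so each reducer only ever sees its own key ranges.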
For this cluster, pass the Manager instance's public DNS name as MANAGER_PUBLIC_IP_ADDRESS: ec2-18-219-203-248.us-east-2.compute.amazonaws.com
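The value above is a DNS name rather than a bare IP address. EC2 public DNS names embed the public IPv4 address with dashes in place of dots, so the raw address can be recovered if a tool needs it. The helper below, ip_from_ec2_dns, is a hypothetical sketch that handles only that naming pattern. Keep in mind that an instance's public IPv4 address (and therefore this name) changes when the instance is stopped and started, unless an Elastic IP is attached.

```python
import ipaddress
import re

# EC2 public DNS names look like ec2-<a>-<b>-<c>-<d>.<region>.compute.amazonaws.com
_EC2_DNS = re.compile(r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.")


def ip_from_ec2_dns(hostname: str) -> str:
    """Extract the public IPv4 address encoded in an EC2 public DNS name."""
    match = _EC2_DNS.match(hostname)
    if match is None:
        raise ValueError(f"not an EC2 public DNS name: {hostname!r}")
    # ip_address() rejects octets outside 0-255.
    return str(ipaddress.ip_address(".".join(match.groups())))
```

For the Manager above, ip_from_ec2_dns("ec2-18-219-203-248.us-east-2.compute.amazonaws.com") returns "18.219.203.248".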
Technologies: Python, Madoop (Custom version of Hadoop)
If you would like to run your own instances to view the output from each job, the code for this project is available upon request.