Published one year ago
Views: 332
Project code: 435958
Project description
You have two datasets: Trips.txt which records trip information, and Taxis.txt which is about taxi information. Both Trips.txt and Taxis.txt are stored on HDFS. Complete the following MapReduce programming tasks with Python.
Samples of Taxis.txt and Trips.txt are listed below, after the assessment details.
Assessment Type
− Individual assignment.
− Submit online via Canvas → Assignment 1.
− Marks awarded for meeting requirements as closely as possible.
− Clarifications/updates may be made via announcements or relevant discussion forums.
Sample of Taxis.txt:
Taxi#, company, model, year
470,0,80,2018
332,11,88,2013
254,10,62,2018
460,4,90,2022
113,6,23,2015
275,16,13,2015
318,14,46,2014
Sample of Trips.txt:
Trip#, Taxi#, fare, distance, pickup_x, pickup_y, dropoff_x, dropoff_y
0,354,232.64,127.23,46.069,85.566,10.355,4.83
1,173,283.7,150.74,5.02,31.765,88.386,27.265
2,8,83.84,43.17,63.269,33.156,92.953,60.647
3,340,259.2,136.3,14.729,13.356,14.304,90.273
4,32,270.07,152.65,27.965,13.37,77.925,62.82
5,64,378.31,202.95,1.145,94.519,98.296,35.469
6,480,235.98,121.23,66.982,66.912,5.02,31.765
7,410,293.16,162.29,2.841,95.636,91.029,16.232
Task 1 (5 marks)
For each taxi, compute the number of trips and the average distance per trip by developing a MapReduce program in Python. The program should implement in-mapper combining, with state preserved across input lines.
The code must work with 3 reducers. Submit a shell script named task1-run.sh that performs the task when run from the folder containing both the script and the code files (no subfolders).
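As a rough illustration of in-mapper combining for Task 1, the sketch below keeps one running (trip count, distance sum) pair per taxi in memory and emits it only after all input lines have been read. The function names `map_trips` and `reduce_trips` and the `count,sum` value encoding are hypothetical choices, not part of the assignment. The column positions follow the Trips.txt sample.

```python
from collections import defaultdict


def map_trips(lines):
    """In-mapper combining: accumulate per-taxi totals across all input
    lines and emit one record per taxi once the input is exhausted."""
    totals = defaultdict(lambda: [0, 0.0])  # taxi_id -> [trip_count, distance_sum]
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        try:
            distance = float(fields[3])
        except ValueError:
            continue  # skip a header row, if the file has one
        entry = totals[fields[1].strip()]
        entry[0] += 1
        entry[1] += distance
    for taxi_id, (count, dist_sum) in totals.items():
        yield f"{taxi_id}\t{count},{dist_sum}"


def reduce_trips(lines):
    """Merge partial (count, sum) pairs per taxi; emit count and average.
    Assumes all records for one key arrive contiguously (sorted input)."""
    current, count, dist_sum = None, 0, 0.0
    for line in lines:
        taxi_id, value = line.rstrip("\n").split("\t")
        partial_count, partial_sum = value.split(",")
        if taxi_id != current:
            if current is not None:
                yield f"{current}\t{count}\t{dist_sum / count:.2f}"
            current, count, dist_sum = taxi_id, 0, 0.0
        count += int(partial_count)
        dist_sum += float(partial_sum)
    if current is not None:
        yield f"{current}\t{count}\t{dist_sum / count:.2f}"
```

Both functions take any iterable of lines, so a streaming script would pass `sys.stdin` and print each yielded record. Emitting sums rather than averages from the mapper matters here: averages from different mappers cannot be merged correctly, while counts and sums can.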
Task 2 (10 marks)
Write a MapReduce program in Python to cluster the trips in Trips.txt by pickup location. Your code should implement the k-medoids clustering algorithm known as Partitioning Around Medoids (PAM), described below:
1. Initialize: randomly select k of the n data points as the medoids.
2. Assignment step: associate each data point with its closest medoid.
3. Update step: for each medoid m and each data point o associated with m, swap m and o and compute the total cost of the configuration (that is, the average dissimilarity of o to all the data points associated with m). Select the medoid o with the lowest configuration cost.
4. Repeat steps 2 and 3 alternately until the assignments no longer change or a given number v of iterations is reached.
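Before distributing the work, it may help to see the four steps above on a single machine. The following is a minimal sketch, assuming Euclidean distance as the dissimilarity measure (the brief does not name one) and points given as (x, y) tuples. The helper names `closest`, `best_medoid` and `pam`, and the `seed` parameter, are hypothetical.

```python
import math
import random


def closest(point, medoids):
    """Index of the medoid nearest to point (Euclidean distance is an assumption)."""
    return min(range(len(medoids)), key=lambda i: math.dist(point, medoids[i]))


def best_medoid(cluster):
    """Update step: the member with the lowest average distance to its cluster."""
    return min(cluster, key=lambda o: sum(math.dist(o, p) for p in cluster) / len(cluster))


def pam(points, k, v, seed=0):
    """Alternate assignment and update until the assignments stop changing
    or v iterations have run."""
    medoids = random.Random(seed).sample(points, k)              # step 1
    assignment = None
    for _ in range(v):
        new_assignment = [closest(p, medoids) for p in points]   # step 2
        if new_assignment == assignment:
            break                                                # step 4: no change
        assignment = new_assignment
        for i in range(k):                                       # step 3
            cluster = [p for p, a in zip(points, assignment) if a == i]
            if cluster:
                medoids[i] = best_medoid(cluster)
    return medoids, assignment
```

If the loop stops because v is exhausted rather than because the assignments settled, the returned assignment reflects the medoids from before the final update. The update step compares every pair of points inside a cluster, so its cost grows quadratically with cluster size.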
The code must work with 3 reducers and with different settings of k and v. Also write a shell script named task2-run.sh that performs the task when run from the folder containing both the script and the code files (no subfolders). Note that k and v must be passed to task2-run.sh as arguments when it is executed.
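One possible way to split a single PAM iteration across MapReduce is sketched below: the mapper performs the assignment step and the reducer performs the update step for each cluster it receives. This is an illustration under assumptions, not a prescribed design. The names `map_assign` and `reduce_update` are hypothetical, the current medoids are assumed to be available to every mapper as a list of (x, y) tuples, and Euclidean distance is assumed. The pickup coordinates are taken from columns 5 and 6 of Trips.txt, as in the sample.

```python
import math
from itertools import groupby


def map_assign(lines, medoids):
    """Assignment step: emit 'medoid_index<TAB>pickup_x,pickup_y' per trip."""
    for line in lines:
        fields = line.strip().split(",")
        try:
            point = (float(fields[4]), float(fields[5]))
        except (IndexError, ValueError):
            continue  # skip blank lines or a header row
        nearest = min(range(len(medoids)), key=lambda i: math.dist(point, medoids[i]))
        yield f"{nearest}\t{point[0]},{point[1]}"


def reduce_update(lines):
    """Update step: for each medoid index, emit the cluster member with the
    lowest average distance to the cluster. Expects input sorted by key."""
    keyed = (line.rstrip("\n").split("\t") for line in lines)
    for index, group in groupby(keyed, key=lambda kv: kv[0]):
        cluster = [tuple(map(float, value.split(","))) for _, value in group]
        best = min(cluster, key=lambda o: sum(math.dist(o, p) for p in cluster) / len(cluster))
        yield f"{index}\t{best[0]},{best[1]}"
```

Because records are keyed by medoid index, each cluster lands on exactly one reducer regardless of whether the job runs with 3 reducers or another number. A driver would rerun this job with the new medoids up to v times and stop early when the medoids (and therefore the assignments) no longer change. Note that the reducer buffers a whole cluster in memory, which may not scale to very large clusters.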
Task 3 (10 marks)
This task applies what you have learned so far to a slightly more advanced problem: write a MapReduce program in Python to count the number of trips for each taxi company. Both Taxis.txt and Trips.txt will be used, and both are stored on HDFS. The code must work with 3 reducers. Also write a shell script named task3-run.sh that performs the task when run from the folder containing both the script and the code files (no subfolders).
Note that Task 3 should consist of two MapReduce subtasks: the first is a join operation and the second is a counting operation. The output of the first subtask is the input of the second. The execution of both subtasks should be specified in task3-run.sh. Copying Trips.txt and/or Taxis.txt to the local machine and processing them there is not permitted.
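The two subtasks could be arranged as a reduce-side join followed by a sum, as in the sketch below. Several details are assumptions made for illustration: the function names, the `T`/`R` source tags, telling the two files apart by their number of columns (4 for Taxis.txt, 8 for Trips.txt), and dropping trips whose taxi has no entry in Taxis.txt. The second job's mapper is assumed to pass records through unchanged.

```python
from itertools import groupby


def map_join(lines):
    """Job 1 mapper: key both inputs by Taxi# and tag each record's source."""
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if not fields[0].isdigit():
            continue  # skip blank lines or a header row
        if len(fields) == 4:      # Taxis.txt: Taxi#, company, model, year
            yield f"{fields[0]}\tT\t{fields[1]}"
        elif len(fields) == 8:    # Trips.txt: Trip#, Taxi#, fare, distance, ...
            yield f"{fields[1]}\tR\t1"


def reduce_join(lines):
    """Job 1 reducer: per taxi, pair its company with its trip count."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for taxi_id, group in groupby(rows, key=lambda r: r[0]):
        company, trips = None, 0
        for _, tag, value in group:  # no order is assumed within a key
            if tag == "T":
                company = value
            else:
                trips += int(value)
        if company is not None and trips:
            yield f"{company}\t{trips}"


def reduce_count(lines):
    """Job 2 reducer: sum the per-taxi trip counts for each company."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for company, group in groupby(rows, key=lambda r: r[0]):
        yield f"{company}\t{sum(int(n) for _, n in group)}"
```

The join reducer already collapses each taxi's trips into one count, so the second job moves far fewer records than one line per trip would. Both reducers rely on their input being grouped by key, which the shuffle between map and reduce provides.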
Required skills and expertise
Budget
750,000 to 5,000,000 Toman
Time allowed for completion
3 days
Bidding status
Completed
About the employer
Member since one year ago