
Learner Reviews & Feedback for Building Machine Learning Pipelines in PySpark MLlib by Coursera Project Network

55 ratings

About the Course

By the end of this project, you will know how to create machine learning pipelines using Python and Spark, both free, open-source programs that you can download. You will learn how to load your dataset in Spark and perform basic cleaning, such as removing columns with a high proportion of missing values and removing rows with missing values. You will then create a machine learning pipeline with a random forest regression model. You will use cross-validation and parameter tuning to select the best model from the pipeline. Lastly, you will evaluate your model’s performance using various metrics. A pipeline in Spark combines multiple steps in the order in which they are executed. So rather than executing the steps individually, one can put them in a pipeline to streamline the machine learning process. You can save this pipeline, share it with your colleagues, and load it back again effortlessly. Note: You should have a Gmail account, which you will use to sign in to Google Colab. Note: This course works best for learners who are based in the North America region. We’re currently working on providing the same experience in other regions....

Top reviews

1 - 9 of 9 Reviews for Building Machine Learning Pipelines in PySpark MLlib

By Andrés M

May 7, 2021

I never write reviews, but ... please DON'T TAKE THIS PROJECT. Terrible project: no theoretical explanation, no explanation of functions, no complete project using pipelines (only 2 lines). The installation of libraries is not correct; I spent about two days trying to install them. Poor English, zero explanations in general. It's a shame that people with a high level of education are trying to scam people. I regret 100% paying for this. I DID NOT LEARN ANYTHING.

By Jeremy S

Jan 26, 2022

This project gives a good overview of the basic commands of PySpark, as well as a pretty decent glimpse of the functions and methods in MLlib. It will take you through the methods required to split your training/validation/testing data, train the model, cross-validate it, and evaluate it. There is also a brief section on cleaning data, though the majority of the lesson is focused on the random forest pipeline. Note that this project will not teach you machine learning basics or random forest techniques whatsoever. Similarly, this project will not teach you the basics of databases or SQL. If your sole intention is to learn PySpark's MLlib, then this course is good, but don't expect more. Finally, the project is hosted on Coursera's Python notebook service, Rhyme. This course will not show you how to set up PySpark or any of the supporting installations or environment variables on your own computer. Similarly, you will not be able to easily download the dataset used in this project. Even "downloading" the dataset only downloads it to the virtual machine running Rhyme, though the VM has access to the internet (hint hint). For me, this project was worth the $10 USD I paid.

By Aruparna M

Feb 21, 2021

The dataset provided was wrong. It was not the same one that the instructor used in the demonstration!

By 19BST035-HARI K R B B C

Sep 25, 2020

This course is very useful. Its big advantage is that it is short. Read short, learn big.

By Cheikh B

Mar 27, 2021

Awesome project and very good explanation. Thank you for this project.

By Leonardo E

Nov 21, 2020

Pretty useful, actually.


Oct 5, 2020

Helpful project.

By Sankirna J

May 2, 2022

Good project to get you started with some Spark code. It was very short and hands-on. The instructor doesn't really talk about the library or how things work. We are expected to follow the guidelines in the companion video and basically replicate the code the instructor provides, which is very clear and concise.

I would recommend this project as a quick hands-on exercise, but I did not feel like I learned a lot from it because of its very simple, guided nature.

By Max B

Dec 3, 2022

The first cell gives an error. Following the fix on the discussion forum did not resolve it, so I gave up.