
Things I Learned About the Random Forest Machine Learning Algorithm

  • Sabina Pokhrel
  • Aug 18, 2019
  • 3 min read

An overview of Lesson 1, Introduction to Random Forest, from the fast.ai Machine Learning course


At a meetup that I attended a couple of months ago in Sydney, I was introduced to an online machine learning course by fast.ai. I did not pay much attention to it at the time. This week, while working on a Kaggle competition and looking for ways to improve my score, I came across the course again and decided to give it a try.


Here is what I learned from the first lecture, a 1 hour 17 minute video titled INTRODUCTION TO RANDOM FOREST.


While the main topic of the lecture was Random Forest, Jeremy (the lecturer) also gave some general information, along with tips and tricks for using Jupyter Notebook.


Screen with code (Photo by Shahadat Shemul on Unsplash)

One of the important points Jeremy made is that data science is not equivalent to software engineering. In data science, we prototype models. Software engineering has its own set of practices, and data science has its own set of best practices as well.


Model building and prototyping need an interactive environment, and the process is iterative. We build a model, then take steps to improve it, and repeat until we are happy with the result.


Random Forest

Photo of Forest (Photo by Sebastian Unrau on Unsplash)

I had heard the term Random Forest and knew that it was a machine learning technique, but to be honest, I never bothered to learn more about it. I was always keener to learn about deep learning techniques instead.


From this lecture, I learned that Random Forest is indeed AWESOME.


It is like a universal machine learning technique that can be used for both regression and classification. That means you can use Random Forest to predict a stock price as well as to categorise samples of medical data.
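To make the regression versus classification point concrete, here is a minimal sketch using scikit-learn. The synthetic datasets, variable names, and parameter values are assumptions chosen for illustration; they do not come from the lecture.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Regression: predict a continuous value (synthetic data standing in for prices).
X_reg, y_reg = make_regression(
    n_samples=500, n_features=10, n_informative=3, noise=10.0, random_state=0
)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, random_state=0)
regressor = RandomForestRegressor(n_estimators=100, random_state=0)
regressor.fit(Xr_train, yr_train)
reg_r2 = regressor.score(Xr_test, yr_test)  # R^2 on the held-out rows

# Classification: predict a category (synthetic data standing in for medical labels).
X_clf, y_clf = make_classification(n_samples=500, n_features=10, random_state=0)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(X_clf, y_clf, random_state=0)
classifier = RandomForestClassifier(n_estimators=100, random_state=0)
classifier.fit(Xc_train, yc_train)
clf_accuracy = classifier.score(Xc_test, yc_test)  # accuracy on the held-out rows

print(f"Regression R^2: {reg_r2:.3f}")
print(f"Classification accuracy: {clf_accuracy:.3f}")
```

The two estimators share the same fit and score interface, so switching between the two kinds of problem is mostly a matter of swapping the class.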


In general, a Random Forest model does not overfit too badly, and even if it does, it is easy to stop it from overfitting.


There is often no need for a separate validation set for a Random Forest model, because it can estimate how well it generalises from the data it was trained on.
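The usual mechanism behind this is the out-of-bag (OOB) estimate: each tree is trained on a bootstrap sample of the rows, so the rows a tree never saw can be used to score it. Below is a minimal sketch, assuming scikit-learn and a synthetic dataset (neither the data nor the parameter values come from the lecture).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data for illustration only.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# oob_score=True asks scikit-learn to score each row using only the trees
# whose bootstrap sample did not include that row.
model = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
model.fit(X, y)

oob_accuracy = model.oob_score_
print(f"Out-of-bag accuracy estimate: {oob_accuracy:.3f}")
```

A held-out set can still be worth keeping for a final check, especially when the rows are ordered in time.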


Random Forest makes few, if any, statistical assumptions. It does not assume that your data is normally distributed, nor does it assume that the relationships are linear.


It requires very little feature engineering.


Thus, if you are new to machine learning, it can be a great place to start.


Other Concepts


The curse of dimensionality is the idea that the more features your data has, the more spread out the data points become, so the distance between any two points becomes much less meaningful.

Jeremy assures us that, in practice, this is not a problem and that, in fact, the more features your data has, the better it is for training a model.
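The geometric effect itself is easy to see in a small simulation. The sketch below is not from the lecture: distance_contrast is a hypothetical helper that draws random points in a unit hypercube and measures how different the farthest and nearest points are from a query point, relative to the nearest one. It uses only the Python standard library (math.dist needs Python 3.8 or later).

```python
import math
import random


def distance_contrast(n_dims, n_points=200, seed=0):
    """Return (max - min) / min over distances from one random query point
    to n_points random points in the n_dims-dimensional unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


# As the number of dimensions grows, nearest and farthest points look more alike.
for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(distance_contrast(n_dims), 2))
```

The contrast shrinks as dimensions are added, which is the sense in which distances become less meaningful. Tree-based models such as Random Forest split on one feature at a time rather than on distances, which may be part of why the effect matters less for them.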


The No Free Lunch theorem is the idea that no single model works best for every kind of data.


Tips and Tricks

Little girl doing a magic trick (Photo by Paige Cody on Unsplash)

1. You can run a bash command in Jupyter Notebook by putting ! in front of the command.

!ls
!mkdir new_dr

2. A new way to format strings in Python 3.6: f-strings, which let you embed variables directly inside the string.

name = 'Sabina'
print(f'Hello {name}')
no_of_new_msg = 11
print(f'Hello {name}, you have {no_of_new_msg} new messages')

3. Learn about a Python function without leaving the Jupyter Notebook. Put ? in front of the function name (without parentheses) to get its documentation.

from sklearn.ensemble import RandomForestClassifier
?RandomForestClassifier.fit

4. If you want to read the source code, put a double ? in front of the function name (again without parentheses).

from sklearn.ensemble import RandomForestClassifier
??RandomForestClassifier.fit

5. Save a processed dataset with the to_feather method, which writes it to disk in the same format in which it is stored in RAM. You can read the data back from the saved file with the read_feather method. Note that you will need a Feather backend installed to use these methods: the feather-format library at the time of the lecture, or pyarrow in recent versions of pandas.

import pandas as pd

# A small example frame to save and load back
df = pd.DataFrame({'a': [1, 2, 3]})
df.to_feather('filename')
saved_df = pd.read_feather('filename')

Here is a link to the lecture video if you are interested in going deeper into the topic.


Leave your thoughts as comments below. If you do check out the course, let me know how you go.


 
 

© 2019 by Suchi Tech Pvt. Ltd.
