Getting Started with Machine Learning: Environment Setup and Random Forest Intuition


DATE: APRIL 11, 2023

About the Workshop:

The use of AI/machine learning in daily work is now commonplace. Throughout the sector, practitioners use machine learning to accomplish a slew of important tasks, including optimizing designs, predicting maintenance needs, estimating project budgets, organizing workforce schedules, validating product specs, and cutting costs across their organizations. Yet most practitioners are not aware of the fundamentals of how machine learning works. This is risky: machine learning methods are easily misapplied, and AI projects can easily waste resources.

Without a clear understanding of what a successful machine learning deployment requires, organizations routinely allocate too many resources, or the wrong ones, driving up costs while lowering the probability of realizing the value case. In addition, third-party vendors abound; some offer solutions that fit the particular opportunities within the firm, and some do not. While the potential of machine learning to create value and reduce cost is undeniable, navigating this dynamic landscape requires an understanding of basic machine learning principles and best practices.

This workshop explores machine learning principles and best practices, builds an intuitive and memorable understanding of two common machine learning methods, and walks through how to download and use related open-source machine learning libraries.

The first part of the workshop will explore the fundamentals of machine learning and how they evolved. Armed with an understanding of what makes machine learning effective, attendees will then survey the landscape of who is doing what in machine learning, comparing the competitive advantages of different categories of firms. Next, attendees will work through several guiding principles and best practices that follow from these fundamentals and this landscape. The goal is to show attendees how to capture the value machine learning offers without stumbling into its pitfalls.

The second part of the workshop will explore two common machine learning methods: linear regression (Simple and Multiple Linear Regression, both fit with Ordinary Least Squares, OLS) and Random Forest. If you’ve ever used Excel to fit a line to two-dimensional data, you’ve used Simple Linear Regression! Excel does it for us, but as our data and use cases grow more complex, it becomes increasingly important to understand and control the settings that govern a machine learning method, called hyperparameters. We’ll use Excel to walk through how Simple and Multiple Linear Regression arrive at a best-fit line (or, with several independent variables, a best-fit plane) for data samples of different sizes. Then, participants will derive the intuition for Random Forest and use Excel to work through a use case applying Random Forest to data.

In the third part of the workshop, participants will download and install Python libraries for exploring these and other machine learning methods. In this workshop, we will use macros (Visual Basic for Applications, or VBA, scripts) in Excel to run some machine learning examples. We will also use Anaconda as the environment for running Python. Please make sure the machine you bring to the workshop has permission to download Anaconda (try downloading it beforehand) and can run macros in Excel (ask your administrator). If not, you’ll still be able to follow along with the theory, but you won’t be able to participate fully in the hands-on component of the workshop.

Who Can Attend?

-    Leaders looking to build intuition for an important topic of our era
-    Decision-makers interested in building perspective to better define project criteria
-    Direct contributors looking for guidelines as they explore new tools with which to boost their productivity

Benefits of Attending

•    Understand the purposes and limitations of machine learning in the world today. This is useful if you plan to weigh in on whether machine learning can help solve the problems you face.
•    Know the contemporary machine learning landscape. This is useful if you plan to work with external organizations such as cloud service providers, data vendors, data engineering consultants, or AI companies.
•    Quickly tell the difference between a proposed AI solution that will add value and one that will not.
•    Allocate resources effectively to maximize value from AI/machine learning while minimizing costs.
•    Build intuition for two commonly used machine learning methods. This is useful if you plan to be involved in executing AI projects successfully or want to ensure that those tasked with them are performing adequately.
•    Use the basics of Python (one of the most widely used data science languages) and some popular machine learning libraries.

You’ll also receive a free Random Forest tool that you can keep and use to explore data and run machine learning directly in Excel.

Key Topics – Proper and Practical Use of AI/Machine Learning

Organizations’ slow adoption and misuse of machine learning create an opportunity to stand out in today’s landscape, whether as a firm, as a leader, or as an employee.

Workshop Agenda

9:00-10:40 Section 1: Machine Learning Fundamentals and Market
•    What machine learning is and what it requires in order to work.
•    A brief history of machine learning, including what makes it popular and effective today.
•    The different categories of machine learning and when to use each.
•    A who’s who of AI/machine learning: who stands to succeed in which parts of the market, and why.
•    Rules of thumb for scoping machine learning projects and choosing which providers to work with.

10:40-11:00 Networking Break

11:00-12:30 Section 2: Two Machine Learning Methods Explained
•    How to use Excel to fit a line to 2D data.
•    How we can create our own Simple Linear Regression algorithm (using Ordinary Least Squares, OLS) to reproduce the results in Excel.
•    How adding independent variables increases complexity and creates the need for Multiple Linear Regression.
•    Basic data engineering in preparation for machine learning.
•    Random Forest intuition and the hyperparameters that follow from it.
•    How to use the Random Forest Excel tool to explore and understand complex data. 
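
As a minimal sketch of the "create our own Simple Linear Regression algorithm" step, the Python below computes the Ordinary Least Squares slope and intercept from the standard closed-form formulas, plus R². It is an illustration only, not the workshop's own Excel/VBA implementation; the function names `ols_fit` and `r_squared` and the sample data are hypothetical. For the same data, Excel's SLOPE, INTERCEPT, and RSQ worksheet functions (or a linear chart trendline) should return matching values.

```python
def ols_fit(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations (proportional to the covariance of x and y).
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Sum of squared x-deviations (proportional to the variance of x).
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    # The OLS line always passes through the point of means (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical sample data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = ols_fit(xs, ys)  # slope ≈ 1.99, intercept ≈ 0.05
r2 = r_squared(xs, ys, slope, intercept)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r2:.4f}")
```

The slope is the cross-deviation sum divided by the squared x-deviation sum, and the intercept then places the line through the point of means. Multiple Linear Regression generalizes the same least-squares criterion to several independent variables, which is usually solved with matrix algebra rather than these two scalar formulas.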

12:30-1:30 Lunch

1:30-3:00 Section 3: Introduction to Machine Learning in Python
•    How to install a Python development environment on your machine.
•    A basic introduction to using Python within a standard integrated development environment (IDE).
•    How to download, install, and load machine learning and other libraries in Python.
•    An introduction to using machine learning libraries in Python the way professionals do.
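
The following is a minimal sketch of loading and running machine learning libraries in Python. It assumes NumPy and scikit-learn are installed (both are included in the standard Anaconda distribution). The synthetic dataset is hypothetical. The sketch fits a Multiple Linear Regression and a Random Forest to the same data and exposes the Random Forest hyperparameters most often tuned; the values shown are illustrative, not recommendations from the workshop.

```python
# Assumes NumPy and scikit-learn are installed (e.g. via Anaconda).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: 3 input features and a target with a
# nonlinear sin() term that a straight-line model cannot capture exactly.
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(500, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 3.0 * np.sin(X[:, 2]) + rng.normal(0, 0.5, size=500)

# Hold out 25% of the rows to score each model on data it never saw.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Multiple Linear Regression, fit by ordinary least squares.
linear = LinearRegression().fit(X_train, y_train)

# Random Forest: an average over many decision trees, each trained on a
# bootstrap resample of the rows and a random subset of features per split.
forest = RandomForestRegressor(
    n_estimators=200,      # number of trees whose predictions are averaged
    max_depth=None,        # no depth limit; growth stops via the leaf-size rules
    min_samples_leaf=2,    # smallest number of training rows allowed in a leaf
    max_features="sqrt",   # features tried per split: int(sqrt(3)) = 1 here
    bootstrap=True,        # each tree sees a bootstrap resample of the rows
    random_state=0,        # fixed seed so results are reproducible
).fit(X_train, y_train)

# .score() returns R^2 on the held-out rows for both models.
linear_r2 = linear.score(X_test, y_test)
forest_r2 = forest.score(X_test, y_test)
print(f"Linear R^2: {linear_r2:.3f}")
print(f"Forest R^2: {forest_r2:.3f}")
print("Forest feature importances:", np.round(forest.feature_importances_, 3))
```

Comparing the two held-out R² scores is a quick check on whether the forest's extra flexibility actually buys accuracy, and `feature_importances_` gives a rough view of which inputs the trees rely on most.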

3:00 End Workshop



Alec Walker