Machine Learning & Predictive Analytics Boot Camp (PREDICTDATA)

Name: training4it.com
Address: 9913 Shelbyville Rd #200, Louisville, KY, 40223
Telephone: 502.265.3057

The stores of data relevant to our organizations, customers, operations, and goals have never accumulated at a faster pace or to a larger volume. Likewise, the need for intelligent data analysis has never been greater. Vast reserves of value lie hidden within huge and sophisticated data sets. It can be a challenge to find that value, but if we can tease out the insights and answers lurking within our information, they can be translated into a host of opportunities and advantages. With the right skills, only your own creativity limits how you can leverage your stores of data for better decisions, analytics, and prediction.

Retail Price: $2,695.00

Next Date: Request Date

Course Days: 3

About this Course

Fortunately, today's data science methods are more practical and accessible than ever. The open-source R environment provides a straightforward yet powerful toolbox for predictive modeling and deep analysis. This hands-on machine learning course advances your data analysis skills into the realm of real-world data science. If you have a working familiarity with R, our three-day class equips you to go back to work with real-world predictive modeling and basic machine learning techniques. Guided by expert data scientists, you will work in R to lay your data science foundation and learn techniques that allow you to leverage your data in sophisticated, powerful new ways.

Audience Profile

This course is for intermediate-level data analysts interested in expanding their data mining processes. We emphasize Data Foundation and Machine Learning concepts. All exercises are performed in R.

Prerequisites

This machine learning course is for individuals with intermediate data analysis skills and basic knowledge of descriptive statistics. Any experience with R is also beneficial.

Technical requirements: A working installation of R and some R packages. RStudio is helpful, but not required.

*Delivered by ASPE, ICAgile Member Organization

Course Outline

Section I: Overview of Data Science

1. Data Science as a quantitative discipline

How to define the scope of Data Science
The many faces of Data Science: Data Mining, Data Analysis, Data Analytics, Machine Learning, Predictive Modeling, Statistical Learning, Mathematical Modeling. What are these all about?
Data Mining as a data exploration process
Machine Learning: supervised vs. unsupervised
Machine Learning vs. Predictive Analytics
Big Data Analytics: what it is and why it's important

2. Overview of a Data Mining process cycle

Understanding business needs and identifying new business opportunities
Formulating a business problem and associated requirements
Defining key quantitative metrics to measure success and evaluating business benefits
Translating business requirements into technical requirements and documentation
Formulating data models based on business and technical requirements
Identifying a set of quantitative models based on technical requirements and metrics of success
Running the models and evaluating results
Selecting the best model
Deploying the model

Section II: The Data Foundation

3. Data sources

4. Types of data

Structured vs. unstructured data
Static data vs. real-time data
Types of data attributes: numerical vs. categorical
Role of time factor and time trends in data analysis

5. Working with missing values

Main causes of missing data
Understanding the importance of missing information
Types of missing information
Restoring missing values
Imputing missing values and selecting imputation techniques
Understanding and evaluating potential consequences of manipulating records with missing values
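
As an illustration of the imputation topics above, here is a minimal base R sketch. It is not taken from the course materials: the data frame, its column names, and the helper `impute_median` are hypothetical, and median imputation is only one of several possible techniques.

```r
# Hypothetical toy data; column names are illustrative only.
df <- data.frame(
  age    = c(34, NA, 51, 29, NA, 42),
  income = c(52000, 61000, NA, 48000, 75000, 58000)
)

# Count missing values per column before deciding how to treat them.
missing_counts <- colSums(is.na(df))

# Median imputation: replace each NA with the median of the observed values.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}
df_imputed <- as.data.frame(lapply(df, impute_median))

# One consequence of imputing a constant: the spread of the column shrinks.
sd_before <- sd(df$age, na.rm = TRUE)
sd_after  <- sd(df_imputed$age)
```

Comparing `sd_before` with `sd_after` shows one way that filling in values can distort a variable's distribution.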

6. Working with outliers

Defining quantitative criteria for outlier detection in 1D cases
Understanding the role of outliers in model building
Deciding on outlier removal
Defining outlier detection metrics in multi-dimensional space
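
The outlier criteria listed above could be sketched in base R as follows. This is an illustrative example under assumptions, not the course's own exercise: it uses Tukey's 1.5 × IQR rule for the 1D case and the squared Mahalanobis distance with a chi-squared cutoff for the multi-dimensional case. The data and the 0.975 cutoff level are arbitrary choices.

```r
# 1D case: Tukey's 1.5 * IQR rule on a hypothetical numeric vector.
x <- c(12, 14, 15, 13, 16, 14, 15, 13, 95)
q <- unname(quantile(x, c(0.25, 0.75)))
fence <- 1.5 * IQR(x)
is_outlier_1d <- x < q[1] - fence | x > q[2] + fence
outliers_1d <- x[is_outlier_1d]

# Multi-dimensional case: squared Mahalanobis distance from the centroid.
set.seed(1)
X <- cbind(a = rnorm(50), b = rnorm(50))
X <- rbind(X, c(8, -8))  # inject one obviously unusual point (row 51)
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Under approximate normality, d2 is compared against a chi-squared quantile.
cutoff <- qchisq(0.975, df = ncol(X))
outliers_md <- which(d2 > cutoff)
```

Flagging a point is only the detection step. Whether to remove it remains a modeling decision.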

7. Working with duplicate records

Defining duplicates
Understanding sources of duplicates
Deciding on duplicate removal

Section III: Sampling and Hypothesis Testing

8. Why sampling may be important for Machine Learning

9. Sampling techniques and sample bias

10. Statistical hypothesis

11. Z-score, t-score and F statistic

12. P-values

13. Implementation of hypothesis testing for model evaluation analysis

Section IV: Machine Learning Fundamentals

14. What is Machine Learning?

15. Supervised vs. unsupervised learning

16. Overview of supervised Machine Learning

Regression models
Classification models

17. Overview of unsupervised Machine Learning

Clustering methods
Principal component analysis and dimension reduction
Association rules

18. Overview of major steps in building and testing quantitative models

Criteria for model selection
How to prepare a training set
Criteria for selecting model attributes / predictors
Working with collinear variables
Addressing the class imbalance problem
Dealing with over-fitting; bias-variance tradeoff
Validation and cross-validation

Section V: Building a Linear Regression Model with R

19. Univariate regression vs. multiple regression

20. Overview of the mathematical foundation of linear regression: least squares method vs. maximum likelihood method

21. Model assumptions

22. Working with continuous attributes

23. Dealing with collinear variables

24. Model subset selection

Forward stepwise selection
Backward selection
Shrinkage methods: ridge regression and Lasso
Dimension reduction
Information criteria

25. Automating model selection procedure

26. Model parameter evaluation, R squared vs. adjusted R squared

27. Validating the model

28. Working with categorical variables

29. Considering input variable interactions
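
To give a flavor of the regression topics in this section, the following sketch uses only base R and the built-in `mtcars` data set. The choice of data set and predictors is an assumption made for illustration, not the course's actual exercise.

```r
# Simple (one-predictor) vs. multiple regression on the built-in mtcars data.
fit_simple <- lm(mpg ~ wt, data = mtcars)

# factor(cyl) treats the cylinder count as a categorical variable.
fit_multi <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# wt * hp expands to wt + hp + wt:hp, i.e. main effects plus an interaction.
fit_inter <- lm(mpg ~ wt * hp, data = mtcars)

# R squared never decreases when predictors are added; adjusted R squared
# penalizes model size.
r2     <- summary(fit_multi)$r.squared
adj_r2 <- summary(fit_multi)$adj.r.squared

# Information criteria for comparing candidate models.
aic_values <- AIC(fit_simple, fit_multi, fit_inter)

# Automated backward selection by AIC, starting from a larger model.
fit_full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
fit_back <- step(fit_full, direction = "backward", trace = 0)
```

Calling `summary(fit_back)` then shows which predictors survived the selection.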

Section VI: Example of building a Classification Model with R

30. Dealing with imbalanced training sets

31. Understanding the confusion matrix

32. Evaluating binary classifiers using ROC / AUC
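
As a small illustration of the evaluation topics above, this base R sketch builds a confusion matrix and computes AUC from hypothetical labels and scores. The numbers are made up for the example, and the 0.5 threshold is an arbitrary choice. AUC is computed with the rank-based (Mann-Whitney) formula rather than any particular package.

```r
# Hypothetical true labels (1 = positive) and classifier scores.
labels <- c(1, 1, 1, 0, 1, 0, 1, 0, 0, 0)
scores <- c(0.95, 0.90, 0.80, 0.70, 0.65, 0.50, 0.45, 0.30, 0.20, 0.10)

# Confusion matrix at a 0.5 threshold: rows = actual, columns = predicted.
predicted <- as.integer(scores >= 0.5)
cm <- table(
  actual    = factor(labels, levels = c(0, 1)),
  predicted = factor(predicted, levels = c(0, 1))
)

sensitivity <- cm["1", "1"] / sum(cm["1", ])  # true positive rate
specificity <- cm["0", "0"] / sum(cm["0", ])  # true negative rate

# AUC as the probability that a random positive outranks a random negative.
n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)
auc <- (sum(rank(scores)[labels == 1]) - n_pos * (n_pos + 1) / 2) /
  (n_pos * n_neg)
```

Unlike the confusion matrix, the AUC value does not depend on the chosen threshold.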

Section VII: Example of Cluster Analysis with R

33. Overview of the mathematical foundation of cluster analysis

34. K-means clustering method

Algorithm overview
Convergence criteria
How to determine the number of clusters
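
The K-means topics above can be sketched with the base R `kmeans` function. The built-in `iris` measurements, the choice of three clusters, and the elbow heuristic are all assumptions made for illustration.

```r
# Standardize the four numeric iris columns so no variable dominates distance.
X <- scale(iris[, 1:4])

set.seed(42)  # K-means starts from random centers, so fix the seed

# nstart = 20 keeps the best of 20 random starts; iter.max caps iterations.
fit <- kmeans(X, centers = 3, nstart = 20, iter.max = 50)

# Elbow heuristic: total within-cluster sum of squares for k = 1..6.
wss <- sapply(1:6, function(k) {
  kmeans(X, centers = k, nstart = 20)$tot.withinss
})
```

Plotting `wss` against `1:6` and looking for the bend in the curve is one common, informal way to choose the number of clusters. `fit$iter` reports how many iterations the best run used.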

Section VIII: Dimension Reduction techniques with R

35. What is dimension reduction?

36. The practical goals of dimension reduction implementation

37. Principal component analysis vs. singular value decomposition

38. How many components to choose
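
As a sketch of the topics in this section, the base R example below runs principal component analysis with `prcomp`, checks it against a singular value decomposition of the same standardized data, and picks the number of components by cumulative explained variance. The `USArrests` data set and the 90% threshold are illustrative assumptions.

```r
# PCA on the built-in USArrests data, standardizing each variable first.
pca <- prcomp(USArrests, scale. = TRUE)

# Proportion of variance explained by each component, and the running total.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(var_explained)

# Smallest number of components reaching an (arbitrary) 90% threshold.
n_components <- which(cum_explained >= 0.90)[1]

# The same variances recovered from the SVD of the standardized data matrix:
# squared singular values divided by (n - 1).
Z  <- scale(USArrests)
sv <- svd(Z)
var_from_svd <- sv$d^2 / (nrow(Z) - 1)
```

Comparing `pca$sdev^2` with `var_from_svd` shows that the two decompositions yield the same component variances.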

Section IX: Class Conclusion

39. What was not covered in the class

40. Big Data Analytics – the future of machine learning: main tools and concepts

Dates for this class have not been posted yet. To ask about upcoming dates, contact us at 502.265.3057 or email info@training4it.com
