Project: Statistical Learning With R on the NYPD dataset
This is a project I completed for my Data Mining Algorithms course in December 2020.
Project objectives:
- Obtain and pre-process the dataset
- Understand the dataset through visualization and summary statistics
- Explain outliers, collinearities, and high-leverage points (if any)
- Partition the dataset
- Narrow my query of interest
- Use ETL techniques to present data to algorithms in the correct format
- Apply Statistical Learning Algorithms:
  - Decision Trees
  - Boosted Trees
  - Naïve Bayes
  - K-Means Clustering
- Understand and summarize findings, present lessons learned
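
As a rough illustration of the partitioning and decision tree steps listed above, here is a minimal R sketch. It is not the project's actual code: the built-in `iris` data frame stands in for the NYPD dataset, and the 70/30 split, the seed, and the choice of the `rpart` package are all assumptions made for the example.

```r
# Hypothetical sketch: iris is a stand-in for the project data, which is not shown here.
library(rpart)  # recommended package shipped with standard R installations

set.seed(42)                                    # make the random split reproducible
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))   # assumed 70/30 train/test split
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit a classification tree predicting Species from all other columns
fit <- rpart(Species ~ ., data = train, method = "class")

# Predict class labels on the held-out rows and measure accuracy
pred <- predict(fit, newdata = test, type = "class")
accuracy <- mean(pred == test$Species)

print(table(predicted = pred, actual = test$Species))  # confusion matrix
cat("Held-out accuracy:", accuracy, "\n")
```

The same train/test partition could then be reused for the other algorithms in the list, so that their held-out results are comparable.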
Video Presentation
Part 1
Part 2