Project: Statistical Learning With R on the NYPD dataset

R   Machine Learning   Analytics

This is a project I completed in December 2020 for my Data Mining Algorithms course.

Project objectives:

  • Obtain and pre-process the dataset
  • Understand the dataset through visualization and summary statistics
  • Explain outliers, collinearities, and high-leverage points (if any)
  • Partition the dataset
  • Narrow my query of interest
  • Use ETL techniques to present data to algorithms in the correct format
  • Apply Statistical Learning Algorithms:
    • Decision Trees
    • Boosted Trees
    • Naïve Bayes
    • K-Means Clustering
  • Understand and summarize findings, present lessons learned
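
To make the partitioning and modeling steps above a little more concrete, here is a minimal sketch in R. It uses the built-in iris data set as a stand-in, not the NYPD data, and the 70/30 split, the seed, and the number of clusters are assumptions chosen for illustration rather than the settings used in the project. Boosted trees and Naïve Bayes are left out because they need extra packages (for example gbm and e1071).

```r
# Stand-in data: the built-in iris data set, NOT the NYPD data from the project.
library(rpart)  # decision trees; ships with standard R distributions

set.seed(42)  # make the random partition reproducible

# Partition the data set: 70% training, 30% test (the ratio is an assumption).
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Decision tree: classify Species from the four numeric measurements.
tree_fit  <- rpart(Species ~ ., data = train, method = "class")
tree_pred <- predict(tree_fit, newdata = test, type = "class")
tree_acc  <- mean(tree_pred == test$Species)

# K-means clustering on the scaled numeric columns (unsupervised, labels unused).
features <- scale(iris[, 1:4])
km_fit   <- kmeans(features, centers = 3, nstart = 20)

# Cross-tabulate clusters against the known labels as a sanity check.
print(table(cluster = km_fit$cluster, species = iris$Species))
cat("Decision tree test accuracy:", round(tree_acc, 3), "\n")
```

The same pattern (split first, fit on the training rows only, score on the held-out rows) carries over to any tabular data set once it has been cleaned into a data frame with a factor response.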

Video Presentation

Part 1

Part 2