
Diabetes Prediction Analytics

Final project for Linear Regression (STA 108), taught by Prof. Chen at the University of California, Davis. The project used a data set on diabetes patients to identify which variables best predicted whether a person in the population would have diabetes. This meant applying a Box-Cox transformation to an indicator variable and then regressing it on all the other variables. The data was then split into training and validation sets for cross-validation. Candidate models were compared using “best” subset selection, AIC, BIC, Mallows's Cp, R², adjusted R², SSE, and the forward stepwise procedure. Several models were built to test the different selection criteria, and the final model was the one with the lowest mean squared prediction error (MSPR) on the validation data. Most of the work was done in RStudio, and the report, with results and commentary, was written in Microsoft Word.
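The selection workflow described above can be sketched roughly as follows. This is an illustration only, not the project's code: the project was done in R, while this sketch uses Python with NumPy, and it runs on synthetic data rather than the diabetes data set. The helper names (`fit_ols`, `predict`, `aic`, `forward_stepwise`) are hypothetical, the Box-Cox step is left out, and AIC is assumed to take the form n·ln(SSE/n) + 2p.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Predictions from coefficients returned by fit_ols."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ beta

def aic(X, y):
    """AIC as n*ln(SSE/n) + 2p, where p counts the intercept (assumed form)."""
    n = len(y)
    sse = np.sum((y - predict(fit_ols(X, y), X)) ** 2)
    p = X.shape[1] + 1
    return n * np.log(sse / n) + 2 * p

def forward_stepwise(X, y):
    """Greedily add the predictor that lowers AIC most; stop when none does."""
    selected = []
    remaining = list(range(X.shape[1]))
    best_aic = aic(X[:, :0], y)  # start from the intercept-only model
    while remaining:
        scores = [(aic(X[:, selected + [j]], y), j) for j in remaining]
        cand_aic, cand = min(scores)
        if cand_aic >= best_aic:
            break
        selected.append(cand)
        remaining.remove(cand)
        best_aic = cand_aic
    return selected

# Synthetic stand-in data: only columns 0 and 2 actually drive the response.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
y = 2 + 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Split into training and validation sets.
perm = rng.permutation(n)
train, val = perm[:140], perm[140:]

# Select predictors on the training set only.
selected = forward_stepwise(X[train], y[train])

# MSPR: mean squared prediction error on the held-out validation set.
beta = fit_ols(X[train][:, selected], y[train])
mspr = np.mean((y[val] - predict(beta, X[val][:, selected])) ** 2)

# Baseline for comparison: the intercept-only model.
beta_null = fit_ols(X[train][:, :0], y[train])
mspr_null = np.mean((y[val] - predict(beta_null, X[val][:, :0])) ** 2)

print("selected columns:", sorted(selected))
print("MSPR (selected model):", mspr)
print("MSPR (intercept only):", mspr_null)
```

With several candidate models, each would be fit on the training set and scored the same way, and the one with the lowest MSPR would be kept.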

Link to project

Figures from the project: (1) table of models; (2) table of the best model for each criterion.