Diabetes Prediction Analytics
This was the final project for STA 108 (Linear Regression) with Prof. Chen at the University of California, Davis. The project used a dataset of diabetes patients to identify which variables best predicted whether a person in the population would have diabetes.

The response variable was first transformed with the Box-Cox method and then regressed on all of the other variables. The data were then split into training and validation sets for cross-validation. Model-selection techniques included best-subsets regression, with candidate models compared by AIC, BIC, Mallows' Cp, R², adjusted R², and SSE, as well as the forward stepwise procedure. Several models were built under different selection criteria, and the final model was the one with the lowest mean squared prediction error (MSPR) on the validation set.

Most of the work was done in RStudio, and the report and commentary were written in Microsoft Word.
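The workflow above (Box-Cox, train/validation split, best subsets scored by several criteria, final choice by MSPR) can be sketched as follows. This is a minimal, hypothetical illustration in standard-library Python, not the project's R code. The criteria use the common textbook definitions, as in Kutner et al., Applied Linear Statistical Models, where p counts the intercept. The data are synthetic, and all function names (`fit`, `boxcox_lambda`, `best_subsets`, `mspr`) are illustrative assumptions. Forward stepwise selection is omitted for brevity. Raw R² and SSE always favor the largest model, so the sketch picks candidates by adjusted R², Cp, AIC, and BIC instead.

```python
import itertools
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for j in range(c, m + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * m
    for c in range(m - 1, -1, -1):
        x[c] = (M[c][m] - sum(M[c][j] * x[j] for j in range(c + 1, m))) / M[c][c]
    return x


def fit(rows, y, cols):
    """OLS with an intercept on the chosen columns (normal equations); returns (beta, SSE)."""
    D = [[1.0] + [r[j] for j in cols] for r in rows]
    p = len(D[0])
    XtX = [[sum(d[a] * d[b] for d in D) for b in range(p)] for a in range(p)]
    Xty = [sum(d[a] * yi for d, yi in zip(D, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    sse = sum((yi - sum(bk * dk for bk, dk in zip(beta, d))) ** 2 for d, yi in zip(D, y))
    return beta, sse


def boxcox_lambda(rows, y, cols):
    """Pick lambda on a grid by minimizing SSE of the standardized Box-Cox response (needs y > 0)."""
    k2 = math.exp(sum(math.log(v) for v in y) / len(y))  # geometric mean of y

    def w(lam):
        if lam == 0:
            return [k2 * math.log(v) for v in y]
        k1 = 1 / (lam * k2 ** (lam - 1))
        return [k1 * (v ** lam - 1) for v in y]

    grid = [i / 10 for i in range(-20, 21)]  # -2.0 .. 2.0 in steps of 0.1
    return min(grid, key=lambda lam: fit(rows, w(lam), cols)[1])


def criteria(rows, y, cols, mse_full):
    n, p = len(y), len(cols) + 1  # p counts the intercept
    _, sse = fit(rows, y, cols)
    ybar = sum(y) / n
    ssto = sum((v - ybar) ** 2 for v in y)
    return {"cols": cols, "SSE": sse, "R2": 1 - sse / ssto,
            "R2_adj": 1 - (n - 1) / (n - p) * sse / ssto,
            "Cp": sse / mse_full - (n - 2 * p),
            "AIC": n * math.log(sse) - n * math.log(n) + 2 * p,
            "BIC": n * math.log(sse) - n * math.log(n) + math.log(n) * p}


def best_subsets(rows, y):
    """Score every non-empty subset of predictors; MSE for Cp comes from the full model."""
    n, k = len(y), len(rows[0])
    _, sse_full = fit(rows, y, list(range(k)))
    mse_full = sse_full / (n - k - 1)
    return [criteria(rows, y, list(c), mse_full)
            for r in range(1, k + 1) for c in itertools.combinations(range(k), r)]


def mspr(tr_x, tr_y, va_x, va_y, cols):
    """Fit on training data, return mean squared prediction error on validation data."""
    beta, _ = fit(tr_x, tr_y, cols)
    pred = [beta[0] + sum(b * r[j] for b, j in zip(beta[1:], cols)) for r in va_x]
    return sum((a - b) ** 2 for a, b in zip(va_y, pred)) / len(va_y)


# Synthetic stand-in data (NOT the project's dataset): positive response, 4 predictors.
random.seed(0)
n, k = 200, 4
X = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n)]
y = [math.exp(0.5 + 0.8 * r[0] - 0.4 * r[2] + random.gauss(0, 0.2)) for r in X]

lam = boxcox_lambda(X, y, list(range(k)))
y_t = [math.log(v) if lam == 0 else (v ** lam - 1) / lam for v in y]

idx = list(range(n))
random.shuffle(idx)
tr, va = idx[:100], idx[100:]
Xtr, ytr = [X[i] for i in tr], [y_t[i] for i in tr]
Xva, yva = [X[i] for i in va], [y_t[i] for i in va]

table = best_subsets(Xtr, ytr)
picks = {"AIC": min(table, key=lambda r: r["AIC"]),
         "BIC": min(table, key=lambda r: r["BIC"]),
         "Cp": min(table, key=lambda r: r["Cp"]),
         "R2_adj": max(table, key=lambda r: r["R2_adj"])}
scores = {name: mspr(Xtr, ytr, Xva, yva, r["cols"]) for name, r in picks.items()}
final = min(scores, key=scores.get)
print(lam, final, picks[final]["cols"], round(scores[final], 4))
```

Judging the candidates by MSPR on data they were not fit to guards against the optimism of in-sample criteria. In R, `MASS::boxcox`, `leaps::regsubsets`, and `step` cover much of the same ground.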