Abstract:
The research problem is the prediction of diabetes, specifically the use of data mining to predict type 1 and type 2 diabetes. Data mining has become a widely studied area in recent years and can be applied in many fields to extract previously unknown information from data. This study examines the feasibility of using data mining to predict type 1 and type 2 diabetes and seeks to determine an appropriate prediction method, following a descriptive and analytical approach. Of the models commonly used for prediction, the decision tree and linear regression were selected and compared in terms of accuracy, precision, recall, and F-measure using RapidMiner. The data set used is the Pima Indians diabetes data set, which contains 769 records and 9 attributes.
When the linear regression algorithm was run in RapidMiner, it achieved accuracy = 76.09%, precision = 79.14%, recall = 86.00%, and F-measure = 82.43%. The decision tree achieved accuracy = 70.87%, precision = 71.28%, recall = 92.67%, and F-measure = 80.58%. Comparing these results, linear regression outperforms the decision tree in accuracy, precision, and F-measure, although the decision tree attains higher recall. Overall, we find that linear regression is better than the decision tree in predicting the type of diabetes.
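As a quick consistency check on the figures above, the F-measure can be recomputed from the reported precision and recall. The sketch below assumes the F-measure is the usual F1 score, that is, the harmonic mean of precision and recall; the function name `f_measure` is illustrative and is not part of RapidMiner.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1).

    Inputs and output are percentages. Assumes the standard F1
    definition; this is an illustrative helper, not a RapidMiner call.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in the abstract.
reported = {
    "linear regression": (79.14, 86.00),
    "decision tree": (71.28, 92.67),
}

for model, (p, r) in reported.items():
    print(f"{model}: F-measure = {f_measure(p, r):.2f}%")
```

Under that assumption, the recomputed values agree with the reported F-measures of 82.43% and 80.58% to two decimal places.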
Keywords: data mining, RapidMiner, decision tree, linear regression