Prediction of dental implant failures through the integration of multiple classifiers


Nancy B. Ganz
Alicia E. Ares
Horacio D. Kuna

Abstract

The field of data science has made many advances in the application and development of techniques in the healthcare sector. These advances are reflected in disease prediction, image classification, and risk identification and reduction, among many other areas. This work aims to investigate the benefit of using multiple classification algorithms to predict dental implant failures in the province of Misiones, Argentina, and to propose a procedure validated by human experts. The model combines the following classifiers: Random Forest, C-Support Vector, K-Nearest Neighbors, Multinomial Naive Bayes, and Multi-layer Perceptron. The models are integrated using the weighted soft voting method. Experiments were carried out on four datasets: a dental implant dataset built for the case study, an artificially generated dataset, and two further datasets obtained from public data repositories. The results of the proposed approach on the dental implant dataset were validated against the classification performance of human experts. Our approach correctly identifies 93% of cases, while the human experts achieve 87% accuracy.
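The ensemble described in the abstract can be sketched with scikit-learn's `VotingClassifier`, which supports weighted soft voting over the five named classifiers. This is a minimal illustration, not the authors' pipeline: the synthetic data, the weights, and all hyperparameters are assumptions, and the features are shifted to be non-negative because Multinomial Naive Bayes requires non-negative inputs.

```python
# Hypothetical sketch of weighted soft voting over the five classifiers
# named in the abstract; weights and hyperparameters are illustrative,
# not the authors' tuned values.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the dental implant dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X = X - X.min(axis=0)  # MultinomialNB requires non-negative features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),  # soft voting needs probabilities
        ("knn", KNeighborsClassifier()),
        ("mnb", MultinomialNB()),
        ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
    ],
    voting="soft",           # average predicted class probabilities...
    weights=[2, 1, 1, 1, 1], # ...weighted per classifier (illustrative weights)
)
ensemble.fit(X_train, y_train)
print(f"accuracy: {accuracy_score(y_test, ensemble.predict(X_test)):.2f}")
```

In soft voting, each classifier's predicted class probabilities are averaged (here with per-classifier weights) and the class with the highest weighted average wins; the article reports tuning such an ensemble to 93% accuracy on its dental implant dataset.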


Article Details

How to Cite
Ganz, N. B., Ares, A. E., & Kuna, H. D. (2020). Predicción de fracasos en implantes dentales mediante la integración de múltiples clasificadores. Revista De Ciencia Y Tecnología, 34(1), 13–23. https://doi.org/10.36995/j.recyt.2020.34.002
Section
Engineering, Technology and Informatics
Author Biographies

Nancy B. Ganz, Instituto de Materiales de Misiones (IMAM), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Facultad de Ciencias Exactas Químicas y Naturales (FCEQyN), Universidad Nacional de Misiones (UNaM). Argentina.

Félix de Azara 1552, N3300LQH, Posadas, Misiones, Argentina.

Alicia E. Ares, Instituto de Materiales de Misiones (IMAM), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Facultad de Ciencias Exactas Químicas y Naturales (FCEQyN), Universidad Nacional de Misiones (UNaM). Argentina.

Félix de Azara 1552, N3300LQH, Posadas, Misiones, Argentina.

Horacio D. Kuna, Instituto de Investigación, Desarrollo e Innovación en Informática (IIDII), Facultad de Ciencias Exactas Químicas y Naturales (FCEQyN), Universidad Nacional de Misiones (UNaM). Argentina.

Félix de Azara 1552, N3300LQH, Posadas, Misiones, Argentina.
