A Parallel Support Vector Machine algorithm to identify patterns of pollution in Smart Cities for Metropolitan Zone of Guadalajara

Authors

  • Martha Patricia Martínez Vargas Universidad de Guadalajara
  • Elsa Estrada Guzmán Universidad de Guadalajara – CUCEI – Departamento de Ciencias Computacionales https://orcid.org/0000-0003-2009-9661
  • Roció Maciel Arellano Universidad de Guadalajara – CUCEA – Centro de Innovación en Ciudades Inteligentes

DOI:

https://doi.org/10.32870/recibe.v10i2.162

Keywords:

Data mining, Support vector machines, parallel support vector machine libraries, Internet of Things, Smart Cities.

Abstract

Pollution in densely populated areas such as the Metropolitan Area of Guadalajara grows exponentially, affecting the health of citizens and reducing their quality of life. One of the main research challenges in the study of Smart Cities is reducing environmental pollution in order to improve the well-being of citizens and protect natural areas. It is therefore urgent to develop information technologies that help mitigate this problem through the scientific analysis of data, classifying the zones with the highest contamination. Currently, these data are captured by continuously operating monitoring stations, which generate a large volume of information that poses a challenge for the classification process. This work proposes a model for the automatic execution of a Support Vector Machine classification algorithm, implemented with Python libraries for parallel processing. As a result, we obtain two main subsystems: a Configuration Parameters subsystem for the storage and cleaning of data, and the design of the algorithm, parallelized in the cloud by importing the modules mpi4py, numpy and sklearn.svm.
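To make the combination of mpi4py, numpy and sklearn.svm named in the abstract more concrete, the following is a minimal sketch of one possible data-parallel scheme. It is not the authors' implementation. The strategy (each process trains on its own share of the data, and the root process retrains on the pooled support vectors), the synthetic two-feature readings, and all function and variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Assumption: fall back to a single process when mpi4py is unavailable.
    comm, rank, size = None, 0, 1


def make_readings(n=400, seed=0):
    """Synthetic stand-in for station readings (hypothetical features)."""
    rng = np.random.default_rng(seed)
    low = rng.normal(loc=[20.0, 30.0], scale=5.0, size=(n // 2, 2))
    high = rng.normal(loc=[60.0, 80.0], scale=5.0, size=(n // 2, 2))
    X = np.vstack([low, high])
    y = np.array([0] * (n // 2) + [1] * (n // 2))  # 0 = low, 1 = high
    order = rng.permutation(n)
    return X[order], y[order]


def local_support_vectors(X, y):
    """Train an SVM on one partition and return only its support vectors."""
    clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
    return X[clf.support_], y[clf.support_]


# Every process builds the same data (same seed) and keeps a strided share.
X, y = make_readings()
X_part, y_part = X[rank::size], y[rank::size]
sv_X, sv_y = local_support_vectors(X_part, y_part)

# Collect the per-process support vectors on the root process.
if comm is not None:
    gathered = comm.gather((sv_X, sv_y), root=0)
else:
    gathered = [(sv_X, sv_y)]

final_model = None
if rank == 0:
    # Retrain once on the pooled support vectors from all partitions.
    all_X = np.vstack([g[0] for g in gathered])
    all_y = np.concatenate([g[1] for g in gathered])
    final_model = SVC(kernel="rbf", gamma="scale").fit(all_X, all_y)
    accuracy = final_model.score(X, y)
    print(f"processes={size} pooled support vectors={len(all_y)} "
          f"accuracy={accuracy:.3f}")
```

Saved under a hypothetical name such as parallel_svm.py, the script could be launched with an MPI launcher, for example `mpiexec -n 4 python parallel_svm.py`, or run directly with `python parallel_svm.py` as a single process. Pooling support vectors keeps the final training set small, but the result is an approximation of an SVM trained on the full data set.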

Author Biographies

Martha Patricia Martínez Vargas, Universidad de Guadalajara

Dr. Martha Patricia Martínez Vargas obtained her PhD in Information Technology in 2015 from the University of Guadalajara. Currently, she is a full-time lecturer in the Systems Department and a member of the UDG-CA-931 academic team of the University Center for Economic-Administrative Sciences. Her area of research interest is data analysis. She has directed various theses in the Master in Information Technology and the Bachelor of Information Technology, and has participated as co-author of various publications in the area of Technologies. She taught the subject of Management Information Systems at ITESO for two semesters and has served as thesis tutor in the master's degree in applied computing.

Elsa Estrada Guzmán, Universidad de Guadalajara – CUCEI – Departamento de Ciencias Computacionales

She obtained a doctorate in Information Technology in 2018. Currently, she teaches courses in the Master of Information Systems at the University of Guadalajara. Her main lines of research are data analysis using machine learning applied to Smart Cities issues, and software engineering for the development of applications for event monitoring and decision making.

Roció Maciel Arellano, Universidad de Guadalajara – CUCEA – Centro de Innovación en Ciudades Inteligentes

She is a Research Professor at the Department of Information Systems of CUCEA, University of Guadalajara (UDG). She works as a researcher and coordinator of Special Projects at the Center for Innovation in Smart Cities of the UDG. Among her areas of research interest from the perspective of Smart Cities are Smart People strategies in virtual or online education, the application of technology for the inclusion of people with different abilities, and the development of user experience on technological platforms. She has also organized congresses, diploma courses and workshops, and has collaborated in the design of undergraduate and postgraduate educational programs oriented to Information Technologies. Additionally, she has indexed scientific publications and patents, and has given lectures and participated in national and international panels.

Published

2022-02-23

How to Cite

Martínez Vargas, M. P., Estrada Guzmán, E., & Maciel Arellano, R. (2022). A Parallel Support Vector Machine algorithm to identify patterns of pollution in Smart Cities for Metropolitan Zone of Guadalajara. ReCIBE, Electronic Journal of Computing, Informatics, Biomedical and Electronics, 10(2), C4–17. https://doi.org/10.32870/recibe.v10i2.162

Section

Computer Science & IT