A Parallel Support Vector Machine algorithm to identify patterns of pollution in Smart Cities for Metropolitan Zone of Guadalajara
DOI:
https://doi.org/10.32870/recibe.v10i2.162
Keywords:
Data mining, Support vector machines, parallel support vector machine libraries, Internet of Things, Smart Cities.
Abstract
Pollution in densely populated areas such as the Metropolitan Area of Guadalajara grows exponentially, affecting the health of citizens and reducing their quality of life. One of the main research challenges in the study of Smart Cities is reducing environmental pollution in order to improve the well-being of citizens and protect natural areas. It is therefore urgent to develop information technologies that help mitigate this problem through the scientific analysis of data, classifying the zones with the highest contamination. Currently, these data are captured by continuous monitoring stations, which generate a large volume of information that poses a challenge for the classification process. This work proposes a model for the automatic execution of a Support Vector Machine classification algorithm, implemented with Python libraries for parallel processing. The result consists of two main subsystems: a Configuration Parameters subsystem for storing and cleaning the data, and the design of the algorithm parallelized in the cloud, which imports the modules mpi4py, numpy and sklearn.svm.
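The abstract names the modules mpi4py, numpy and sklearn.svm but does not spell out how the training is parallelized. As an illustration only, the following minimal sketch assumes a one-level cascade scheme: each MPI process trains an SVM on its own shard of the data, the support vectors are gathered on the root process, and a final SVM is trained on their union. The synthetic readings, the function names `make_toy_readings` and `train_parallel_svm`, the RBF kernel and the strided sharding are all hypothetical choices made for this example, not the authors' implementation. The guarded import is also an assumption, added so that the sketch still runs in a single process when mpi4py is not installed.

```python
import numpy as np
from sklearn.svm import SVC

try:
    from mpi4py import MPI  # e.g. mpirun -n 4 python this_script.py
    COMM = MPI.COMM_WORLD
except ImportError:
    COMM = None  # assumption: serial fallback when MPI is unavailable


def make_toy_readings(n=400, seed=0):
    """Synthetic two-feature 'pollutant readings' (hypothetical data).

    Label 0 stands for a low-pollution zone, label 1 for a high-pollution zone.
    """
    rng = np.random.default_rng(seed)
    low = rng.normal(loc=[30.0, 20.0], scale=5.0, size=(n // 2, 2))
    high = rng.normal(loc=[70.0, 60.0], scale=5.0, size=(n // 2, 2))
    X = np.vstack([low, high])
    y = np.concatenate([np.zeros(n // 2, dtype=int), np.ones(n // 2, dtype=int)])
    order = rng.permutation(n)  # shuffle so every shard sees both classes
    return X[order], y[order]


def train_parallel_svm(X, y, comm=None):
    """One-level cascade SVM; returns the final model on rank 0, None elsewhere."""
    rank = comm.Get_rank() if comm is not None else 0
    size = comm.Get_size() if comm is not None else 1

    # Strided sharding: rank r takes samples r, r + size, r + 2 * size, ...
    # Every shard must contain both classes, otherwise SVC.fit raises an error.
    X_local, y_local = X[rank::size], y[rank::size]

    # Step 1: each process fits an SVM on its shard and keeps its support vectors.
    local_model = SVC(kernel="rbf", gamma="scale").fit(X_local, y_local)
    sv_X = X_local[local_model.support_]
    sv_y = y_local[local_model.support_]

    # Step 2: collect the support vectors from all processes on the root.
    if comm is not None:
        parts = comm.gather((sv_X, sv_y), root=0)
    else:
        parts = [(sv_X, sv_y)]
    if rank != 0:
        return None

    # Step 3: the root trains the final SVM on the union of support vectors.
    X_sv = np.vstack([p[0] for p in parts])
    y_sv = np.concatenate([p[1] for p in parts])
    return SVC(kernel="rbf", gamma="scale").fit(X_sv, y_sv)


if __name__ == "__main__":
    # Every process builds the same data from the same seed, so no scatter is needed.
    X, y = make_toy_readings()
    model = train_parallel_svm(X, y, COMM)
    if model is not None:
        print("training accuracy on rank 0:", model.score(X, y))
```

Because the final model only sees the gathered support vectors, it approximates an SVM trained on the full data set. Whether that approximation is acceptable for real monitoring-station data would have to be checked against a serial baseline.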