A Neighbourhood Rough Set Based Clustering Algorithm and its Applications

Authors

  • Balakrushna Tripathy School of Information Technology and Engineering, VIT University, Vellore, India
  • Akarsh Goel University of Southern California, USA

Keywords:

Epidemiology, Geospatial Data, Neighbourhood, Clustering, Rough Set

Abstract

Data clustering puts similar objects into the same group. The attributes under consideration may be numerical or categorical. Rough-set-based clustering on a partitioning attribute, restricted to categorical attributes, began in 2007 with the Min-Min Roughness (MMR) algorithm of Parmar et al. Kumar et al. generalised it in 2009 to the Min Mean Roughness (MMeR) algorithm, which handles heterogeneous attributes by transforming the numeric attributes into categorical ones. It was further improved by Tripathy et al. in 2011 through the Min Standard Deviation Roughness (SDR) algorithm and the Standard Deviation of Standard Deviation Roughness (SSDR) algorithm. A natural approach that deals with both types of attributes together and extends all of these algorithms is the Min Mean Neighbourhood Roughness (MMeNR) algorithm, which can be applied to heterogeneous attributes in the presence of uncertainty. MMeNR is shown to be superior to the above algorithms by computing the F-measure on benchmark data sets from the UCI repository, such as the Teaching Assistant Evaluation Data Set and the Acute Inflammations Data Set. Spatial data sets such as the Forest Fire data and the Abandoned Mine Land Inventory Data are also used to show that MMeNR outperforms the other algorithms.
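
The abstract does not spell out how a neighbourhood roughness value is computed, so the following is only a minimal sketch of the general idea, assuming the standard neighbourhood rough set definitions (as in Hu et al., 2008) rather than the authors' MMeNR procedure. The function names, the threshold `delta`, the toy records, and the choice of Euclidean distance with exact matching on categorical attributes are all illustrative assumptions.

```python
from math import sqrt

def distance(x, y, numeric_idx, categorical_idx):
    """Mixed distance (an assumption of this sketch): records that differ on
    any categorical attribute are infinitely far apart; otherwise the
    Euclidean distance over the numeric attributes is used."""
    if any(x[i] != y[i] for i in categorical_idx):
        return float("inf")
    return sqrt(sum((x[i] - y[i]) ** 2 for i in numeric_idx))

def neighbourhood(k, data, delta, numeric_idx, categorical_idx):
    """Indices of all records within distance delta of record k."""
    return {
        j for j, y in enumerate(data)
        if distance(data[k], y, numeric_idx, categorical_idx) <= delta
    }

def roughness(target, data, delta, numeric_idx, categorical_idx):
    """Roughness of a target set of record indices:
    1 - |lower approximation| / |upper approximation|."""
    lower, upper = set(), set()
    for k in range(len(data)):
        nb = neighbourhood(k, data, delta, numeric_idx, categorical_idx)
        if nb <= target:   # neighbourhood lies entirely inside the target
            lower.add(k)
        if nb & target:    # neighbourhood overlaps the target
            upper.add(k)
    return 1 - len(lower) / len(upper) if upper else 0.0

# Hypothetical toy records: (normalised numeric value, categorical label).
records = [(0.10, "a"), (0.15, "a"), (0.50, "a"), (0.55, "b"), (0.90, "b")]
print(roughness({0, 2}, records, delta=0.1, numeric_idx=[0], categorical_idx=[1]))
```

A roughness of 0 means the target set is exactly describable by the neighbourhoods, while values closer to 1 indicate a vaguer set. Algorithms in the MMR family rank candidate attributes by aggregating such roughness values; the aggregation itself is not shown here.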

References

Bai, H., Ge, Y., Wang, J.-F., & Lan Liao, Y. (2010). Using rough set theory to identify villages affected by birth defects: the example of Heshun, Shanxi, China. International Journal of Geographical Information Science, 24(4), 559–576. https://doi.org/10.1080/13658810902960079

Buchmann, A., Günther, O., Smith, T. R., & Wang, Y. E. (Eds.). (1989). Design and Implementation of Large Spatial Databases, Proceedings of the First International Symposium on Large Spatial Databases, Santa Barbara. https://doi.org/10.1007/3-540-52208-5

Chaira, T., & Anand, S. (2011). A Novel Intuitionistic Fuzzy Approach for Tumor/Hemorrhage Detection in Medical Images, Journal of Scientific and Industrial Research, 70(6), 427-434. http://nopr.niscair.res.in/handle/123456789/11922

Chen, L., Chen, C. P., & Lu, M. (2011). A multiple kernel fuzzy c-means algorithm for image segmentation, IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), 41(5), 1263-1274. https://doi.org/10.1109/TSMCB.2011.2124455

Chi, G., & Zhu, J. (2008). Spatial regression models for demographic analysis. Population Research and Policy Review, 27(1), 17–42. https://doi.org/10.1007/s11113-007-9051-8

Chuang, K., Tzeng, H. L., Chen, S., Wu, J., & Chen, T. J. (2006). Fuzzy c-means clustering with spatial information for image segmentation, Computerized Medical Imaging and Graphics, 30(1), 9-15. https://doi.org/10.1016/j.compmedimag.2005.10.001

Dubois, D., & Prade, H. (1990). Rough fuzzy sets and fuzzy rough sets, International Journal of General System, 17(2-3), 191-209. https://doi.org/10.1080/03081079008935107

Endo, Y., & Kinoshita, N. (2012). On objective based Rough C-means clustering, Proceedings of the IEEE conference on Granular Computing, 1-6. https://doi.org/10.1109/GrC.2012.6468682

Frank, A. (1991). Properties of geographic data: Requirements for spatial access methods, Proceedings of the Second International Symposium on Large Spatial Databases, Zürich. https://doi.org/10.1007/3-540-54414-3_40

Gibson, D., Kleinberg, J., & Raghavan, P. (2000). Clustering categorical data: An approach based on dynamical systems. The VLDB Journal, 8(3–4), 222–236. https://doi.org/10.1007/s007780050005

Gong, M., Liang, Y., Shi, J., Ma, W., & Ma, J. (2013). Fuzzy c-means clustering with local information and kernel metric for image segmentation, IEEE Transactions on Image Processing, 22(2), 573-584. https://doi.org/10.1109/TIP.2012.2219547

Guha, S., Rastogi, R., & Shim, K. (2000). ROCK: A robust clustering algorithm for categorical attributes. Information Systems, 25(5), 345–366. https://doi.org/10.1016/S0306-4379(00)00022-3

Hiremath, S., Chandra, P., Joy, A. M., & Tripathy, B. K. (2015). Neighbourhood rough set model for knowledge acquisition using MapReduce, Int. J. Communication Networks and Distributed Systems, 15(2/3), 212-234. https://doi.org/10.1504/IJCNDS.2015.070975

Hu, Q., Yu, D., Liu, J., & Wu, C. (2008). Neighborhood rough set based heterogeneous feature subset selection. Information Sciences, 178(18), 3577–3594. https://doi.org/10.1016/j.ins.2008.05.024

Huang, Z. (1998). Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 2(3), 283–304. https://doi.org/10.1023/A:1009769707641

Karthik, S., Priyadarshini, A., Anuradha, J., & Tripathy, B. (2011). Classification and rule extraction using rough set for diagnosis of liver disease and its types, Adv. Appl. Sci. Res, 2(3), 334-345.

Kumar, P., & Tripathy, B. K. (2009). MMeR: an algorithm for clustering heterogeneous data using rough set theory. International Journal of Rapid Manufacturing, 1(2), 189-207. https://doi.org/10.1504/IJRAPIDM.2009.029382

Kumar, S. U., & Inbarani, H. H. (2017). Neighborhood rough set based ECG signal classification for diagnosis of cardiac diseases. Soft Computing, 21(16), 4721-4733. https://doi.org/10.1007/s00500-016-2080-7

Lal, K. M., & Acharjya, D. P. (2010). Knowledge Granulation, Association Rules and Granular Computing, Proceedings of Second National Conference on Advanced Technologies in Electrical Engineering, Virudhunagar, Tamil Nadu, India, 43-46.

Parmar, D., Wu, T., & Blackhurst, J. (2007). MMR: An algorithm for clustering categorical data using Rough Set Theory. Data & Knowledge Engineering, 63(3), 879–893. https://doi.org/10.1016/j.datak.2007.05.005

Ranjan, S. N., Sinha, A. K., & Singh, J. B. (2012). The study of knowledge discovery with spatial Data Mining in Epidemiology Database, International Journal of Engineering Research and Technology (IJERT), 1(6), 1-11. ISSN: 2278-0181

Tiwari, S. P., & Srivastava, A. K. (2013). Fuzzy rough set, fuzzy preorders and fuzzy topologies, Fuzzy Sets and Systems, 210, 63-68. https://doi.org/10.1016/j.fss.2012.06.001

Tripathy, B. K., & Ghosh, A. (2011a). SDR: An algorithm for clustering categorical data using rough set theory. In 2011 IEEE Recent Advances in Intelligent Computational Systems, 867-872. https://doi.org/10.1109/RAICS.2011.6069433

Tripathy, B. K., & Ghosh, A. (2011b). SSDR: An Algorithm for Clustering Categorical Data Using Rough Set Theory, Advances in Applied Science Research, 2(3), 314-326.

Tripathy, B. K., Acharjya, D.P., & Cynthya, V. (2011). A Framework for Intelligent Medical Diagnosis Using Rough Set With Formal Concept Analysis, International Journal of Artificial Intelligence and Applications, 2(2), 45-66. https://doi.org/10.5121/ijaia.2011.2204

Tripathy, B. K., & Ghosh, A. (2013). Data clustering algorithms using rough sets. In Handbook of Research on Computational Intelligence for Engineering, Science, and Business, 297-327. IGI Global. https://doi.org/10.4018/978-1-4666-2518-1.ch012

Tripathy, B. K., & Mitra, A. (2014). On Algebraic and Topological Properties of Neighbourhood Based Multigranular Rough Sets. In 2014 International Conference on Computer Communication and Informatics, 1-6. IEEE. https://doi.org/10.1109/ICCCI.2014.6921770

Tripathy, B. K., & Anuradha, J. (2015). Soft Computing: Advances and Applications, Cengage Learning publishers, New Delhi. ISBN: 9788131526194

Tripathy, B. K., & Sharmila Banu, K. (2016). Rough Fuzzy Set Theory and Neighbourhood Approximation Based Modelling for Spatial Epidemiology, Handbook of Research on Computational Intelligence Applications in Bioinformatics, (Eds: Sujata Das and Bidyadhar Subudhi), IGI Global, Chapter 6, 108-118. https://doi.org/10.4018/978-1-5225-0427-6.ch006

Tripathy, B. K., & Sharmila Banu, K. (2017). Exploring incidence-prevalence patterns in spatial epidemiology via neighborhood rough sets. International Journal of Healthcare Information Systems and Informatics (IJHISI), 12(1), 30-43. https://doi.org/10.4018/IJHISI.2017010103

Tripathy, B. K. (2017). Rough Set and Neighborhood Systems in Big Data Analysis. In: Computational Intelligence Applications in Business Intelligence and Big Data Analytics (Eds: Vijayan Sugumaran, Arun Kumar Sangaiah, Arunkumar Thangavelu), CRC press, Auerbach Publications. pp. 261-282. ISBN 9781498761017

Ulugtekin, N., Alkoy, S., & Seker, D. Z. (2007). Use of a geographic information system in an epidemiological study of measles in Istanbul. Journal of International Medical Research, 35(1), 150–154. https://doi.org/10.1177/147323000703500117

Vishwakarma, H. R., Tripathy, B. K., & Kothari, D. P. (2014). Neighbourhood Based Knowledge Acquisition Using MapReduce from Big Data over Cloud Computing, Proceedings CSIBIG14, 183-188. https://doi.org/10.1109/CSIBIG.2014.7056958

Wang, J., McMichael, A. J., Meng, B., Becker, N. G., Han, W., Glass, K., & Zheng, X. (2006). Spatial dynamics of an epidemic of severe acute respiratory syndrome in an urban area. Bulletin of the World Health Organization, 84, 965-968. https://doi.org/10.2471/BLT.06.030247

Xu, R., & Wunsch, D. (2010). Clustering algorithms in biomedical research: A review, IEEE Rev. Biomed Eng, 3, 120-154. https://doi.org/10.1109/RBME.2010.2083647

Published

2020-12-31

How to Cite

Tripathy, B., & Goel, A. (2020). A Neighbourhood Rough Set Based Clustering Algorithm and its Applications. Computer Reviews Journal, 8, 20-34. Retrieved from https://purkh.com/index.php/tocomp/article/view/902

Section

Research Articles
