Hui Liu1,2, Yang Liu3, Ran Zhang1, Dezheng Liu1, and Zheng Zhang4
1 School of Software, Dalian University of Technology, Dalian, Liaoning, China
2 Faculty of Business and Management, Universiti Teknologi MARA Sarawak Branch, Sarawak, Malaysia
3 International School, Shenyang Jianzhu University, Shenyang, Liaoning, China
4 International School of Information Science Engineering, Dalian University of Technology, Dalian, Liaoning, China
Received: August 12, 2020 Accepted: November 26, 2020 Publication Date: June 1, 2021
Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.
Poverty is a long-standing problem throughout the world. Poverty alleviation that targets the primary problems faced by poverty-stricken households in different classes can effectively improve the efficiency of poverty reduction. Owing to the large number of features that reflect the poverty situation of these households, it is difficult for traditional methods to accurately analyze the primary problems they face. In view of the strong performance of feature selection methods on high-dimensional data, this paper proposes a feature selection approach that takes the distribution position of features in each class into consideration. We use a Gaussian mixture model to describe the distribution of features in the same dimension, and measure the distribution position of the cluster formed by the features of each class according to their Gaussian mixture components. Features whose distribution positions differ significantly between classes are selected, as they effectively represent the characteristics of samples in different classes. The experimental results show that the proposed method performs well in determining the characteristics of samples in different classes and can accurately identify the typical features of poverty-stricken households in different classes, which provides a basis for the design of targeted poverty alleviation strategies.
Keywords: Poverty alleviation; poverty-stricken households; feature selection; Gaussian mixture model
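The abstract describes the selection rule only at a high level. The sketch below is a minimal, illustrative reading of that idea, not the authors' published algorithm: for each feature dimension it fits a Gaussian mixture, locates each class within the mixture via a responsibility-weighted average of the component means, and ranks features by the gap between the class positions. The function names (class_position, select_features), the scoring rule, and the parameters n_components and n_selected are assumptions introduced here for illustration; scikit-learn's GaussianMixture is used for the mixture fitting.

# Minimal sketch of class-aware feature selection with per-dimension Gaussian
# mixtures. The scoring rule and all parameters are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def class_position(values, gmm):
    """Locate a class's values within the fitted mixture: average of the
    component means weighted by the posterior responsibilities of the values."""
    resp = gmm.predict_proba(values.reshape(-1, 1))       # (n_samples, n_components)
    return float(resp.mean(axis=0) @ gmm.means_.ravel())  # responsibility-weighted mean

def select_features(X, y, n_components=3, n_selected=10):
    """Score each feature by the spread of per-class distribution positions
    under a per-dimension GMM, and keep the highest-scoring features."""
    classes = np.unique(y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        col = X[:, j]
        gmm = GaussianMixture(n_components=n_components, random_state=0)
        gmm.fit(col.reshape(-1, 1))                        # mixture for this dimension
        positions = [class_position(col[y == c], gmm) for c in classes]
        scores[j] = np.ptp(positions)                      # gap between class positions
    return np.argsort(scores)[::-1][:n_selected]

# Usage: idx = select_features(X, y); X_reduced = X[:, idx]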