Training Datasets Collection and Evaluation of Feature Selection Methods for Web Content Filtering

Authors

Sochenkov I.

Abstract

This paper focuses on the main aspects of developing a high-quality system for dynamic content filtering: the collection of meaningful training data and feature selection techniques. The Web changes rapidly, so the classifier needs to be re-trained regularly. The problem of training data collection is treated as a special case of focused crawling. A simple and easy-to-tune technique was proposed, implemented, and tested. The proposed feature selection technique aims to minimize the feature set size without loss of accuracy and to take into account the interlinked nature of the Web. This is essential to make a content filtering solution fast and non-burdensome for end users, especially when content filtering is performed on resource-constrained hardware. An evaluation and comparison of various classifiers and techniques is provided.

External links

DOI: dx.doi.org/10.1007/978-3-319-10554-3_12

PDF on the website of the Institute for Systems Analysis of the Russian Academy of Sciences: www.isa.ru/arxiv/2014/AIMSA2014_submission_20.pdf

Download PDF or read online at ResearchGate: https://www.researchgate.net/publication/300024762_Training_Datasets_Collection_and_Evaluation_of_Feature_Selection_Methods_for_Web_Content_Filtering

Semantic Scholar: https://api.semanticscholar.org/CorpusID:39150523

Citation

Suvorov R., Sochenkov I., Tikhomirov I. Training Datasets Collection and Evaluation of Feature Selection Methods for Web Content Filtering // Artificial Intelligence: Methodology, Systems, and Applications. – Springer International Publishing, 2014. – P. 129-138