Training Datasets Collection and Evaluation of Feature Selection Methods for Web Content Filtering

Authors

Sochenkov I. V.

Abstract

This paper focuses on the main aspects of developing a high-quality system for dynamic content filtering: the collection of meaningful training data and feature selection techniques. The Web changes rapidly, so the classifier needs to be retrained regularly. The problem of training data collection is treated as a special case of focused crawling. A simple and easy-to-tune technique is proposed, implemented, and tested. The proposed feature selection technique aims to minimize the feature set size without loss of accuracy and to take into account the interlinked nature of the Web. This is essential to make a content filtering solution fast and non-burdensome for end users, especially when content filtering is performed on resource-constrained hardware. An evaluation and comparison of various classifiers and techniques is provided.

External links

DOI: http://dx.doi.org/10.1007/978-3-319-10554-3_12

PDF on the website of the Institute for Systems Analysis, FRC CSC RAS (in English): www.isa.ru/arxiv/2014/AIMSA2014_submission_20.pdf

How to cite

Suvorov R., Sochenkov I., Tikhomirov I. Training Datasets Collection and Evaluation of Feature Selection Methods for Web Content Filtering // Artificial Intelligence: Methodology, Systems, and Applications. – Springer International Publishing, 2014. – P. 129-138.