This paper focuses on the main aspects of developing a high-quality system for dynamic content filtering: the collection of meaningful training data and feature selection techniques. The Web changes rapidly, so the classifier needs to be re-trained regularly. The problem of training data collection is treated as a special case of focused crawling, and a simple, easy-to-tune technique is proposed, implemented and tested. The proposed feature selection technique aims to minimize the size of the feature set without loss of accuracy and to take the interlinked nature of the Web into account. This is essential for making a content filtering solution fast and non-burdensome for end users, especially when content filtering is performed on resource-constrained hardware. An evaluation and comparison of various classifiers and techniques is provided.
PDF on the website of the Institute for Systems Analysis, FRC CSC RAS (in English): www.isa.ru/arxiv/2014/AIMSA2014_submission_20.pdf
Suvorov R., Sochenkov I., Tikhomirov I. Training Datasets Collection and Evaluation of Feature Selection Methods for Web Content Filtering // Artificial Intelligence: Methodology, Systems, and Applications. – Springer International Publishing, 2014. – P. 129-138.