This research is based on the CLEF/eRisk 2017 pilot task, which focuses on early risk detection of depression. The CLEF/eRisk 2017 dataset consists of texts collected from the messages of 887 Reddit users. The goal of the task is to classify users into two groups: those at risk of depression and those not at risk. This paper considers different feature sets for detecting depression among Reddit users by processing their text messages. We examine our bag-of-words, embedding, and bigram models on the CLEF/eRisk 2017 dataset and evaluate the applicability of stylometric and morphological features. We also compare our results with those in the CLEF/eRisk 2017 task report.
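As a rough illustration of the bag-of-words and bigram features the abstract mentions, the following minimal sketch counts unigrams and adjacent word pairs over all of one user's messages. It is not the authors' pipeline: the tokenizer, the function names `tokenize` and `user_features`, and the sample messages are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation.

    A deliberately simple tokenizer (an assumption, not the paper's method).
    """
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'()[]")
        if word:
            tokens.append(word)
    return tokens


def user_features(messages):
    """Count unigram and bigram occurrences across one user's messages.

    Unigrams are stored under string keys and bigrams under 2-tuples,
    so the two feature families cannot collide.
    """
    features = Counter()
    for message in messages:
        tokens = tokenize(message)
        features.update(tokens)                    # bag-of-words counts
        features.update(zip(tokens, tokens[1:]))   # adjacent word pairs
    return features


# Hypothetical messages from a single user.
example = user_features(["I feel tired.", "I feel so tired today"])
```

Per-user count vectors of this kind could then be passed to any standard classifier to separate the risk and non-risk groups; which classifier and weighting scheme the paper uses is not stated in this abstract.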
DOI: http://dx.doi.org/10.5220/0006598604260431
PDF at Semantic Scholar: https://api.semanticscholar.org/CorpusID:4776046
PDF at Google Scholar: https://scholar.google.ru/scholar?oi=bibs&cluster=4547818838430300186&btnI
Stankevich M. A., Isakov V. A., Devyatkin D. A., Smirnov I. V. Feature Engineering for Depression Detection in Social Media // In: Informatics, Management and System Analysis: Proceedings of the V All-Russian Scientific Conference of Young Scientists with International Participation. 2018. pp. 237-246.