Ru-RSTreebank. Russian Discourse Corpus.
Ru-RSTreebank is a corpus of Russian texts annotated in the framework of Rhetorical Structure Theory (RST), which was developed in the 1980s by W. Mann and S. Thompson.
Learn more:
The corpus is intended for researchers interested in studying written discourse. It supports a range of experiments in automatic text analysis that draw on its discourse relation annotations.
Possible applications include text generation, fact extraction, automatic summarization, and anaphora resolution.
Corpus volume: 333 texts, about 328 000 tokens.
Genres: news, popular science, scientific articles, and blogs.
When quoting or mentioning project materials, please cite one of the following: