We created a set of 58 503 sentences annotated with polarity by exploiting semistructured reviews from epinions.com collected by Branavan et al. (2009) (the camera and cellphone data sets, 17 442 reviews in total).
The annotation works as follows: In semistructured reviews, users provide pros (product aspects the user evaluates as positive) and cons (product aspects the user evaluates as negative) in addition to the written review text. Every pro (resp. con) longer than 3 tokens is extracted as a sentence with the label positive (resp. negative). Shorter pros (resp. cons) are stripped of sentiment words (using the subjectivity clues dictionary); if the resulting string is found in the review text, the containing sentence is extracted as positive (resp. negative).
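The heuristic above can be sketched roughly as follows. This is an illustrative reconstruction, not our actual implementation: the function name, the tokenization by whitespace, and the substring matching are all simplifying assumptions, and `sentiment_words` stands in for the subjectivity clues dictionary.

```python
# Sketch of the pros/cons labeling heuristic (illustrative, not the
# actual implementation used to build the dataset).

def label_sentences(pros, cons, review_sentences, sentiment_words):
    labeled = []
    for phrases, label in ((pros, "positive"), (cons, "negative")):
        for phrase in phrases:
            tokens = phrase.split()
            if len(tokens) > 3:
                # Long pro/con phrases are taken directly as labeled sentences.
                labeled.append((phrase, label))
            else:
                # Short phrases: strip sentiment words, then search for the
                # remaining string in the review text; the containing
                # sentence inherits the label.
                stripped = " ".join(t for t in tokens
                                    if t.lower() not in sentiment_words)
                if not stripped:
                    continue
                for sentence in review_sentences:
                    if stripped.lower() in sentence.lower():
                        labeled.append((sentence, label))
                        break
    return labeled
```

For example, a short pro like "great zoom" would be stripped to "zoom", and the review sentence containing "zoom" would be extracted as positive.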
This method is somewhat simplistic, and we plan to improve on it in the future. To judge the quality of the automatic annotation, we hired a graduate student of computational linguistics to manually annotate a random subset of 1271 sentences. The agreement between the automatic and the manual annotation is 0.79; Cohen's κ is 0.61.
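For reference, Cohen's κ corrects raw agreement for chance agreement, κ = (p_o − p_e)/(1 − p_e), where p_o is observed agreement and p_e the agreement expected from each annotator's label distribution. A minimal sketch (the data below is a toy example, not our evaluation set):

```python
# Illustrative computation of Cohen's kappa for two annotators
# over the same items (toy example, not the actual evaluation data).

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of marginal label probabilities,
    # summed over all categories.
    categories = set(labels_a) | set(labels_b)
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)
```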
If you are interested in this dataset, please contact me.
As attribution, please cite this paper:
Wiltrud Kessler and Hinrich Schütze (2012).
Classification of Inconsistent Sentiment Words Using Syntactic Constructions.
In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012),
Mumbai, India, 10-14 December 2012, pages 569-578.
Persistent identifier (PID) for citation: