Hauptseminar Sentimentanalyse

Summer semester 2012
Wiltrud Kessler
Tuesday 14:00 - 15:30
Pfaffenwaldring 5b, V 5.01
2 SWS / 3 ECTS (6 ECTS)
[LSF]

Course goals

Students will get an overview of the area of sentiment analysis and the challenges it presents. Students will read scientific papers and familiarize themselves with this kind of literature.
Note: The papers read this year differ from those read last year, so students who attended last year are welcome to join.
Please also note that you cannot get credit twice for the same class, even if you give a different presentation.

Schedule and Resources

Submission and e-mail notification will be managed in ILIAS, so please register there.
Day | Topic | Presenter | Material*
Tuesday 10.4. | Introduction to sentiment analysis | Wiltrud Kessler | Slides, [Liu10]
Tuesday 17.4. | Seminar topics distribution; How to find literature | Wiltrud Kessler | Slides
Tuesday 24.4. | Sentiment polarity and polarity modifiers | Wiltrud Kessler | Slides
Tuesday 1.5. | No class (public holiday) | - | -
Tuesday 8.5. | Evaluation of supervised text classification | Wiltrud Kessler | Slides, Java Code, WEKA
Tuesday 15.5. | Automatically determining word polarity. Turney: "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews" | Wiltrud Kessler | [Tu02], Slides
Tuesday 22.5. | Linguistic Features. Matsumoto, Takamura, and Okumura: "Sentiment classification using word sub-sequences and dependency sub-trees" | Bernadette | [MTO05]
Tuesday 29.5. | No class (Pentecost break) | - | -
Tuesday 5.6. | Subjectivity Classification. Wilson, Wiebe and Hoffmann: "Recognizing contextual polarity in phrase-level sentiment analysis" (Concentrate on 'neutral-polar classification') | Wiltrud Kessler | [WWH05], Slides, MPQA Sentiment Resources
Tuesday 12.6. | Subjectivity Word Sense Disambiguation. Akkaya, Wiebe and Mihalcea: "Subjectivity word sense disambiguation" (Concentrate on section 3) | Maxim | [AWM09]
Tuesday 19.6. | Comparative Sentences. Jindal and Liu: "Identifying comparative sentences in text documents" | Cornelia | [JL06a]
Tuesday 26.6. | No class | - | -
Tuesday 3.7. | Opinion Spam. Ott, Choi, Cardie, and Hancock: "Finding deceptive opinion spam by any stretch of the imagination" | Jonathan | [OCCH11]
Tuesday 10.7. | Conditional Sentences. Narayanan, Liu, and Choudhary: "Sentiment analysis of conditional sentences" | Vitalia | [NLC09]
Tuesday 17.7. | Polarity Reversers. Ikeda, Takamura, Ratinov, and Okumura: "Learning to shift the polarity of words for sentiment classification" (Concentrate on 'word-wise learning') | Melanie | [ITRO08]
* If you are unable to access the paper, a subscription from the university library may be needed. Try it from inside the university network.

Course Content

Sentiment analysis automatically identifies opinions expressed in language about real-world items. Most commonly, opinions are classified into the categories "positive" and "negative". Sentiment analysis has become an important topic over the last 10 years, and a large number of papers have been published in this area. This seminar presents different methods for analyzing opinions at different levels.

Subjectivity Classification

Subjective statements refer to the internal state of mind of a person and cannot be observed. In contrast, objective statements can be verified by observing and checking reality. It is sometimes useful for a sentiment analysis system to filter out objective language and predict sentiment based on subjective language only. Unfortunately, detecting subjectivity is also a complicated problem.

References: [RW03], [WWH05]

Subjectivity Word Sense Disambiguation

Sentiment analysis often uses dictionaries that list the polarity of each word. However, many words have both subjective and objective senses. Subjective words used in an objective sense are a significant source of error in sentiment classification. Subjectivity word sense disambiguation tries to automatically determine which word instances in a corpus are being used with objective senses.

References: [WM06], [AWM09], [AWCM11]

Polarity Reversers

A lexicon of positive and negative words alone is often not sufficient to determine the polarity of an expression, because many phenomena can influence it. The most obvious examples are "polarity reversers", words that reverse the polarity of a sentiment word, e.g., "no" or "not". One approach to this problem is to assume that the polarity of each word is known and to classify each sentiment word as reversed or non-reversed according to its context.
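
As a rough illustration of the lexicon-plus-reverser idea, the Python sketch below looks up each word in a tiny polarity lexicon and flips its sign when a reverser occurs shortly before it. The lexicon, the reverser list, the window size and the function name are all assumptions made for this example. The referenced papers learn the reversal decision from context rather than applying a fixed rule like this one.

```python
# Toy polarity lexicon and reverser list; both are illustrative assumptions.
POLARITY = {"good": 1, "great": 1, "bad": -1, "boring": -1}
REVERSERS = {"no", "not", "never"}


def score_tokens(tokens, window=2):
    """Sum word polarities, flipping a word's sign if a reverser
    occurs among the `window` tokens directly before it."""
    total = 0
    for i, token in enumerate(tokens):
        polarity = POLARITY.get(token, 0)
        if polarity == 0:
            continue  # not a sentiment word
        context = tokens[max(0, i - window):i]
        if any(word in REVERSERS for word in context):
            polarity = -polarity  # reversed by e.g. "not"
        total += polarity
    return total


print(score_tokens("the plot is not good".split()))  # -1
print(score_tokens("the plot is good".split()))      # 1
```

A fixed window is a crude choice: it misses reversers that are further away and wrongly flips words that merely stand near an unrelated "not".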

References: [ITRO08], [CC08], [WBRK10]

Conditional Sentences

Conditional sentences describe implications or hypothetical situations and their consequences. Some conditional sentences directly express sentiment on a product, but many of them express a hypothetical situation, a wish or a general implication.

References: [NLC09]

Comparative Sentences

A common way to express opinions is by comparing one entity with a different entity. There are different types of comparisons: direct comparisons of two entities, comparisons of an entity to a general standard, and superlatives that set one entity above all others in the comparison set. Simply detecting comparative adverbs or adjectives is not sufficient, because a sentence can contain a comparative word without being a comparative sentence ("couldn't agree with you more"), while a comparative sentence does not necessarily include any comparative word ("no joy stick unlike the sony ericsson t60").
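
The two example sentences can be used to show why keyword spotting falls short. The Python sketch below is a deliberately naive detector that flags a sentence as comparative if it contains a cue word. The cue list and the function name are assumptions made for this example and are not taken from the referenced papers.

```python
# Small, hand-picked list of comparative/superlative cue words
# (an assumption for this sketch).
COMPARATIVE_CUES = {"more", "less", "most", "least", "than",
                    "better", "worse", "best", "worst"}


def has_comparative_cue(sentence):
    """Return True if any token of the sentence is a comparative cue word."""
    tokens = sentence.lower().split()
    return any(token in COMPARATIVE_CUES for token in tokens)


# Contains a cue word but is not a comparative sentence (false positive).
print(has_comparative_cue("couldn't agree with you more"))  # True
# Comparative sentence without any cue word (false negative).
print(has_comparative_cue("no joy stick unlike the sony ericsson t60"))  # False
```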

References: [JL06a], [JL06b], [GL08]

Topic Models (CS)

These papers present a framework for extracting the ratable aspects of objects from online user reviews. A statistical model is used to discover topics in text and extract text snippets supporting the ratings of different aspects.

References: [TM08a], [TM08b]

Linguistic Features (CS)

Many classifiers for the classification of sentiment polarity use only shallow features like bag-of-words. To enhance the accuracy of sentiment polarity classification, several features based on linguistic analysis and syntactic structures have been proposed.
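
To see what a bag-of-words representation loses, the Python sketch below compares two sentences with opposite meaning. Word bigrams serve here as a much simpler stand-in for the richer features discussed in the referenced papers, such as dependency sub-trees. The example sentences and function names are assumptions made for this illustration.

```python
from collections import Counter


def bag_of_words(tokens):
    """Unordered word counts: the shallow baseline feature set."""
    return Counter(tokens)


def word_bigrams(tokens):
    """Counts of adjacent word pairs, a simple order-sensitive feature."""
    return Counter(zip(tokens, tokens[1:]))


a = "not bad , really good".split()
b = "not good , really bad".split()

print(bag_of_words(a) == bag_of_words(b))   # True: identical bags of words
print(word_bigrams(a) == word_bigrams(b))   # False: bigrams tell them apart
```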

References: [DLP03], [Ga04], [MTO05]

Opinion Spam

The term "opinion spam" refers to fictitious reviews written to mislead humans or automatic systems in their evaluation of opinions about a product or service. Fictitious positive reviews are written to artificially improve the perceived opinion of a product or service, while fictitious negative reviews are written to damage the reputation of a competitor or its products.

References: [JL07], [JL08], [OCCH11]

General Organizational Information

The course is open for students of

This course includes a number of introductory classes about the basics of sentiment analysis and the most important challenges in the area. Afterwards, some specific challenges for automatic sentiment analysis are presented in talks by the students. To get credit for this class, you need to give a presentation and hand in a written report about one of the topics presented above. Every student is required to read all papers to be discussed in class beforehand.

Some previous knowledge of machine learning methods may be helpful (e.g., from the class "statistische Sprachverarbeitung" or "Information Retrieval").

Evaluation

To get credit for this class, you need to give a presentation and hand in a written report about one of the topics presented above. The grade consists of the following parts:

Submissions will be managed in ILIAS.

Template for LaTeX: LaTeX main file, example bib file, bibtex style file, EACL style file, lingmacros style file (you may not need this if it is already installed on your computer).
The compiled LaTeX file with a lot of useful hints (please have a look at this even if you are using Word): pdf
Template for Word (I don't have Word, so these are the original EACL 2012 files, please ignore the instructions there and have a look at the compiled LaTeX file linked above for hints): Word document, style file.

A very quick guide to writing your report in LaTeX:
Download the files linked above. Put all of them in one folder. Rename the .tex file to ausarbeitungYOURNAME.tex. Open a terminal, go to that folder and type pdflatex ausarbeitungYOURNAME.tex. After a lot of printing on the command line, you should get a file named ausarbeitungYOURNAME.pdf. Voila, you did it!
Read through the things in ausarbeitungTemplate.tex, it contains examples for writing in italics, bold, creating tables, figures and references. Just copy what you need. Also, there are many many resources online, e.g. the LaTeX Wikibook.
If you get an error like ! LaTeX Error: File 'XYZ.sty' not found. make sure the file is in the same folder. A file ending in .sty is a package. You have two options: (a) remove the line \usepackage{XYZ} (which might cause some commands not to work or some things to look different), or (b) try to download that file from CTAN and put it into your folder (this might be more complicated).
If you get a warning LaTeX Warning: There were undefined references. you will notice some ?? in your document at places where references should be. For references to sections, tables or figures, just run pdflatex ausarbeitungYOURNAME.tex again. For bibliography references you need to run bibtex ausarbeitungYOURNAME and then run pdflatex ausarbeitungYOURNAME.tex again twice.
If you get a warning LaTeX Warning: Label(s) may have changed. Rerun to get cross-references right. some references may be wrong (e.g. section 3 has changed to be now section 4, but your reference still says "see section 3"). Rerun pdflatex ausarbeitungYOURNAME.tex to get them right.
Very important: Before you hand in, make sure none of these warnings appear!
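
If you prefer not to retype the commands, the run order described above (pdflatex, then bibtex, then pdflatex twice) can be scripted. The following Python sketch is only one possible way to do this. The function names are made up for this example, and it assumes that pdflatex and bibtex are installed and on your PATH.

```python
import subprocess


def build_commands(basename):
    """Return the command sequence from the guide above:
    pdflatex, bibtex, then pdflatex twice more."""
    tex_file = basename + ".tex"
    return [
        ["pdflatex", tex_file],
        ["bibtex", basename],
        ["pdflatex", tex_file],
        ["pdflatex", tex_file],
    ]


def build(basename):
    """Run each command in order, stopping at the first failure."""
    for command in build_commands(basename):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only prints the commands; call build("ausarbeitungYOURNAME") to run them.
    for command in build_commands("ausarbeitungYOURNAME"):
        print(" ".join(command))
```

The script does not check the LaTeX warnings for you, so still read the output of the last pdflatex run before handing in.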

References

General literature on sentiment analysis:
[PL08] Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1-2), pages 1-135.
[Liu10] Bing Liu. 2010. Sentiment Analysis and Subjectivity. In: Handbook of Natural Language Processing, Second Edition.

Specific references for the talks:
[AWCM11] Cem Akkaya, Janyce Wiebe, Alexander Conrad and Rada Mihalcea. 2011. Improving the impact of subjectivity word sense disambiguation on contextual opinion analysis. In Proceedings of CoNLL '11, pages 87-96.
[AWM09] Cem Akkaya, Janyce Wiebe and Rada Mihalcea. 2009. Subjectivity word sense disambiguation. In Proceedings of EMNLP '09, pages 190-199.
[CC08] Yejin Choi and Claire Cardie. 2008. Learning with compositional semantics as structural inference for subsentential sentiment analysis. In Proceedings of EMNLP '08, pages 793-801.
[DLP03] Kushal Dave, Steve Lawrence, and David M Pennock. 2003. Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In Proceedings of WWW '03, pages 519-528.
[Ga04] Michael Gamon. 2004. Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis. In Proceedings of COLING '04.
[GL08] Murthy Ganapathibhotla and Bing Liu. 2008. Mining opinions in comparative sentences. In Proceedings of COLING '08, pages 241-248.
[ITRO08] Daisuke Ikeda, Hiroya Takamura, Lev-Arie Ratinov, and Manabu Okumura. 2008. Learning to shift the polarity of words for sentiment classification. In Proceedings of IJCNLP '08, pages 50-57.
[JL06a] Nitin Jindal and Bing Liu. 2006. Identifying comparative sentences in text documents. In Proceedings of SIGIR '06, pages 244-251.
[JL06b] Nitin Jindal and Bing Liu. 2006. Mining comparative sentences and relations. In Proceedings of AAAI '06, pages 1331-1336.
[JL07] Nitin Jindal and Bing Liu. 2007. Analyzing and detecting review spam. In Proceedings of ICDM '07, pages 547-552.
[JL08] Nitin Jindal and Bing Liu. 2008. Opinion spam and analysis. In Proceedings of WSDM '08, pages 219-230.
[MTO05] Shotaro Matsumoto, Hiroya Takamura, and Manabu Okumura. 2005. Sentiment classification using word sub-sequences and dependency sub-trees. In Proceedings of PAKDD '05, pages 301-311.
[NLC09] Ramanathan Narayanan, Bing Liu, and Alok Choudhary. 2009. Sentiment analysis of conditional sentences. In Proceedings of EMNLP '09, pages 180-189.
[OCCH11] Myle Ott, Yejin Choi, Claire Cardie, and Jeffrey T. Hancock. 2011. Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of HLT '11, pages 309-319.
[RW03] Ellen Riloff and Janyce Wiebe. 2003. Learning extraction patterns for subjective expressions. In Proceedings of EMNLP '03, pages 105-112.
[TM08a] Ivan Titov and Ryan McDonald. 2008. A joint model of text and aspect ratings for sentiment summarization. In Proceedings of ACL '08, pages 308-316.
[TM08b] Ivan Titov and Ryan McDonald. 2008. Modeling online reviews with multi-grain topic models. In Proceedings of WWW '08, pages 111-120.
[Tu02] Peter D. Turney. 2002. Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of ACL '02, pages 417-424.
[WBRK10] Michael Wiegand, Alexandra Balahur, Benjamin Roth, and Dietrich Klakow. 2010. A survey on the role of negation in sentiment analysis. In Proceedings of NeSp-NLP '10, pages 60-68.
[WM06] Janyce Wiebe and Rada Mihalcea. 2006. Word sense and subjectivity. In Proceedings of ACL '06, pages 1065-1072.
[WWH05] Theresa Wilson, Janyce Wiebe and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of HLT '05, pages 347-354.