Dataset. The Enron Corporation was brought down by a massive accounting fraud scandal. As part of the ensuing investigation, the email archives of the employees involved were made public. We have obtained a ~50 GB dataset containing a large number of messages.
E1: ENRON sentiment. Use sentiment detection to assess the mood of email messages and analyze whether moods spread and whether they correlate with the day and the stock price. Visualize the flow of good and bad moods both structurally and over time.
Summary. Spark was used to extract the newly written text from every email (removing quoted text) and to keep only emails sent between Enron employees (thus removing spam). Sentiment was derived from the text using CoreNLP. Finally, the resulting data was analyzed with Spark SQL. The per-person graphs below summarize the sentiment in a particular user's inbox on a particular day.
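Sketch. The steps in the summary can be illustrated on a single machine with the Python standard library. This is not the project's Spark/CoreNLP code, only a minimal sketch under stated assumptions: internal mail is recognized by an "@enron.com" address, quoted text is recognized by ">" prefixes and an "-----Original Message-----" marker, messages are plain text, and `toy_sentiment` is a hypothetical word-list scorer standing in for CoreNLP. All function names are made up for illustration.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Assumption: mail between employees is identified by the address domain.
INTERNAL_DOMAIN = "@enron.com"


def strip_quoted(body):
    """Keep only newly written text.

    Drops lines quoted with '>' and everything after an
    '-----Original Message-----' marker (assumed quoting conventions).
    """
    kept = []
    for line in body.splitlines():
        if line.strip().startswith("-----Original Message-----"):
            break
        if line.lstrip().startswith(">"):
            continue
        kept.append(line)
    return "\n".join(kept).strip()


def toy_sentiment(text):
    """Hypothetical stand-in for CoreNLP: +1 / -1 per word-list hit."""
    positive = {"great", "good", "thanks"}
    negative = {"bad", "problem", "loss"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum((w in positive) - (w in negative) for w in words)


def inbox_sentiment(raw_emails, score=toy_sentiment):
    """Return {(recipient, date): mean score} for internal mail only."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [score sum, count]
    for raw in raw_emails:
        msg = message_from_string(raw)
        senders = getaddresses([msg.get("From", "")])
        sender = senders[0][1].lower() if senders else ""
        if not sender.endswith(INTERNAL_DOMAIN) or msg["Date"] is None:
            continue  # external sender (e.g. spam) or undated message
        day = parsedate_to_datetime(msg["Date"]).date()
        value = score(strip_quoted(msg.get_payload()))
        for _, addr in getaddresses(msg.get_all("To", [])):
            addr = addr.lower()
            if addr.endswith(INTERNAL_DOMAIN):
                totals[(addr, day)][0] += value
                totals[(addr, day)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

The returned dictionary has one entry per inbox and day, which is the shape of data a per-person graph needs. In a distributed setting the same grouping would presumably be expressed as a GROUP BY over recipient and date.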
Data curiosity: ** Related work: *** Technical difficulties mastered: ** Visualization coolness: **