
Subreddit Sentiment Analysis, A Course Final Project

Goal: Conduct computational linguistics analysis of selected subreddits, with an emphasis on sentiment analysis.

Introduction

This project applies data analysis and visualization to text collected from Reddit communities (subreddits). It aims to extract insights from these datasets using data science methods.

Project Description

This project focuses on text analysis of content from four subreddits: lawschool, data engineering, solotravel, and covid19positive. We extract topics and sentiment scores from the posts and analyze them for further insights.

Data Description

Subreddit Content: Text data is extracted from the selected subreddits. Each dataset includes post content, sentiment scores computed with the VADER algorithm, and other relevant attributes such as date and title.
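As a rough illustration of how a VADER score might be turned into a category, the sketch below labels a post from its compound score. It assumes the stored score is VADER's compound value in the range -1 to 1, and it uses the ±0.05 cutoffs commonly suggested in the VADER documentation; the project itself may have used different fields or thresholds. The function name `label_compound` is hypothetical.

```python
def label_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER-style compound score (assumed to lie in [-1, 1]) to a label.

    The default threshold of 0.05 is the commonly cited convention,
    not necessarily the cutoff used in this project.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Illustrative scores only; these are not values from the project's data.
labels = [label_compound(score) for score in (0.62, -0.40, 0.01)]
```

Keeping the threshold as a parameter makes it easy to check how sensitive the label counts are to the cutoff.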

Methodology

Subreddit Text Analysis: Text data from subreddits is preprocessed by removing stopwords and punctuation. We use Natural Language Processing (NLP) techniques for topic modeling, sentiment analysis, and named entity recognition (NER). Visualization techniques include scatter plots, bar charts, and box plots.
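The preprocessing step described above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's actual pipeline: the stopword set here is a tiny hypothetical stand-in for a full list (such as the one shipped with NLTK), and the function name `preprocess` is an assumption.

```python
import string
from typing import List

# Hypothetical, deliberately small stopword set for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in", "i", "it", "my"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> List[str]:
    """Lowercase the text, strip punctuation, and drop stopwords."""
    no_punct = text.lower().translate(_PUNCT_TABLE)
    return [tok for tok in no_punct.split() if tok not in STOPWORDS]


tokens = preprocess("The bar exam is in July, and I am nervous!")
```

Note that stripping punctuation also collapses contractions ("don't" becomes "dont"). VADER itself treats punctuation and capitalization as intensity cues, so cleaning of this kind is usually better suited to topic modeling input than to the text passed to the sentiment scorer.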

Conclusion

In conclusion, this project applies data science methods to subreddit content. The sentiment and topic analysis of that content yields useful insights, and the project opens opportunities for future analysis and recommendations.

A future iteration could look for relationships or patterns across subreddits rather than analyzing each subreddit in isolation.

License

MIT License