8410 Computational Text Analysis
Goal: Conduct a computational linguistics analysis of selected subreddits, with an emphasis on sentiment analysis.
This project explores data analysis and visualization in the context of subreddit text analysis, using data science methods to extract insights from the collected datasets.
The analysis covers content from four subreddits: lawschool, data engineering, solotravel, and covid19positive. For each one, we extract topics and sentiment scores for further analysis.
Subreddit Content: Text data is extracted from the selected subreddits. Each dataset includes the post content, sentiment scores computed with the VADER algorithm, and other relevant attributes such as date and title.
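As a rough illustration of what one record in such a dataset might look like, the sketch below assumes a hypothetical `Post` dataclass (the field names are not taken from the project) and labels a VADER compound score using the ±0.05 cutoffs commonly suggested for VADER. The project itself may store or threshold scores differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    """Hypothetical record for one subreddit post (field names are assumptions)."""
    subreddit: str
    title: str
    content: str
    posted: date
    compound: float  # VADER compound score, expected in [-1.0, 1.0]


def label_sentiment(compound: float, cutoff: float = 0.05) -> str:
    """Map a compound score to a coarse label using the conventional cutoffs."""
    if compound >= cutoff:
        return "positive"
    if compound <= -cutoff:
        return "negative"
    return "neutral"


# Made-up example values, not real data from the project.
example = Post(
    subreddit="solotravel",
    title="Made-up example title",
    content="Made-up example body text.",
    posted=date(2023, 1, 15),
    compound=0.62,
)
print(example.subreddit, label_sentiment(example.compound))
```

Keeping the raw compound score alongside any derived label leaves room to change the cutoff later without rescoring the text.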
Subreddit Text Analysis: Text data from the subreddits is preprocessed by removing stopwords and punctuation. We then apply Natural Language Processing (NLP) techniques for topic modeling, sentiment analysis, and named entity recognition (NER). Results are visualized with scatter plots, bar charts, and box plots.
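The preprocessing step described above can be sketched with the Python standard library alone. This is a minimal sketch, not the project's actual pipeline: the `preprocess` function and the tiny `STOPWORDS` set are assumptions for illustration, and a real run would more likely use a fuller stopword list from an NLP library.

```python
import string
from typing import List

# Tiny illustrative stopword list (an assumption); real pipelines typically
# rely on a much larger list from an NLP library.
STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "i", "my", "it", "for"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> List[str]:
    """Lowercase the text, strip ASCII punctuation, and drop stopwords."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    return [token for token in cleaned.split() if token not in STOPWORDS]


print(preprocess("I failed the bar exam, and it is stressful!"))
# prints ['failed', 'bar', 'exam', 'stressful']
```

Note that stripping punctuation and lowercasing suits topic modeling, but VADER uses punctuation and capitalization as intensity cues, so sentiment scoring is usually better run on the raw text.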
In conclusion, this project applies data science methodologies to subreddit content. The sentiment and topic analyses yield insights into each community and open opportunities for further analysis and recommendations.
A future iteration could look for relationships or patterns across subreddits rather than analyzing each subreddit in isolation.
MIT License