Personal Projects

Wine Quality Analysis

In this project, I analyze the chemical properties of wine to predict quality ratings, framed around a hypothetical goal: opening a winery that serves premium wines. Using features such as acidity, pH, alcohol content, and sulfur dioxide levels, I built predictive models with both Principal Component Analysis (PCA) and linear regression. In my analysis, applying PCA before the regression reduced the mean squared error (MSE) by 0.18763, which suggests that the interrelationships between chemical components matter more than any individual property when determining wine quality.
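As a rough illustration of the comparison described above, here is a minimal sketch that fits a linear regression on raw features and on PCA-projected features, then compares test MSE. It runs on synthetic data, not the wine dataset, and every name in it (`fit_pca`, `fit_linear`, the number of components, the train/test split) is an assumption made for the example rather than the project's actual code.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def fit_pca(X, n_components):
    """Learn standardization statistics and principal directions from X."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / std
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, vt[:n_components]

def apply_pca(X, mean, std, components):
    return ((X - mean) / std) @ components.T

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Synthetic stand-in: four correlated "chemical" features driven by two latent factors.
rng = np.random.default_rng(0)
n = 300
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(n, 4))
y = 5.5 + latent[:, 0] - 0.5 * latent[:, 1] + 0.3 * rng.normal(size=n)

train, test = slice(0, 200), slice(200, n)

# Baseline: regression on the raw features.
coef_raw = fit_linear(X[train], y[train])
mse_raw = mse(y[test], predict_linear(coef_raw, X[test]))

# PCA is fitted on the training rows only, then applied to both splits.
mean, std, comps = fit_pca(X[train], n_components=2)
T_train = apply_pca(X[train], mean, std, comps)
T_test = apply_pca(X[test], mean, std, comps)
coef_pca = fit_linear(T_train, y[train])
mse_pca = mse(y[test], predict_linear(coef_pca, T_test))

print(f"MSE on raw features: {mse_raw:.4f}")
print(f"MSE on PCA features: {mse_pca:.4f}")
```

Keeping every component would reproduce the plain least-squares fit exactly, so any difference between the two MSE values comes from dropping the low-variance directions.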

View Full Wine Quality Project

Rosetta Stone Analysis

In this project, I aggregate app activity and subscriber data as part of the data cleaning process and identify the important columns in the subscriber data. I then generate customer engagement and value metrics to answer key questions, and cluster customers on those metrics. By combining engagement factors, such as app and email activity, with monetary value, I aim to understand how valuable each customer is to the company. Using linear regression, logistic regression, and clustering, I explore how engagement varies across platforms, how subscription length correlates with email frequency, and whether users with auto-renewal have longer or shorter subscriptions.
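The aggregation and metric steps above might look something like the following pandas sketch. The tables, column names, and the unweighted engagement score are all hypothetical, invented for illustration; the real data and metric definitions may differ.

```python
import pandas as pd

# Hypothetical event-level app activity: one row per user per platform.
app_activity = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "platform": ["ios", "web", "android", "ios", "ios", "web"],
    "sessions": [3, 1, 2, 5, 4, 2],
})

# Hypothetical subscriber table: one row per user.
subscribers = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "auto_renew": [True, False, True, False],
    "subscription_months": [12, 3, 24, 1],
    "emails_opened": [10, 1, 30, 0],
    "amount_paid": [120.0, 36.0, 200.0, 12.0],
})

# Aggregate the activity rows down to one row per subscriber.
per_user = app_activity.groupby("user_id", as_index=False).agg(
    total_sessions=("sessions", "sum"),
    platforms_used=("platform", "nunique"),
)

# A left join keeps subscribers who never opened the app; fill their gaps with 0.
metrics = subscribers.merge(per_user, on="user_id", how="left")
activity_cols = ["total_sessions", "platforms_used"]
metrics[activity_cols] = metrics[activity_cols].fillna(0).astype(int)

# Assumed engagement score: app sessions plus emails opened, unweighted.
metrics["engagement"] = metrics["total_sessions"] + metrics["emails_opened"]

# Do auto-renewing users hold longer subscriptions?
mean_months_auto = metrics.loc[metrics["auto_renew"], "subscription_months"].mean()
mean_months_manual = metrics.loc[~metrics["auto_renew"], "subscription_months"].mean()

# Correlation between subscription length and email activity.
corr = metrics["subscription_months"].corr(metrics["emails_opened"])

print(metrics[["user_id", "engagement", "amount_paid"]])
print(mean_months_auto, mean_months_manual, round(corr, 3))
```

The per-user engagement and value columns produced here are the kind of inputs a clustering step could then group customers on.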

View Full Rosetta Stone Project

Twitter US Airline Sentiment Analysis

In this project, I applied natural language processing (NLP) techniques to perform sentiment analysis on tweets directed at U.S. airlines. My objective was to classify each tweet as positive, negative, or neutral. Analyzing the Twitter US Airline Sentiment dataset, I found that the classes were imbalanced, with a large share of negative tweets, many of them about customer service issues. Of the machine learning algorithms I tried, the k-nearest neighbors (KNN) model struggled to classify sentiment accurately, while logistic regression performed best, likely because it estimates class probabilities directly from the words in each tweet.
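A minimal sketch of this kind of classifier, assuming scikit-learn with TF-IDF word features, could look like the block below. The example tweets are made up, and the choices of `TfidfVectorizer` and `class_weight="balanced"` (one common way to handle class imbalance) are assumptions, not a record of the project's exact setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up tweets, deliberately skewed toward the negative class.
tweets = [
    "my flight was delayed for three hours, terrible service",
    "lost my luggage again, worst airline ever",
    "customer service hung up on me, awful experience",
    "cancelled flight and no refund, terrible",
    "thank you for the great flight and friendly crew",
    "loved the smooth boarding, great service",
    "what time does the flight to Denver depart",
    "is there wifi on the plane to Chicago",
]
labels = [
    "negative", "negative", "negative", "negative",
    "positive", "positive",
    "neutral", "neutral",
]

# TF-IDF turns each tweet into weighted word counts; logistic regression
# then learns one weight per word per class.
model = make_pipeline(
    TfidfVectorizer(lowercase=True),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(tweets, labels)

new_tweets = ["terrible delay and awful service", "great crew, thank you"]
predictions = model.predict(new_tweets)
probabilities = model.predict_proba(new_tweets)

for text, label, probs in zip(new_tweets, predictions, probabilities):
    print(text, "->", label, dict(zip(model.classes_, probs.round(2))))
```

With only a handful of training tweets the predictions are not meaningful; the point is the shape of the pipeline, where each prediction comes with a probability for every class.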

View Full Twitter US Airline Sentiment Analysis

Print Preview of the Code

Analysis and Launch Plan for Retail Startup

In this project, I analyzed multiple datasets using Python's pandas library to identify the best cities for store expansion. I examined key factors such as population, household income, store performance, and consumer spending trends to pinpoint areas with high revenue potential. To ensure accuracy, I first cleaned and standardized the data, fixing formatting issues before merging the datasets into a single comprehensive view. By visualizing the merged sales data, I uncovered the top-earning stores, the cities with strong market demand, and the top-selling product categories in each market. Based on my findings, the best cities for expansion are Los Angeles, Chicago, Dallas, Atlanta, Houston, Cleveland, Philadelphia, Denver, Austin, and Tampa. These cities consistently ranked high across multiple metrics, which makes them strong candidates for a data-driven expansion strategy.
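The clean, merge, and rank workflow described above can be sketched with pandas roughly as follows. The tables, column names, figures, and the average-rank score are hypothetical, chosen only to illustrate the steps; they are not the project's data or its actual ranking method.

```python
import pandas as pd

# Hypothetical store-level sales with messy city names and currency strings.
stores = pd.DataFrame({
    "city": [" chicago", "Dallas ", "DALLAS", "Tampa"],
    "annual_sales": ["$1,200,000", "$900,000", "$700,000", "$400,000"],
})

# Hypothetical city-level demographics.
demographics = pd.DataFrame({
    "city": ["Chicago", "Dallas", "Tampa"],
    "median_income": [65000, 70000, 55000],
})

def clean_city(series):
    """Trim whitespace and normalize capitalization so merge keys match."""
    return series.str.strip().str.title()

def parse_currency(series):
    """Convert strings like '$1,200,000' to floats."""
    return series.str.replace(r"[$,]", "", regex=True).astype(float)

stores["city"] = clean_city(stores["city"])
stores["annual_sales"] = parse_currency(stores["annual_sales"])

# Total sales per city, then join with the demographic data.
city_sales = stores.groupby("city", as_index=False)["annual_sales"].sum()
merged = city_sales.merge(demographics, on="city", how="inner")

# Rank each metric (1 = best) and average the ranks into a single score.
metric_cols = ["annual_sales", "median_income"]
merged["score"] = merged[metric_cols].rank(ascending=False).mean(axis=1)

ranking = merged.sort_values("score")["city"].tolist()
print(ranking)
```

Averaging ranks rather than raw values keeps a metric with large numbers, such as sales, from dominating one with small numbers.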

Analysis and Launch Plan for Retail Startup by Tess Kramer

Artist Classification Using CNN

In this project, I built a convolutional neural network (CNN) to classify paintings by artist based on their distinct artistic styles. The dataset included images of paintings by 50 of the most influential artists, along with metadata about the artists. I preprocessed the data by organizing the image files, loading them into a structured dataset, and visualizing key attributes such as artist nationality and genre. The CNN consisted of convolutional layers for feature extraction, max pooling layers for dimensionality reduction, and dense layers for classification. Despite data augmentation to improve generalization, the model achieved low accuracy and precision, which suggests that variation within a single artist’s work makes styles hard to tell apart. Deeper architectures, additional training data, or alternative classification methods could improve future results.
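To make the roles of the three layer types concrete, here is a small NumPy sketch of a single forward pass: one convolution, a ReLU, one max pooling step, and a dense softmax layer over 50 classes. It is an illustration only, with random weights, a toy 8x8 single-channel input, and a hand-picked edge kernel; it is not the trained model, which would normally be built in a deep learning framework.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation, the operation a convolutional layer applies."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((8, 8))  # toy grayscale "painting"

# A vertical-edge kernel, chosen by hand for illustration.
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

# Feature extraction, then downsampling: 8x8 -> 6x6 -> 3x3.
features = max_pool(relu(conv2d(image, kernel)))

# Dense layer with random (untrained) weights mapping features to 50 artists.
n_artists = 50
weights = rng.normal(size=(features.size, n_artists))
probs = softmax(features.ravel() @ weights)

print(features.shape, probs.shape, int(probs.argmax()))
```

Pooling halves each spatial dimension here, which is the dimensionality reduction the description refers to; a real network stacks several such convolution and pooling stages and learns the kernels during training.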

Artist Classification Using CNN - Project Overview by Tess Kramer