Harshani Rathnayake - Music and Mental Health: Insights from Survey Data Analysis

Project Overview

This project explores the intersection of music preferences and mental health outcomes. In an era where mental health struggles are escalating globally, understanding accessible interventions like music therapy can empower early detection and support. Using Python for data wrangling, visualization, and statistical analysis, I analyzed a survey dataset to uncover patterns in how listening habits correlate with conditions like anxiety, depression, insomnia, and OCD. This portfolio piece demonstrates my skills in data cleaning, exploratory data analysis (EDA), and storytelling with data.

Technologies Used: Python (Pandas, NumPy, Matplotlib, Seaborn, SciPy, Scikit-learn), Jupyter Notebook.

Dataset Source: MxMH Survey Results on Kaggle (736 respondents, 33 columns). Note: The dataset shows an imbalance in sample sizes across age groups, with younger respondents overrepresented (e.g., 10-20 years: 329 samples; 21-30: 242; 31-40: 75; 41-50: 25; 51-60: 26; 60+: 19). This skew may influence generalizability, particularly for older demographics.

Project Thumbnail: Split-Image of Music Listening and Mental Health Icons

The Mental Health Crisis: Why This Matters

Mental health has emerged as one of the most pressing global issues today. Disorders like depression, anxiety, and insomnia affect millions, often leading to severe outcomes such as suicidal ideation when individuals lack outlets to express their emotions or access timely treatment. According to the World Health Organization (WHO), an estimated 5.7% of adults worldwide suffer from depression, with women disproportionately affected compared to men. Mental disorders encompass a wide spectrum of illnesses, with depression among the most prevalent, exacerbated by factors like social isolation and stigma.

Early detection is crucial for prevention and effective intervention. Mental illnesses are typically diagnosed using validated questionnaires that identify patterns in emotions, behaviors, and social interactions. However, traditional diagnostics can be inaccessible, especially in underserved populations. This is where innovative, non-clinical tools like music come into play.

Music holds profound therapeutic potential. Beyond entertainment, it calms the mind, reduces stress, and fosters emotional release—mechanisms that have shown success in treating various conditions, from anxiety to chronic pain. Research highlights music's role in modulating brain activity, lowering cortisol levels, and enhancing mood regulation, making it a low-barrier entry point for mental health support. For instance, rhythmic listening can synchronize neural oscillations, promoting relaxation and even aiding sleep disorders like insomnia. In this project, I investigate whether survey respondents' music habits (e.g., listening frequency by genre, daily hours) correlate with self-reported mental health scores, potentially informing personalized music-based interventions.

Infographic: Global Depression Prevalence (WHO, 2023) — Global Prevalence of Mental Disorders (Source: WHO, 2023)

Dataset and Data Cleaning

The dataset captures survey responses from music listeners on demographics, streaming habits, genre preferences, and mental health metrics (Anxiety, Depression, Insomnia, OCD on a 0-10 scale). Key columns include Age, PrimaryStreamingService, HoursPerDay, FavGenre, and frequency of listening to genres like Rock, EDM, and Lofi.

Raw data challenges:

Missing Values: Key columns had low null counts before cleaning: MusicEffects (8), Age (1), PrimaryStreamingService (1), WhileWorking (3), Instrumentalist (4), Composer (1), ForeignLanguages (4). BPM had 107 missing (14.5%).
Inconsistent Naming: Columns like "Primary streaming service", "Hours per day", and frequency columns (e.g., "Frequency [Hip hop]") needed standardization for easier manipulation.
Data Types: Timestamp as string; categorical variables as objects.
Outliers: Age ranged from 10-89 (plausible but skewed young); HoursPerDay up to 24 (valid extremes); BPM up to >500 (unrealistic, as typical music BPM is 60-200).
Irrelevant Columns: Timestamp, derived Time, and Permissions offered little analytical value.
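
A minimal sketch of how checks like these could be run with pandas. The frame below is a tiny synthetic stand-in (all values are made up for illustration), not the survey data, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for the raw survey (values invented for illustration).
raw = pd.DataFrame({
    "Age": [18, 25, np.nan, 63],
    "Hours per day": [2.0, 4.5, 1.0, 24.0],
    "BPM": [120.0, np.nan, 624.0, 95.0],
    "Music effects": ["Improve", None, "No effect", "Improve"],
})

# Null count per column, largest first.
null_counts = raw.isna().sum().sort_values(ascending=False)

# Share of missing BPM values, as a percentage of all rows.
bpm_missing_pct = raw["BPM"].isna().mean() * 100

# Min and max of the numeric columns expose implausible extremes quickly.
numeric_ranges = raw.describe().loc[["min", "max"]]

print(null_counts)
print(f"BPM missing: {bpm_missing_pct:.1f}%")
print(numeric_ranges)
```

On the real dataset the same three calls would be applied to the frame returned by pd.read_csv().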

Cleaning Steps:

Load and Inspect: Used pd.read_csv() to load; df.info() and df.describe() revealed structure (736 rows, 33 columns; 7 numeric, 26 categorical).
Rename Columns: Standardized for readability and consistency using df.rename() and regex via re module. Examples: "Primary streaming service" → "PrimaryStreamingService"; "Hours per day" → "HoursPerDay"; "While working" → "WhileWorking"; "Fav genre" → "FavGenre"; Frequency columns: Extracted genre names, e.g., "Frequency [Rock]" → "RockFrequency" (and similarly for Classical, Country, EDM, Folk, Gospel, HipHop, Jazz, KPop, Latin, Lofi, Metal, Pop, Rap, VideoGameMusic); "Foreign languages" → "ForeignLanguages"; "Music effects" → "MusicEffects".
Handle Timestamp: Converted to datetime with pd.to_datetime(df['Timestamp'], format='%m/%d/%Y %H:%M:%S'). Extracted 'Date' (df['Date'] = df['Timestamp'].dt.date) and 'Time' (df['Time'] = df['Timestamp'].dt.time) columns for potential temporal analysis.
Drop Irrelevant Columns: Removed 'Timestamp', 'Time', and 'Permissions' using df.drop(columns=['Timestamp', 'Time', 'Permissions'], inplace=True) as they provided no substantive insights.
Handle Missing Values in Key Columns: With <10 nulls per column (total ~22 across listed columns), dropped rows with any nulls in critical demographics/habits using df.dropna(subset=['Age', 'PrimaryStreamingService', 'WhileWorking', 'Instrumentalist', 'Composer', 'ForeignLanguages', 'MusicEffects'], inplace=True). This reduced the dataset to ~714 rows with minimal data loss.
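Outlier Detection and Removal: BPM Outliers: Filtered for values >400 using df[df['BPM'] > 400], confirming ~5-10 extreme cases (>500 BPM, implausible for music). Detected broader outliers via boxplot (sns.boxplot(x=df['BPM'])) and the IQR method: Q1=100, Q3=140, IQR=40, giving fences of Q1 - 1.5×IQR = 40 and Q3 + 1.5×IQR = 200. Flagged values >200 or <60, using the typical-music floor of 60 rather than the looser IQR lower fence of 40. Dropped these rows while keeping rows with missing BPM for later imputation (df = df[df['BPM'].isna() | df['BPM'].between(60, 200)]); a plain range comparison such as (df['BPM'] >= 60) & (df['BPM'] <= 200) would also discard the NaN rows. Other Outliers: HoursPerDay >24 flagged but none found; Age <10 or >100 dropped (none).
BPM Missing Value Imputation: For the 107 missing BPM values (~15% of rows), performed median imputation grouped by FavGenre using df['BPM'] = df['BPM'].fillna(df.groupby('FavGenre')['BPM'].transform('median')). Assigning the result back avoids calling fillna(..., inplace=True) on a column selection, which recent pandas versions warn may not update the DataFrame. This genre-specific approach preserved contextual accuracy (e.g., median BPM for Rock ~120, EDM ~128).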
Categorical Encoding Prep: Applied LabelEncoder to select categoricals (e.g., PrimaryStreamingService, FavGenre) for future correlation matrices: le = LabelEncoder(); df['PrimaryStreamingService_Encoded'] = le.fit_transform(df['PrimaryStreamingService']).
Age Grouping: To address imbalance and facilitate analysis, binned Age into groups: df['AgeGroup'] = pd.cut(df['Age'], bins=[0,20,30,40,50,60,100], labels=['10-20', '21-30', '31-40', '41-50', '51-60', '60+']). This enabled crosstabs like exploratory behavior by age.
Warnings Suppressed: warnings.filterwarnings('ignore') for clean output.
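
The renaming, null-dropping, outlier-filtering, and imputation steps above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: clean_name is a hypothetical helper, and the frame is synthetic with only a handful of the survey's columns.

```python
import re

import numpy as np
import pandas as pd


def clean_name(col: str) -> str:
    """Hypothetical helper: 'Frequency [Hip hop]' -> 'HipHopFrequency',
    'Hours per day' -> 'HoursPerDay'."""
    match = re.fullmatch(r"Frequency \[(.+)\]", col)
    base = match.group(1) if match else col
    # Split on anything that is not a letter or digit, then capitalise each word.
    words = [w for w in re.split(r"[^0-9A-Za-z]+", base) if w]
    name = "".join(w[0].upper() + w[1:] for w in words)
    return name + "Frequency" if match else name


# Synthetic rows (invented values) covering a null Age, a BPM outlier,
# and two missing BPM values.
raw = pd.DataFrame({
    "Age": [18, 25, 30, 40, np.nan, 21, 33],
    "Fav genre": ["Rock", "Rock", "EDM", "EDM", "Rock", "EDM", "Rock"],
    "BPM": [120.0, np.nan, 128.0, 999.0, 110.0, np.nan, 100.0],
    "Frequency [Hip hop]": ["Never", "Rarely", "Never", "Never",
                            "Rarely", "Sometimes", "Never"],
})

# 1. Standardise column names.
df = raw.rename(columns=clean_name)

# 2. Drop rows missing a key field (only Age in this toy frame).
df = df.dropna(subset=["Age"])

# 3. Keep rows whose BPM is missing or within 60-200, so the rows that
#    still need imputation are not discarded along with the outliers.
df = df[df["BPM"].isna() | df["BPM"].between(60, 200)].copy()

# 4. Fill missing BPM with the median BPM of the respondent's favourite genre.
df["BPM"] = df["BPM"].fillna(df.groupby("FavGenre")["BPM"].transform("median"))

print(df)
```

If a genre has no observed BPM at all, its group median is NaN and the gap stays unfilled; a fallback to the overall median would be one way to handle that case.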

Post-cleaning: Dataset refined to 716 rows (after drops/imputations); dtypes optimized (e.g., categoricals to 'category'); ready for EDA with improved integrity.
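
The age binning and dtype conversion can be illustrated with a short sketch. The bins and labels are the ones listed in the cleaning steps; the ages and genres below are invented.

```python
import pandas as pd

# Invented ages chosen to sit on and around the bin edges.
df = pd.DataFrame({
    "Age": [10, 20, 21, 30, 45, 60, 61, 89],
    "FavGenre": ["Rock", "Pop", "Rock", "EDM", "Jazz", "Pop", "Folk", "Rock"],
})

bins = [0, 20, 30, 40, 50, 60, 100]
labels = ["10-20", "21-30", "31-40", "41-50", "51-60", "60+"]

# pd.cut uses right-closed intervals by default, so age 20 falls in
# '10-20' and age 60 falls in '51-60'.
df["AgeGroup"] = pd.cut(df["Age"], bins=bins, labels=labels)

# Store the repeated genre strings as a categorical column.
df["FavGenre"] = df["FavGenre"].astype("category")

group_sizes = df["AgeGroup"].value_counts(sort=False)
print(group_sizes)
```

Printing the group sizes right after binning is a quick way to see the kind of imbalance noted in the dataset description.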

Code Snippet: Data Cleaning Pipeline in Jupyter Notebook — Data Cleaning Pipeline Screenshot

Exploratory Data Analysis: Key Findings

I focused on demographics, listening habits, and mental health correlations. Visualizations used Matplotlib/Seaborn for distributions and crosstabs.

Demographics and Habits

Age Distribution: Heavily skewed toward youth (mean age: 25.2 years), highlighting sample imbalance.
Streaming Preferences: Spotify dominates (62.3% of users).
Daily Listening: Average 3.8 hours/day (10-20 age group); 78.9% listen while working.

Insight: Younger users (10-20) report more exploratory listening (trying new genres) than older groups; whether this relates to their mental health scores is examined in the correlation section.

Bar Chart: Age Group Imbalance in Dataset — Age Group Distribution and Sample Imbalance

Line Chart: Average Hours Per Day Listening to Music by Age Group — Daily Listening Habits Across Age Groups (with 95% CI)
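
One way to compute a per-group mean with a 95% confidence interval is sketched below. It uses a t-based interval, which is an assumption on my part: Seaborn's line plots estimate the interval by bootstrapping by default, so the numbers would differ slightly. The data is synthetic.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Invented listening hours for two age groups.
listening = pd.DataFrame({
    "AgeGroup": ["10-20"] * 4 + ["21-30"] * 4,
    "HoursPerDay": [2.0, 4.0, 6.0, 4.0, 1.0, 3.0, 2.0, 2.0],
})

summary = listening.groupby("AgeGroup")["HoursPerDay"].agg(["mean", "std", "count"])

# Standard error of the mean for each group.
summary["sem"] = summary["std"] / np.sqrt(summary["count"])

# Two-sided 95% critical value from the t distribution (n - 1 degrees of freedom).
t_crit = stats.t.ppf(0.975, df=(summary["count"] - 1).to_numpy())

summary["ci_low"] = summary["mean"] - t_crit * summary["sem"]
summary["ci_high"] = summary["mean"] + t_crit * summary["sem"]

print(summary.round(2))
```

With the small older groups in this dataset (19 to 26 respondents), intervals computed this way would be noticeably wider than for the 10-20 group.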

Genre Preferences and Exploration

Rock and Pop music are favorites, but frequencies vary.

Exploratory Behavior: 71.3% explore new music; more common in 10-20 group (78.1%) vs. 60+ (52.6%).
Foreign Languages in Music: 55.2% listen to non-English tracks; higher in younger/exploratory users.
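
Percentages like these can be produced with a row-normalised crosstab. The sketch below uses invented rows, and the column name Exploratory is an assumption about how the exploration question is stored.

```python
import pandas as pd

# Invented responses; 'Exploratory' is an assumed column name.
df = pd.DataFrame({
    "AgeGroup": ["10-20", "10-20", "10-20", "10-20", "60+", "60+"],
    "Exploratory": ["Yes", "Yes", "Yes", "No", "Yes", "No"],
})

# normalize="index" divides each row by its total, so every age group
# sums to 100% regardless of how many respondents it contains.
explore_pct = pd.crosstab(df["AgeGroup"], df["Exploratory"], normalize="index") * 100

print(explore_pct.round(1))
```

Normalising by row matters here because the raw counts would be dominated by the overrepresented 10-20 group.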

Bar Chart: Average Listening Frequency by Genre and Age Groups — Listening Frequency Patterns by Genre and Demographics

Dual Pie Charts: Music Exploration and Foreign Languages Distributions — Exploratory Behavior and Multilingual Listening by Age

Mental Health Correlations

Prevalence: Moderate anxiety (mean 5.84/10), depression (4.80/10); lower insomnia (3.74) and OCD (2.64).
Music Impact: 74.5% report "Improve" on mental health; only 5.2% "Worsen."
Preliminary Links: Higher HoursPerDay correlates weakly with lower anxiety (r = -0.12, p < 0.05, from a SciPy Pearson test that is not shown in the notebook). Exploratory listeners score slightly lower on depression.
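
A correlation test of this kind can be sketched with scipy.stats. The data below is randomly generated with a weak negative dependence built in, so it will not reproduce the reported r or p. Spearman's rank correlation is included as an alternative that may suit 0-10 self-rating scales; that is a suggestion, not something the notebook does.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic listening hours and an anxiety score that depends weakly
# and negatively on them, plus noise.
hours = rng.uniform(0, 10, size=200)
anxiety = 6 - 0.1 * hours + rng.normal(0, 2.5, size=200)

# Pearson: linear association between the two variables.
r, p_value = stats.pearsonr(hours, anxiety)

# Spearman: rank-based, so it does not assume a linear relationship.
rho, p_rank = stats.spearmanr(hours, anxiety)

print(f"Pearson  r   = {r:.3f}, p = {p_value:.4f}")
print(f"Spearman rho = {rho:.3f}, p = {p_rank:.4f}")
```

With a coefficient near -0.12, only about 1.4% of the variance in one variable is linearly associated with the other (r squared), which is worth stating next to the p-value.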

To delve deeper into music type influences, I computed Pearson correlations between mental health conditions (Anxiety, Depression, Insomnia, OCD) and key music variables (BPM, encoded FavGenre). The matrix reveals:

	Anxiety	Depression	Insomnia	OCD	BPM	FavGenre_Encoded
Anxiety	1.000	0.523	0.284	0.345	0.037	0.069
Depression	0.523	1.000	0.376	0.186	0.032	0.047
Insomnia	0.284	0.376	1.000	0.224	0.047	0.013
OCD	0.345	0.186	0.224	1.000	-0.021	0.031
BPM	0.037	0.032	0.047	-0.021	1.000	0.067
FavGenre_Encoded	0.069	0.047	0.013	0.031	0.067	1.000
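
A matrix with this layout can be produced by DataFrame.corr. The sketch below runs on randomly generated scores, so its coefficients are unrelated to the values in the table above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 300

# Synthetic 0-10 scores; Depression is built to track Anxiety loosely.
anxiety = rng.integers(0, 11, size=n).astype(float)
df = pd.DataFrame({
    "Anxiety": anxiety,
    "Depression": np.clip(np.round(0.5 * anxiety + rng.normal(2, 2.5, size=n)), 0, 10),
    "Insomnia": rng.integers(0, 11, size=n).astype(float),
    "OCD": rng.integers(0, 11, size=n).astype(float),
    "BPM": rng.normal(120, 25, size=n),
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr(method="pearson").round(3)

print(corr)
```

Passing the result to sns.heatmap(corr, annot=True) would draw it as an annotated heatmap.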

Pearson Correlation Heatmap: Mental Health Conditions vs. Music Type — Correlation Matrix: MH Conditions and Music Variables

Stacked Bar Chart: Mental Health Proportions by Genre with BPM Overlay — MH Burden Conditions Across Music Genres

Key Insights on Correlations:

Inter-Condition Links: Strongest tie between Anxiety and Depression (r=0.523), consistent with the comorbidity widely reported in the mental health literature. Insomnia correlates weakly to moderately with both (r=0.284–0.376), suggesting sleep disruption as a shared factor. OCD's ties are weaker overall (r=0.186–0.345), although its link to Anxiety (r=0.345) is stronger than Insomnia's (r=0.284).
Music Type Influences: BPM correlations are negligible in every case: slightly positive for Anxiety, Depression, and Insomnia (r=0.032–0.047) and slightly negative for OCD (r=-0.021). Values this close to zero do not support a claim that faster tempos worsen symptoms; any tempo pattern seen in the stacked visualization (e.g., higher-BPM genres like EDM alongside elevated anxiety) should be treated as a hypothesis to test. Encoded FavGenre shows similarly weak links (r=0.013–0.069). Because LabelEncoder assigns arbitrary integer codes to genres, these coefficients have no directional meaning and cannot be attributed to specific genres such as Rap, Pop, or EDM; genre-level differences need a per-genre comparison.
Implications: Correlations cannot establish causation, and none of the music variables shows a meaningful linear link to the mental health scores. Ideas such as slower-BPM lo-fi for insomnia relief remain untested here (BPM vs. Insomnia r=0.047). Future regression could test mediation via listening frequency.
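
Since genre is a nominal variable, one alternative to correlating against label-encoded codes is to compare score distributions across genres directly. The sketch below uses a Kruskal-Wallis test from SciPy on invented scores; it is a suggested technique, not part of the original notebook.

```python
import pandas as pd
from scipy import stats

# Invented anxiety scores for three favourite-genre groups.
df = pd.DataFrame({
    "FavGenre": ["Rock"] * 5 + ["Lofi"] * 5 + ["EDM"] * 5,
    "Anxiety": [6, 7, 5, 8, 6, 3, 4, 2, 5, 4, 7, 8, 6, 9, 7],
})

# One array of scores per genre.
groups = [scores.to_numpy() for _, scores in df.groupby("FavGenre")["Anxiety"]]

# Kruskal-Wallis H-test: do the groups share the same distribution?
# It is rank-based, so it suits ordinal 0-10 ratings.
h_stat, p_value = stats.kruskal(*groups)

# Per-genre medians show the direction of any difference.
medians = df.groupby("FavGenre")["Anxiety"].median()

print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
print(medians)
```

On the real data, genres with very few respondents would need to be merged or excluded before a test like this is meaningful.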

Heatmap: Detailed Correlation Matrix of Mental Health vs. Music Variables — Advanced Correlation Insights

Stacked Bar Chart with Line Overlay: Music Effects Proportions by Age Group — Average Listening Frequency by Genre and Music Effect

Conclusions

This analysis points to a promising role for music in mental health: exploratory and multilingual listening may buffer anxiety/depression, especially among youth. However, the age imbalance limits broader inferences; future work could balance via oversampling or external data.

On Effectiveness: Correlations indicate that music's therapeutic potential is real but indirect: 74.5% improvement reports align with weak negative links (e.g., hours vs. anxiety, r = -0.12), suggesting that consistent exposure may amplify benefits. Genre/BPM effects are too small in this data (|r| < 0.07) to recommend specific tempos for specific conditions, but personalized playlists remain a plausible low-cost tool worth testing; this analysis does not quantify any reduction in severity. The strong anxiety-depression comorbidity (r>0.5) underscores music as an adjunct, not a cure; integrating it with clinical tools could enhance outcomes. Overall, the data is consistent with music being a scalable support for mental health, with the young respondents who dominate this sample as the best-evidenced group.

Portfolio Takeaways:

Robust cleaning ensures reliable insights.
Visuals (pies, bars) make complex data accessible.
Ties data to societal impact, showcasing analytical storytelling.

Inspirational Graphic: Music as the Medicine of the Mind — Source : WHO

"Music is the medicine of the mind" – John A. Logan

For the full notebook, view here.

References

World Health Organization. (2023). Depression Fact Sheet. https://www.who.int/news-room/fact-sheets/detail/depression
Wongkoblap, A., Vadillo, M. A., & Curcin, V. (2017). Researching Mental Health Disorders in the Era of Social Media: Systematic Review. Journal of Medical Internet Research, 19(6), e228.
Impact of Music on Mental Health. (2022). ResearchGate Publication. https://www.researchgate.net/publication/358399496_Impact_of_Music_on_Mental_Health

Project Details