Muzz – Product Data Analyst Exercise (2025)
Type: Product Experimentation (A/B/C Test)
Tools: Python (Pandas, NumPy, SciPy), Jupyter Notebook, Matplotlib, Seaborn
Overview
This project involved analyzing data from a real-world A/B/C experiment designed to improve Muzz’s phone verification process for new users.
The goal was to evaluate which screen design achieved the highest verification success rate at the lowest average cost per user, while maintaining fairness across user demographics.
The Challenge
Muzz tested three onboarding screens for phone verification:
| Variant | Screen Description | Verification Methods |
|---|---|---|
| A | SMS Only | SMS |
| B | SMS Primary | SMS + WhatsApp |
| C | WhatsApp Primary | WhatsApp + SMS |
Each user saw one variant at random after app installation. My task was to interpret the test, evaluate its effectiveness, and recommend which design should be adopted.
Data Used
Three datasets were provided:
- verification.csv – user ID, test group, selected method, and verification outcome.
- profiles.csv – user demographics (gender, age, country).
- costs.csv – per-country unit costs for SMS and WhatsApp verification.
I joined and cleaned these datasets, validated randomization across variants, and engineered new variables such as age group for demographic segmentation.
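A minimal sketch of the join and feature-engineering step is below. The column names (user_id, test_group, method, verified, date_of_birth, gender, country) are assumptions for illustration, since the exact schemas aren't reproduced here:

```python
import pandas as pd

# Load the three provided datasets (column names here are assumptions).
verification = pd.read_csv("verification.csv")   # user_id, test_group, method, verified
profiles = pd.read_csv("profiles.csv")           # user_id, gender, date_of_birth, country
costs = pd.read_csv("costs.csv")                 # country, sms_cost, whatsapp_cost

# Join user-level data on user_id; keep every user who entered the test.
df = verification.merge(profiles, on="user_id", how="left")

# Derive age and a coarse age_group band from date of birth.
df["date_of_birth"] = pd.to_datetime(df["date_of_birth"], errors="coerce")
df["age"] = (pd.Timestamp("2025-01-01") - df["date_of_birth"]).dt.days // 365
df["age_group"] = pd.cut(
    df["age"],
    bins=[17, 24, 34, 44, 120],
    labels=["18-24", "25-34", "35-44", "45+"],
)

# Quick randomization checks: group sizes and demographic mix per variant.
print(df["test_group"].value_counts())
print(pd.crosstab(df["test_group"], df["gender"], normalize="index"))
```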
Approach
- Exploratory Analysis: Validated data integrity, balanced sample sizes, and checked for bias across gender and age.
- Feature Engineering: Derived age and age_group from date of birth to understand demographic distribution.
- Verification Metrics: Measured the verification rate per group and per method (SMS vs WhatsApp).
- Statistical Testing: Conducted two-proportion z-tests to confirm whether observed differences between groups were statistically significant (see the first sketch after this list).
- Cost Analysis: Combined verification-method shares with per-country unit costs to estimate the average cost per verified user (see the second sketch after this list).
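A sketch of the two-proportion z-test, implemented directly with SciPy's normal distribution; the counts passed in at the bottom are placeholders, not the real experiment totals:

```python
import numpy as np
from scipy.stats import norm

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)            # pooled rate under H0
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return p_b - p_a, z, p_value

# Placeholder counts for illustration only.
diff, z, p = two_proportion_ztest(success_a=8730, n_a=10000,   # variant A
                                  success_b=9280, n_b=10000)   # variant B
print(f"uplift = {diff:+.3%}, z = {z:.2f}, p = {p:.4f}")
```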
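And a sketch of the cost step, continuing from the merged frame in the Data Used sketch: per-country unit costs are joined on, each verification is priced by the method used, and costs are averaged per variant. The column and value names (method, "whatsapp", sms_cost, whatsapp_cost) are again assumptions:

```python
import numpy as np

# Attach per-country unit costs, then price each verification by its method.
df = df.merge(costs, on="country", how="left")
df["unit_cost"] = np.where(df["method"] == "whatsapp",
                           df["whatsapp_cost"], df["sms_cost"])

# Average cost per verified user, by variant.
cost_per_verified = (
    df[df["verified"] == 1]
    .groupby("test_group")["unit_cost"]
    .mean()
)
print(cost_per_verified.round(4))
```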
Key Findings
Verification Rate:
- A = 87.3%, B = 92.8%, C = 92.8%
- B vs A: +5.5 pp (p < 0.001) → statistically significant improvement
- C vs A: +5.5 pp (p < 0.001) → statistically significant improvement
- B vs C: +0.02 pp (p = 0.97) → no meaningful difference
Cost per User:
- B: $0.0934
- C: $0.0571
- Screen C saves approximately $0.036 per user
Impact:
- At 1 million new users, Screen C would save around $26,652 compared to Screen B.
Recommendation
✅ Recommended Screen: C – WhatsApp Primary
💡 Why:
- Achieves the same high verification rate as Screen B.
- Lower average cost, driven by higher WhatsApp adoption.
- Intuitive design that aligns with modern user behavior, since many users prefer WhatsApp.
Additional Insights
- Country-level variation in SMS pricing slightly skewed average costs; the analysis was re-run with fixed global unit prices to confirm the ranking holds.
- Screen C encourages more users to choose WhatsApp (63.8% of verifications) over SMS, which drives its cost efficiency.
- Bounce rate and second-choice behavior could further refine the UX design if additional clickstream data were available.
Takeaway
This project demonstrates:
- Analytical storytelling: turning test data into a clear business decision.
- Statistical rigor: using hypothesis testing to validate product impact.
- Business mindset: translating technical findings into cost savings and product recommendations.