Credit Card Fraud Detection

Problem

Credit card fraud is a major challenge for banks and financial institutions. Detecting fraudulent transactions in real time is difficult because the data is highly imbalanced: fraud cases are extremely rare compared to legitimate transactions.

Approach

I developed a machine learning solution to classify transactions as fraudulent or legitimate. The workflow included data preprocessing, feature engineering, and handling class imbalance. I compared multiple models, including Random Forest and XGBoost, and tuned them using cross-validation and early stopping. I also experimented with different decision thresholds to maximize the F1-score, balancing precision against recall.
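The threshold experiment described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the labels, scores, candidate grid, and the helper names `precision_recall_f1` and `best_f1_threshold` are all hypothetical. It uses only the standard library; scikit-learn's `precision_recall_curve` performs a similar sweep over thresholds.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels where 1 = fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_f1_threshold(y_true, scores, thresholds):
    """Try each candidate threshold; return (threshold, precision, recall, f1) with the highest F1."""
    best = None
    for thr in thresholds:
        # Flag a transaction as fraud when its score reaches the threshold.
        y_pred = [1 if s >= thr else 0 for s in scores]
        p, r, f1 = precision_recall_f1(y_true, y_pred)
        if best is None or f1 > best[3]:
            best = (thr, p, r, f1)
    return best


# Hypothetical validation labels and predicted fraud probabilities.
y_val = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1]
scores = [0.02, 0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.05, 0.90]

candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
thr, p, r, f1 = best_f1_threshold(y_val, scores, candidates)
print(f"threshold={thr:.2f} precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In practice the threshold should be chosen on a validation split and then confirmed on held-out data, since a threshold tuned and evaluated on the same set will look better than it is.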

Results

The final model reduced false positives while maintaining high recall, so more fraudulent cases were detected without overwhelming the system with false alerts. Performance was evaluated using a confusion matrix, precision, recall, and F1-score, which together show how well the model handles imbalanced data.
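As a rough illustration of how these metrics come out of a confusion matrix, and why accuracy alone is misleading on imbalanced data, here is a small sketch. The transaction counts and both "models" are made up for the example, and `confusion_counts` and `summarize` are hypothetical helper names, not part of the project.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels where 1 = fraud and 0 = legitimate."""
    pairs = Counter(zip(y_true, y_pred))
    return {
        "tp": pairs[(1, 1)],
        "fp": pairs[(0, 1)],
        "fn": pairs[(1, 0)],
        "tn": pairs[(0, 0)],
    }


def summarize(counts):
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts."""
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: 1,000 transactions, 10 of them fraudulent.
y_true = [1] * 10 + [0] * 990
# A hypothetical model that catches 7 of the 10 frauds and raises 1 false alert.
y_model = [1] * 7 + [0] * 3 + [1] * 1 + [0] * 989
# A trivial baseline that never flags anything.
y_never = [0] * 1000

model = summarize(confusion_counts(y_true, y_model))
baseline = summarize(confusion_counts(y_true, y_never))
print(model)
print(baseline)
```

The never-flag baseline reaches 99% accuracy while catching no fraud at all, which is why precision, recall, and F1 are the more informative numbers here.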

Random Forest: Precision 94.8%, Recall 74.5%, F1 ≈ 0.835
XGBoost: Precision 96.7%, Recall 79.7%, F1 ≈ 0.873

XGBoost outperformed Random Forest on both precision and recall. Its higher recall (79.7% vs. 74.5%) means fewer false negatives, making it more effective for fraud detection, where missing fraud is costlier than a false alarm.
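Because a missed fraud is treated as costlier than a false alarm, one way to compare the two models is an F-beta score, which weights recall more heavily than precision when beta is greater than 1. The sketch below recomputes F1 from the reported precision and recall and adds an F2 score. The choice of beta = 2 is only an illustrative assumption and was not part of the original evaluation.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Precision and recall as reported in the results above.
reported = {
    "Random Forest": (0.948, 0.745),
    "XGBoost": (0.967, 0.797),
}

for name, (p, r) in reported.items():
    f1 = f_beta(p, r, beta=1.0)
    f2 = f_beta(p, r, beta=2.0)  # beta=2 is an illustrative choice, not from the project
    print(f"{name}: F1={f1:.3f} F2={f2:.3f}")
```

The recomputed F1 values agree with the reported ones to within about 0.001, which is consistent with rounding in the reported percentages. XGBoost also comes out ahead under the recall-weighted F2.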

Tools & Techniques

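Languages & Libraries: Python, pandas, NumPy, scikit-learn, XGBoost, Matplotlib/Seaborn
Models: Random Forest, XGBoost
Techniques: Data preprocessing, feature engineering, class imbalance handling, model tuning (cross-validation, early stopping), precision-recall optimization (decision threshold tuning)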
Evaluation: Confusion matrix, precision, recall, F1-score

Tags: Data Science