Problem
Credit card fraud is a major challenge for banks and financial institutions. Detecting fraudulent transactions in real time is difficult because the data is highly imbalanced: fraud cases are extremely rare compared to legitimate transactions.
Approach
Results
The final model reduced false positives while keeping recall high, so more fraudulent cases were detected without overwhelming the system with false alerts. Performance was evaluated with a confusion matrix, precision, recall, and F1-score, which together show how well each model handles the imbalanced data:
Random Forest: Precision 94.8%, Recall 74.5%, F1 ≈ 0.835
XGBoost: Precision 96.7%, Recall 79.7%, F1 ≈ 0.873
XGBoost outperformed Random Forest on both precision and recall. Its higher recall (79.7% versus 74.5%) means fewer false negatives, making it more effective for fraud detection, where missing fraud is costlier than a false alarm.
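The F1 figures follow from precision and recall through the harmonic mean, F1 = 2PR / (P + R). As a minimal sketch (the function name and the dictionary below are illustrative, not taken from the project code), the reported percentages can be plugged in to check the scores. Because the inputs are already rounded to one decimal place, the recomputed values may differ from the reported ones in the third decimal.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when the model finds no positives
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the results, converted from percent.
reported = {
    "Random Forest": (0.948, 0.745),
    "XGBoost": (0.967, 0.797),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_from_precision_recall(p, r):.3f}")
```

Both recomputed scores land within about 0.001 of the reported values, which is consistent with rounding of the inputs.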
Tools & Techniques
Languages & Libraries: Python, Pandas, NumPy, scikit-learn, XGBoost, Matplotlib/Seaborn
Models: Random Forest, XGBoost
Techniques: Data preprocessing, feature engineering, model tuning, precision-recall optimization
Evaluation: Confusion matrix, precision, recall, F1-score
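To illustrate how the evaluation metrics relate to the confusion matrix, and how moving the decision threshold trades precision against recall, here is a small from-scratch sketch. The labels and scores are made-up toy data, and the helper names are hypothetical; the project itself most likely relied on scikit-learn's `confusion_matrix`, `precision_score`, `recall_score`, and `f1_score`, which compute the same quantities.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), flagging a transaction as fraud when score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (invented for illustration): 1 = fraud, 0 = legitimate.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.45, 0.55, 0.15, 0.40, 0.70, 0.90]

# Sweep a few thresholds to see the precision-recall trade-off.
for threshold in (0.3, 0.5, 0.7):
    tp, fp, fn, tn = confusion_counts(y_true, y_score, threshold)
    p, r, f1 = precision_recall_f1(tp, fp, fn)
    print(f"threshold={threshold}: TP={tp} FP={fp} FN={fn} TN={tn} "
          f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

On this toy data, lowering the threshold catches every fraud case at the cost of more false alarms, while raising it removes the false alarms but misses one fraud case. Picking the operating point is the precision-recall optimization step listed under Techniques.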