📌 Project Overview

This project implements an end-to-end machine learning pipeline for detecting fraudulent transactions on the IEEE-CIS Fraud Detection dataset. It includes:

  • Deep EDA in R and Python
  • Robust preprocessing and feature engineering
  • Ensemble modeling (XGBoost, LightGBM, CatBoost)
  • A FastAPI deployment for real-time predictions
  • Unit testing with pytest for pipeline robustness

🚀 Check the full repository: GitHub


🧠 Dataset and Preprocessing

The dataset contains transaction and identity features, most of them anonymized.

Key steps:

  • Merged identity and transaction data
  • Imputed missing values using statistical strategies
  • Encoded categorical variables using LabelEncoder
  • Scaled numerical features using StandardScaler
  • Saved transformers and models using joblib for reuse in deployment
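
The steps above can be sketched roughly as follows. This is a minimal illustration on a tiny hand-made frame, not the project's actual code: the choice of median imputation, the "missing" sentinel for categoricals, the handful of columns shown, and the `preprocessing.joblib` file name are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Tiny stand-in frames; the real transaction and identity tables are far wider.
transactions = pd.DataFrame({
    "TransactionID": [1, 2, 3, 4],
    "TransactionAmt": [50.0, np.nan, 120.5, 18.0],
    "ProductCD": ["W", "C", None, "W"],
    "isFraud": [0, 1, 0, 0],
})
identity = pd.DataFrame({
    "TransactionID": [1, 3],
    "DeviceType": ["mobile", "desktop"],
})

# Left join: not every transaction has a matching identity record.
df = transactions.merge(identity, on="TransactionID", how="left")

num_cols = ["TransactionAmt"]
cat_cols = ["ProductCD", "DeviceType"]

# Impute (assumed strategy): median for numeric, a sentinel label for categorical.
medians = df[num_cols].median()
df[num_cols] = df[num_cols].fillna(medians)
df[cat_cols] = df[cat_cols].fillna("missing")

# One LabelEncoder per categorical column, kept so inference can reuse it.
encoders = {}
for col in cat_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Standardize numeric features to zero mean and unit variance.
scaler = StandardScaler()
df[num_cols] = scaler.fit_transform(df[num_cols])

# Persist the fitted transformers (hypothetical file name) and load them back.
artifacts = {"medians": medians, "encoders": encoders, "scaler": scaler}
artifact_path = Path(tempfile.gettempdir()) / "preprocessing.joblib"
joblib.dump(artifacts, artifact_path)
restored = joblib.load(artifact_path)
```

Keeping the fitted encoders and scaler in one artifact means the API can apply exactly the same transformations at prediction time instead of refitting on incoming data.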

🤖 Model Training and Evaluation

We trained and compared multiple models:

  • Logistic Regression
  • Random Forest
  • XGBoost
  • LightGBM
  • CatBoost
  • Stacking Ensemble with Logistic Regression as meta-learner
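
As a rough sketch of the stacking setup, the snippet below uses scikit-learn's `StackingClassifier` with Logistic Regression as the meta-learner. To keep it dependency-light, the base learners here are scikit-learn models standing in for XGBoost, LightGBM, and CatBoost; the synthetic data, hyperparameters, and fold count are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data as a stand-in for the fraud dataset.
X, y = make_classification(
    n_samples=600, n_features=12, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stand-in base learners; gradient-boosting libraries with a scikit-learn
# compatible interface could be listed here in the same way.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# The meta-learner is trained on out-of-fold probabilities from the base models.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=3,
)
stack.fit(X_train, y_train)

# Probability of the positive (fraud) class for each held-out row.
fraud_proba = stack.predict_proba(X_test)[:, 1]
```

Because the meta-learner only sees out-of-fold predictions, it learns how much to trust each base model without being fed predictions those models made on their own training rows.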

All models were evaluated using:

  • Accuracy, Precision, Recall, F1-score
  • ROC-AUC
  • Confusion matrices
  • ROC curves
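
To make the metrics above concrete, here is a small dependency-free sketch that computes them from scratch on made-up labels and scores. The data, the 0.5 threshold, and the function names are hypothetical; in practice a library such as scikit-learn would normally be used.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the share of (fraud, legit) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 8 legitimate transactions and 2 fraudulent ones.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.60, 0.25, 0.40, 0.35, 0.90]
y_pred = [int(s >= 0.5) for s in scores]  # assumed decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)  # accuracy 0.8, precision 0.5, recall 0.5, f1 0.5
print(auc)      # 0.875
```

In this toy case accuracy is 0.8 even though half of the fraud cases are missed, which is why precision, recall, and ROC-AUC are reported alongside accuracy when the positive class is rare.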

📊 Results are available in the reports section.


🧪 Testing

We implemented tests to ensure pipeline integrity:

  • ✅ Preprocessing pipeline does not crash with valid data
  • ✅ API responds with expected output structure
  • ✅ Unit tests managed with pytest
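
Tests of this kind might look like the sketch below. The `preprocess` and `predict` functions are simplified, hypothetical stand-ins defined inline so the file runs on its own; the project's real tests would import its actual pipeline and API code instead, and the response keys shown are assumed.

```python
# test_pipeline.py (hypothetical file name); run with: pytest test_pipeline.py


def preprocess(record):
    """Stand-in for the project's preprocessing step (hypothetical)."""
    amount = record.get("TransactionAmt")
    return {"TransactionAmt": 0.0 if amount is None else float(amount)}


def predict(record):
    """Stand-in for the model call behind the API (hypothetical scoring rule)."""
    features = preprocess(record)
    probability = max(0.0, min(features["TransactionAmt"] / 1000.0, 1.0))
    return {
        "prediction": int(probability >= 0.5),
        "fraud_probability": probability,
    }


def test_preprocess_accepts_valid_record():
    # Valid input should pass through without raising.
    features = preprocess({"TransactionAmt": 120.5})
    assert features == {"TransactionAmt": 120.5}


def test_predict_output_structure():
    # The response should expose exactly the expected keys and value ranges.
    result = predict({"TransactionAmt": 750})
    assert set(result) == {"prediction", "fraud_probability"}
    assert result["prediction"] in (0, 1)
    assert 0.0 <= result["fraud_probability"] <= 1.0
```

pytest collects any function whose name starts with `test_`, so no explicit registration or test class is needed.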

🖥️ Deployment

The final model was deployed using FastAPI. Key features:

  • /predict endpoint returns prediction and fraud probability
  • Interactive Swagger UI available at /docs
  • Can be run locally via Uvicorn or deployed in a Docker container

To run locally with Uvicorn:

uvicorn src.main:app --reload

Or with Docker:

docker pull alexmatiasastorga/fraud-api:latest
docker run -d -p 8000:8000 alexmatiasastorga/fraud-api

📌 Conclusion

This project demonstrates a real-world machine learning workflow from raw data to deployment. Future improvements may include:

  • DAG automation with Apache Airflow
  • Cloud deployment (Render or AWS)
  • Monitoring with MLflow or Prometheus
