Overview

The shrink_model.py script is a small utility that re-saves our trained machine learning model with compression to reduce its file size. A smaller file is easier to push to GitHub, which limits file sizes, and faster to deploy on platforms such as Render.

🎯 Purpose

Reduce the model file size from ~50 MB to ~25 MB (or less) without affecting the model's accuracy or functionality.

Complete Code

    import joblib
    import os

    # The smart programmatic fix: compress the file.
    # A few lines of code shrink the file from a large size to a much smaller one
    # (like a Zip file), but in a format Python can read directly.

    # 1. Load the current large model
    print("⏳ جاري تحميل الموديل الضخم...")
    model = joblib.load('credit_risk_model.pkl')

    # 2. Save the model with high compression (level 5)
    print("📦 جاري ضغط وحفظ الموديل...")
    joblib.dump(model, 'credit_risk_model.pkl', compress=5)

    # 3. Check the new size
    size = os.path.getsize('credit_risk_model.pkl') / (1024 * 1024)
    print(f"✅ تم ضغط الموديل بنجاح! الحجم الجديد: {size:.2f} MB")

1️⃣ Library Imports

    import joblib
    import os

What Each Library Does:

📦 joblib

A Python library for saving and loading Python objects efficiently. We use it to:

  • joblib.load(): Load the existing model file
  • joblib.dump(): Save the model with compression

💻 os

Operating system interface. We use it for one call:

  • os.path.getsize(): Get the file size in bytes

The script then converts that byte count to megabytes itself, for human-readable output.

2️⃣ Loading the Original Model

    print("⏳ جاري تحميل الموديل الضخم...")
    model = joblib.load('credit_risk_model.pkl')

What This Does:

  • Print Message: Shows a user-friendly message in Arabic: "Loading the large model..."
  • Load Model: Reads the trained model from the credit_risk_model.pkl file
  • The model is loaded into memory so we can re-save it with compression

💡 Why Load First?

joblib.dump() works on a Python object in memory, not on a file, so the model must be loaded before it can be re-saved with compression. Think of it like:

  • Without compression: Like saving a document as-is
  • With compression: Like saving the same document as a ZIP file - same content, smaller size

3️⃣ Compressing and Saving the Model ⭐

    # Message in English: "Compressing and saving the model..."
    print("📦 جاري ضغط وحفظ الموديل...")
    joblib.dump(model, 'credit_risk_model.pkl', compress=5)

🔑 The Key: compress=5 Parameter

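The compress parameter tells joblib how much to compress the file. It accepts an integer from 0 (no compression) to 9 (maximum compression); the sizes below are approximate figures for this model: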

Compress Level | Description                    | File Size
compress=0     | No compression                 | ~50 MB
compress=3     | Medium compression             | ~30-35 MB
compress=5     | High compression (Recommended) | ~20-25 MB
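To get a feel for how the level changes the output size, here is a minimal sketch. It uses Python's built-in zlib on pickled bytes as a stand-in for joblib (joblib uses zlib when compress is an integer), and fake_model is a hypothetical placeholder, not the real credit risk model, so the numbers it prints will not match the table above.

```python
import pickle
import zlib

# Hypothetical stand-in for a trained model: a structure with repetitive content.
fake_model = {"trees": [[i % 10, (i * 7) % 13, 0.5] for i in range(50_000)]}
raw = pickle.dumps(fake_model, protocol=pickle.HIGHEST_PROTOCOL)


def compressed_sizes(data, levels=(0, 3, 5, 9)):
    """Return {level: size in bytes} after zlib compression at each level."""
    return {level: len(zlib.compress(data, level)) for level in levels}


sizes = compressed_sizes(raw)
for level, size in sizes.items():
    print(f"compress={level}: {size / (1024 * 1024):.2f} MB")
```

The ratio you get depends heavily on how repetitive the data is, so it is worth measuring on the actual model file before settling on a level.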

How Compression Works:

Joblib compression uses algorithms (similar to ZIP compression) to:

  • Remove Redundancy: Eliminates repeated patterns in the data
  • Efficient Storage: Encodes frequently occurring byte sequences with shorter codes
  • Preserve Functionality: All model "knowledge" remains intact

✅ Important: Compression does NOT affect model accuracy - it only reduces file size!
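The reason accuracy cannot change is that this kind of compression is lossless: decompressing gives back exactly the same bytes. A small sketch of that property, using zlib and pickle directly and a hypothetical weights dictionary in place of the real model:

```python
import pickle
import zlib

# Hypothetical stand-in for model parameters.
weights = {"coef": [0.125, -3.5, 42.0], "intercept": 1.75, "classes": ["good", "bad"]}

original_bytes = pickle.dumps(weights)
compressed = zlib.compress(original_bytes, 5)   # level 5, as in the script
restored_bytes = zlib.decompress(compressed)
restored = pickle.loads(restored_bytes)

# Both comparisons hold because nothing is discarded during compression.
print(restored_bytes == original_bytes)
print(restored == weights)
```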

💡 Why compress=5?

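  • Balance: Good compression ratio without being too slow (levels up to 9 may shrink the file a little more, but take longer)
  • GitHub Limits: Browser uploads are capped at 25 MB, and pushed files over 100 MB are blocked unless you use Git LFS (Large File Storage)
  • Render Deploys: Smaller files upload and deploy faster
  • Storage: Saves disk space and bandwidth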

4️⃣ Verifying the New File Size

    # 3. Check the new size
    size = os.path.getsize('credit_risk_model.pkl') / (1024 * 1024)
    print(f"✅ تم ضغط الموديل بنجاح! الحجم الجديد: {size:.2f} MB")

Breaking Down the Calculation:

  1. os.path.getsize('credit_risk_model.pkl'): Gets the file size in bytes
  2. / (1024 * 1024): Converts bytes to megabytes (MB)
    • 1 KB = 1024 bytes
    • 1 MB = 1024 KB = 1,048,576 bytes
  3. f"{size:.2f} MB": Formats the number to 2 decimal places (e.g., "24.56 MB")
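The same arithmetic as a small worked example. The helper name bytes_to_mb and the byte counts are made up for illustration; they are not part of shrink_model.py.

```python
def bytes_to_mb(num_bytes):
    """Convert a byte count to megabytes, using 1 MB = 1024 * 1024 bytes."""
    return num_bytes / (1024 * 1024)


print(f"{bytes_to_mb(52_428_800):.2f} MB")  # 50.00 MB
print(f"{bytes_to_mb(25_753_027):.2f} MB")  # 24.56 MB
```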

📊 Expected Output:

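    ⏳ جاري تحميل الموديل الضخم...
    📦 جاري ضغط وحفظ الموديل...
    ✅ تم ضغط الموديل بنجاح! الحجم الجديد: 24.56 MB

In English: "Loading the large model...", "Compressing and saving the model...", "Model compressed successfully! New size: 24.56 MB". The exact size depends on your model.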

🎯 Why Compression Matters

1. GitHub File Size Limits

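  • GitHub rejects browser uploads over 25 MB; pushes from the command line trigger a warning above 50 MB and are blocked above 100 MB
  • Files over 100 MB require Git LFS (Large File Storage), which needs additional setup and can incur costs beyond the free quota
  • Compression helps avoid this complexity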
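A size check along these lines could be run before committing. This is only a sketch: the thresholds (25, 50 and 100 MB) are taken from GitHub's documentation as understood at the time of writing and should be re-checked, and the function names are hypothetical.

```python
import os

MB = 1024 * 1024

# Assumed thresholds; verify against GitHub's current documentation.
BROWSER_UPLOAD_LIMIT = 25 * MB
PUSH_WARNING_LIMIT = 50 * MB
PUSH_BLOCK_LIMIT = 100 * MB


def github_size_status(num_bytes):
    """Classify a file size against the assumed GitHub thresholds."""
    if num_bytes > PUSH_BLOCK_LIMIT:
        return "blocked: needs Git LFS"
    if num_bytes > PUSH_WARNING_LIMIT:
        return "push allowed with a warning"
    if num_bytes > BROWSER_UPLOAD_LIMIT:
        return "push allowed, browser upload refused"
    return "ok"


def check_file(path):
    """Classify an existing file on disk by its size."""
    return github_size_status(os.path.getsize(path))


print(github_size_status(24 * MB))
print(github_size_status(30 * MB))
```

For the model in this project, check_file('credit_risk_model.pkl') would report the status of the file on disk.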

2. Deployment Speed

  • Smaller files upload faster to deployment platforms
  • Faster deployments mean quicker updates
  • Better user experience

3. Storage Efficiency

  • Saves disk space on your computer
  • Reduces bandwidth when downloading
  • More professional project structure

4. Model Functionality

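  • No Accuracy Loss: Compression is lossless, so model predictions do not change
  • Same Behavior: The loaded model is identical to the original; only loading may take slightly longer because the file is decompressed first
  • Transparent: joblib automatically decompresses when loading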

🚀 How to Use This Script

Step-by-Step Instructions:

  1. Ensure the model file exists: Make sure credit_risk_model.pkl is in the same directory as shrink_model.py
  2. Run the script: Open terminal/command prompt and run:
    python shrink_model.py
  3. Wait for completion: The script will:
    • Load the model
    • Compress and save it
    • Display the new file size
  4. Verify: Check that the file size is now under 25 MB

⚠️ Important Notes:

  • The script overwrites the original file - make a backup if needed!
  • Compression takes a few seconds depending on file size
  • The compressed model works exactly the same as the original
  • Re-running the script is safe, but it will not shrink an already compressed file any further
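Because the script overwrites the file it has just read, a more cautious variant keeps a backup and only swaps in the new file after it has been written completely. The sketch below is one way to do that; the function name, the .bak and .tmp suffixes, and the keep_backup flag are illustrative choices, not part of the original script.

```python
import os
import shutil

import joblib


def compress_in_place(path, level=5, keep_backup=True):
    """Re-save a joblib file with compression and return the new size in bytes."""
    model = joblib.load(path)
    if keep_backup:
        shutil.copy2(path, path + ".bak")  # hypothetical backup name
    tmp_path = path + ".tmp"               # hypothetical temporary name
    joblib.dump(model, tmp_path, compress=level)
    os.replace(tmp_path, path)             # swap in only after the write succeeded
    return os.path.getsize(path)
```

Calling compress_in_place('credit_risk_model.pkl') would leave credit_risk_model.pkl.bak next to the compressed file, so the original can be restored if something goes wrong.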

📊 Before vs After Compression

Metric                | Before                      | After
File Size             | ~50 MB                      | ~25 MB
Compression           | None (compress=0)           | High (compress=5)
Model Accuracy        | Baseline                    | Identical to before
GitHub Browser Upload | ❌ No (over the 25 MB limit) | ✅ Yes (if under 25 MB)
Deployment Speed      | Slower                      | Faster

🔧 Technical Details

How joblib Compression Works:

When compress is an integer, joblib uses zlib, which implements DEFLATE (the algorithm most ZIP files use), to compress the model data. The compression process:

  1. Analyzes the model data for patterns and redundancy
  2. Encodes repeated patterns efficiently
  3. Stores the compressed data in a format that joblib can decompress
  4. Preserves all model parameters, trees, and weights
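How much the file shrinks depends on how much redundancy there is to find. The sketch below illustrates this with zlib directly: a made-up repetitive byte string compresses to a small fraction of its size, while random bytes of the same length barely compress at all. The sample data is invented for the demonstration.

```python
import os
import zlib

repetitive = b"split:age<30;" * 1000        # 13,000 bytes of repeated pattern
random_like = os.urandom(len(repetitive))   # same length, no patterns to exploit

packed_repetitive = zlib.compress(repetitive, 5)
packed_random = zlib.compress(random_like, 5)

print(len(repetitive), len(packed_repetitive), len(packed_random))
```

Tree-based models tend to contain a lot of repeated structure, which is one plausible reason the model file here compresses well.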

💡 Automatic Decompression:

When you load a compressed model with joblib.load(), it automatically:

  • Detects that the file is compressed
  • Decompresses it in memory
  • Returns the full model (same as before compression)

You don't need to do anything special! Just use joblib.load() as usual - it handles everything automatically.
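A quick way to see this for yourself is to save the same object twice, once without and once with compression, and load both with the same call. The object and file names below are hypothetical stand-ins, and the files are written to a temporary directory so nothing in the project is touched.

```python
import os
import tempfile

import joblib

# Hypothetical stand-in for a trained model.
obj = {"thresholds": [i % 7 for i in range(10_000)], "label": "credit_risk"}

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "plain.pkl")
    packed_path = os.path.join(tmp, "packed.pkl")
    joblib.dump(obj, plain_path)                # no compression
    joblib.dump(obj, packed_path, compress=5)   # zlib, level 5

    # The same joblib.load() call reads both files; no extra argument is needed.
    same = joblib.load(plain_path) == joblib.load(packed_path) == obj
    smaller = os.path.getsize(packed_path) < os.path.getsize(plain_path)

print(same, smaller)
```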

🎯 Key Takeaways

  • Simple Script: Only 16 lines of code, but very powerful
  • compress=5: A good trade-off between file size and compression time for deployment
  • No Accuracy Loss: Model works exactly the same after compression
  • Automatic: joblib handles compression/decompression transparently
  • Helpful for Deployment: Keeps the file within GitHub's upload limits and speeds up Render deploys
  • Repeatable: Re-running the script is safe; the file is simply re-saved at the same size

📁 Related Files

This script is part of the model training and deployment workflow:

🔄 Complete Workflow:

  1. Train model in Jupyter Notebook → Save as credit_risk_model.pkl (~50 MB)
  2. Run shrink_model.py → Compress to ~25 MB
  3. Upload to GitHub → Deploy to Render
  4. Use in Flask app → Load with joblib.load()