shrink_model.py Explanation - Smart Credit Risk System

Overview

The shrink_model.py script is a small utility that re-saves our trained machine learning model with compression to reduce its file size. This matters for platforms like GitHub and Render, where large files are harder to upload and slower to deploy.

🎯 Purpose

Reduce the model file size from ~50 MB to ~25 MB (or less) without affecting the model's accuracy or functionality.

Complete Code

import joblib
import os

# The smart programmatic solution (file compression).
# A few simple lines shrink the file from a large size to a much smaller one
# (like a Zip file), but in a format Python understands.

# 1. Load the current large model
print("⏳ جاري تحميل الموديل الضخم...")  # "Loading the large model..."
model = joblib.load('credit_risk_model.pkl')

# 2. Save the model with high compression (level 5)
print("📦 جاري ضغط وحفظ الموديل...")  # "Compressing and saving the model..."
joblib.dump(model, 'credit_risk_model.pkl', compress=5)

# 3. Check the new size
size = os.path.getsize('credit_risk_model.pkl') / (1024 * 1024)
print(f"✅ تم ضغط الموديل بنجاح! الحجم الجديد: {size:.2f} MB")  # "Model compressed successfully! New size: ..."
            

1️⃣ Library Imports

import joblib
import os
            

What Each Library Does:

📦 joblib

A Python library for saving and loading Python objects efficiently. We use it to:

joblib.load(): Load the existing model file
joblib.dump(): Save the model with compression

💻 os

Operating system interface. We use it to:

os.path.getsize(): Get the file size in bytes
Convert bytes to megabytes for human-readable output

2️⃣ Loading the Original Model

print("⏳ جاري تحميل الموديل الضخم...")
model = joblib.load('credit_risk_model.pkl')
            

What This Does:

Print Message: Shows a user-friendly message in Arabic: "Loading the large model..."
Load Model: Reads the trained model from the credit_risk_model.pkl file
The model is loaded into memory so we can re-save it with compression

💡 Why Load First?

joblib.dump() works on a Python object, not on a file, so the model must be loaded into memory before it can be re-saved with compression. Think of it like:

Without compression: Like saving a document without ZIP compression
With compression: Like saving a document as a ZIP file - same content, smaller size

3️⃣ Compressing and Saving the Model ⭐

print("📦 جاري ضغط وحفظ الموديل...")
joblib.dump(model, 'credit_risk_model.pkl', compress=5)
            

🔑 The Key: compress=5 Parameter

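The compress parameter tells joblib how much to compress the file. Integer levels range from 0 (no compression) to 9 (smallest file, slowest to write). The sizes below are approximate figures for this project's model:

Compress Level	Description	Approximate File Size
`compress=0`	No compression	~50 MB
`compress=3`	Medium compression	~30-35 MB
`compress=5`	High compression (Recommended)	~20-25 MB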
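
The effect of the level can be measured directly. The sketch below is only an illustration: it dumps a hypothetical stand-in object (a dictionary of repeated floats, not the real credit risk model) at several levels and prints the resulting sizes. The helper name dumped_size_bytes is an assumption made for this example, and the numbers it prints will differ from the figures in the table.

```python
import os
import tempfile

import joblib

# Hypothetical stand-in for a trained model: repetitive data compresses well.
stand_in_model = {"weights": [0.125, 0.25, 0.5, 1.0] * 50_000}


def dumped_size_bytes(obj, level):
    """Dump obj with the given joblib compress level and return the file size in bytes."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "stand_in.pkl")
        joblib.dump(obj, path, compress=level)
        return os.path.getsize(path)


# Compare a few levels side by side.
sizes = {level: dumped_size_bytes(stand_in_model, level) for level in (0, 3, 5, 9)}
for level, size in sizes.items():
    print(f"compress={level}: {size / (1024 * 1024):.3f} MB")
```

Running the same comparison on the real model file is the most reliable way to choose a level, since the achievable ratio depends on the model's contents.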

How Compression Works:

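With an integer compress level, joblib applies zlib compression (the same DEFLATE method used in ZIP files) to the serialized model. This lets it:

Remove Redundancy: Encodes repeated byte patterns once instead of storing every copy
Efficient Storage: Writes the result as a compressed stream that joblib can read back
Preserve Functionality: Compression is lossless, so all model "knowledge" remains intact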

✅ Important: Compression does NOT affect model accuracy - it only reduces file size!
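
A quick way to check this claim is a round trip: save with compression, load again, and compare. The sketch below uses a hypothetical stand-in dictionary rather than the real model. For an actual scikit-learn style model you would instead compare predictions from the original and the reloaded model on the same sample of rows.

```python
import os
import tempfile

import joblib

# Hypothetical stand-in for a trained model's parameters.
original = {
    "thresholds": [0.1 * i for i in range(1000)],
    "classes": ["low", "high"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stand_in.pkl")
    joblib.dump(original, path, compress=5)  # save with compression
    restored = joblib.load(path)             # decompression happens inside load()

# Lossless compression means the restored object equals the original.
print(restored == original)
```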

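💡 Why compress=5?

Balance: Good compression ratio without being too slow
GitHub Limits: Browser uploads are capped at 25 MB per file; files pushed with Git trigger a warning above 50 MB and are blocked above 100 MB unless you use Git LFS (Large File Storage)
Render Limits: Smaller files deploy faster
Storage: Saves disk space and bandwidth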

            

4️⃣ Verifying the New File Size

# 3. Check the new size
size = os.path.getsize('credit_risk_model.pkl') / (1024 * 1024)
print(f"✅ تم ضغط الموديل بنجاح! الحجم الجديد: {size:.2f} MB")
            

Breaking Down the Calculation:

os.path.getsize('credit_risk_model.pkl'): Gets the file size in bytes
/ (1024 * 1024): Converts bytes to megabytes (MB)
- 1 KB = 1024 bytes
- 1 MB = 1024 KB = 1,048,576 bytes
f"{size:.2f} MB": Formats the number to 2 decimal places (e.g., "24.56 MB")
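
The conversion above can be wrapped in a small helper. This is a minimal sketch: the function name file_size_mb and the temporary 3 MB demo file are assumptions made for illustration and are not part of shrink_model.py.

```python
import os
import tempfile

BYTES_PER_MB = 1024 * 1024  # 1 MB = 1024 KB = 1,048,576 bytes


def file_size_mb(path):
    """Return the size of the file at path in megabytes."""
    return os.path.getsize(path) / BYTES_PER_MB


# Demonstrate on a temporary 3 MB file so the example runs anywhere.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (3 * BYTES_PER_MB))
    demo_path = tmp.name

demo_size = file_size_mb(demo_path)
print(f"{demo_size:.2f} MB")  # prints "3.00 MB"
os.remove(demo_path)
```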

📊 Expected Output:
⏳ جاري تحميل الموديل الضخم...
📦 جاري ضغط وحفظ الموديل...
✅ تم ضغط الموديل بنجاح! الحجم الجديد: 24.56 MB
                

🎯 Why Compression Matters

1. GitHub File Size Limits

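GitHub's browser upload is limited to 25 MB per file
Files pushed with Git trigger a warning above 50 MB and are blocked above 100 MB; anything larger needs Git LFS (Large File Storage)
Git LFS requires additional setup and has its own storage and bandwidth quotas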
Compression helps avoid this complexity

2. Deployment Speed

Smaller files upload faster to deployment platforms
Faster deployments mean quicker updates
Better user experience

3. Storage Efficiency

Saves disk space on your computer
Reduces bandwidth when downloading
Keeps the repository small and quick to clone

4. Model Functionality

No Accuracy Loss: Compression doesn't change model predictions
Same Behavior: Model works exactly the same once loaded (loading may take slightly longer because the file is decompressed first)
Transparent: joblib automatically decompresses when loading

🚀 How to Use This Script

Step-by-Step Instructions:

Ensure the model file exists: Make sure credit_risk_model.pkl is in the same directory as shrink_model.py
Run the script: Open terminal/command prompt and run:

python shrink_model.py

Wait for completion: The script will:
- Load the model
- Compress and save it
- Display the new file size
Verify: Check that the file size is now under 25 MB

⚠️ Important Notes:

The script overwrites the original file - make a backup if needed!
Compression takes a few seconds depending on file size
The compressed model works exactly the same as the original
Re-running the script is safe: the model is simply re-saved at the same compression level, so the file does not shrink any further
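
Because the script overwrites its input, a more defensive variant could keep a backup and only swap in the new file once it has been written completely. The sketch below is one possible approach, not the project's actual script. The function name shrink_in_place and the ".bak" and ".tmp" suffixes are assumptions made for illustration.

```python
import os
import shutil
import tempfile

import joblib


def shrink_in_place(path, level=5):
    """Re-save the joblib file at path with compression, keeping a backup copy.

    Returns (size_before, size_after) in bytes.
    """
    size_before = os.path.getsize(path)
    backup_path = path + ".bak"      # hypothetical backup naming
    shutil.copy2(path, backup_path)  # keep the original in case anything fails

    model = joblib.load(path)
    tmp_path = path + ".tmp"
    joblib.dump(model, tmp_path, compress=level)
    os.replace(tmp_path, path)       # swap in the new file only after a full write
    return size_before, os.path.getsize(path)


# Demo on a throwaway file; in the project this would be 'credit_risk_model.pkl'.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_model.pkl")
    joblib.dump({"weights": [1.0, 2.0] * 100_000}, demo_path)  # uncompressed
    before, after = shrink_in_place(demo_path)
    backup_exists = os.path.exists(demo_path + ".bak")
    print(f"before: {before} bytes, after: {after} bytes, backup kept: {backup_exists}")
```

Writing to a temporary file first means an interrupted run leaves the original model untouched.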

📊 Before vs After Compression

Metric	Before	After
File Size	~50 MB	~25 MB
Compression	None (compress=0)	High (compress=5)
Model Accuracy	Baseline	Unchanged
GitHub Web Upload (25 MB limit)	❌ No (too large)	✅ Yes (if under 25 MB)
Deployment Speed	Slow	Fast

🔧 Technical Details

How joblib Compression Works:

With an integer compress level, joblib uses zlib by default (an implementation of DEFLATE, the method used in ZIP files) to compress the model data. The compression process:

Analyzes the model data for patterns and redundancy
Encodes repeated patterns efficiently
Stores the compressed data in a format that joblib can decompress
Preserves all model parameters, trees, and weights
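
The same idea can be seen with Python's standard zlib module. This sketch calls zlib directly on pickled stand-in data to illustrate the principle. It is not joblib's internal code path, and the data is hypothetical.

```python
import pickle
import zlib

# Hypothetical stand-in for model data: serialized bytes with lots of repetition.
raw = pickle.dumps([0.5, 0.25, 0.125] * 100_000)

compressed = zlib.compress(raw, 5)  # second argument is the compression level
restored = zlib.decompress(compressed)

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
print(restored == raw)  # the round trip is byte-for-byte identical
```

Highly repetitive data like this shrinks dramatically. A real model usually compresses less, because its parameters contain fewer repeated patterns.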

💡 Automatic Decompression:

When you load a compressed model with joblib.load(), it automatically:

Detects that the file is compressed
Decompresses it in memory
Returns the full model (same as before compression)

You don't need to do anything special! Just use joblib.load() as usual - it handles everything automatically.
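
As a small illustration, the sketch below saves the same hypothetical stand-in object once without compression and once with compress=5, then loads both files with the identical joblib.load() call. The file names are assumptions made for this example.

```python
import os
import tempfile

import joblib

# Hypothetical stand-in for a trained model.
stand_in_model = {"feature_names": ["age", "income"], "weights": [0.3, 0.7]}

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "plain.pkl")
    packed_path = os.path.join(tmp_dir, "packed.pkl")
    joblib.dump(stand_in_model, plain_path)               # no compression
    joblib.dump(stand_in_model, packed_path, compress=5)  # zlib, level 5

    # The loading code is identical in both cases.
    from_plain = joblib.load(plain_path)
    from_packed = joblib.load(packed_path)

print(from_plain == from_packed == stand_in_model)
```

This is why app.py does not need to change when the model file is replaced by its compressed version.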

🎯 Key Takeaways

Simple Script: Fewer than 20 lines of code, but very effective
compress=5: A reasonable trade-off between file size and speed for this deployment
No Accuracy Loss: Model works exactly the same after compression
Automatic: joblib handles compression/decompression transparently
Helpful for Deployment: Keeps the file within GitHub's 25 MB browser upload limit and speeds up Render deploys
Safe to Re-run: You can run the script multiple times without harming the model

📁 Related Files

This script is part of the model training and deployment workflow:

Credit_Risk_Dataset.ipynb: Trains the model and saves it initially
shrink_model.py: Compresses the model (this file)
app.py: Loads and uses the compressed model

🔄 Complete Workflow:

Train model in Jupyter Notebook → Save as credit_risk_model.pkl (~50 MB)
Run shrink_model.py → Compress to ~25 MB
Upload to GitHub → Deploy to Render
Use in Flask app → Load with joblib.load()

            
