Tools Used

Building a Machine Learning web application requires various tools and technologies. Here's what we used for this project:

Programming & Development

  • Python: The main programming language. Python is well suited to Machine Learning because it has a mature ecosystem of libraries and is easy to read and write.
  • VS Code: Visual Studio Code is our code editor. It helps us write, debug, and organize our code with features like syntax highlighting and extensions.

Web Framework & Server

  • Flask: A lightweight Python web framework. Flask makes it easy to create web applications and APIs. It's simple to learn but powerful enough for production use.
  • Gunicorn: A production-grade WSGI HTTP server for Python applications. Flask's built-in server is intended for development only, so we use Gunicorn to serve the app when deploying to production platforms like Render.
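
To make the Flask/Gunicorn split concrete, here is a minimal sketch. The /health route and the file name app.py are assumptions for illustration, not part of the project described here.

```python
from flask import Flask, jsonify

# Gunicorn looks for this module-level "app" object.
app = Flask(__name__)


@app.route("/health")
def health():
    # Hypothetical route, used only to show that the app responds.
    return jsonify(status="ok")


# Local development (Flask's built-in server), from a terminal:
#   flask --app app run
# Production (assuming this file is saved as app.py):
#   gunicorn app:app
```

The same app object is used in both cases; only the server that runs it changes.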

Machine Learning Libraries

  • Scikit-learn: One of the most popular Machine Learning libraries in Python. It provides our Random Forest Classifier, data preprocessing tools (imputers, scalers, encoders), and pipeline functionality.
  • Pandas: A powerful library for data manipulation and analysis. We use it to organize user input into DataFrames (like Excel spreadsheets) that our model can understand.
  • Joblib: A library for saving and loading Python objects efficiently. We use it to save our trained model and load it in the Flask app without retraining.
  • NumPy: A fundamental library for numerical computing. It's a dependency of scikit-learn and pandas and handles the array math behind the scenes.
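
As a rough sketch of how these libraries fit together, the example below trains a small Random Forest on made-up data, saves it with Joblib, and loads it back. The column names, the values, and the 1 = defaulted encoding are assumptions for illustration, not taken from the real dataset.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny made-up training set; column names are illustrative only.
X = pd.DataFrame({
    "person_income": [30000, 85000, 42000, 120000, 25000, 64000],
    "loan_amnt": [12000, 5000, 15000, 8000, 10000, 6000],
})
y = [1, 0, 1, 0, 1, 0]  # assumed encoding: 1 = defaulted, 0 = repaid

model = RandomForestClassifier(n_estimators=20, random_state=0)
model.fit(X, y)

# Save once after training...
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, model_path)

# ...then load it later (for example in the Flask app) without retraining.
loaded = joblib.load(model_path)
same_predictions = bool((loaded.predict(X) == model.predict(X)).all())
```

The loaded object behaves like the original model, which is what lets the web app skip training entirely.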

Deployment & Version Control

  • GitHub: A platform for hosting Git repositories and sharing code. We use it to keep our project under version control, track changes, and collaborate. It also integrates with deployment platforms.
  • Render: A cloud platform for deploying web applications. Render automatically builds and deploys our Flask app from GitHub, making it accessible to users worldwide via a URL.

Technology Stack Summary

Python, Flask, Scikit-learn, Pandas, Joblib, Gunicorn, VS Code, GitHub, Render

Dataset

Our Smart Credit Risk System was trained on the Credit Risk Dataset from Kaggle. This dataset contains real-world loan application information and outcomes, making it perfect for training a credit risk prediction model.

Dataset Source

Platform: Kaggle
Dataset Name: Credit Risk Dataset
Type: Structured tabular data (CSV format)

Main Features in the Dataset

The dataset includes various features (columns) that describe loan applicants and their loan details:

Target Variable

The dataset also includes the loan_status column, which tells us whether each loan applicant actually defaulted (didn't pay back) or paid back successfully. This is what we're trying to predict for new applicants!
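
In code, separating the target from the features might look like the sketch below. Only the loan_status column name comes from the description above; the other columns and all values are made up, and an in-memory CSV stands in for the downloaded file.

```python
import io

import pandas as pd

# Stand-in for the Kaggle CSV; values and feature columns are illustrative.
csv_text = """person_income,loan_amnt,loan_status
30000,12000,1
85000,5000,0
42000,15000,1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Features: everything except the target column.
X = df.drop(columns=["loan_status"])
# Target: the outcome we want the model to predict.
y = df["loan_status"]
```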

Why This Dataset?

This dataset is ideal for our project because:

Data Preprocessing

Before training, the data goes through preprocessing steps:

  • Handling Missing Values: Some applications have missing information, which we fill in (impute)
  • Scaling: Numerical features are rescaled so they share a common scale
  • Encoding: Categorical features (with values like "Home Improvement") are converted to numbers
  • Feature Engineering: We calculate derived features like interest_burden
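
The steps above can be sketched with scikit-learn's preprocessing tools. This is one possible arrangement, not necessarily the project's exact pipeline: the column names, the imputation strategies, and in particular the interest_burden formula (yearly interest divided by income) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up applications with some missing values.
df = pd.DataFrame({
    "person_income": [30000, 85000, np.nan, 120000],
    "loan_amnt": [12000, 5000, 15000, 8000],
    "loan_int_rate": [11.5, 7.2, 13.0, np.nan],
    "loan_intent": ["Home Improvement", "Education", np.nan, "Education"],
})

# Feature engineering: ASSUMED definition of interest_burden.
df["interest_burden"] = (
    df["loan_amnt"] * df["loan_int_rate"] / 100 / df["person_income"]
)

numeric_cols = ["person_income", "loan_amnt", "loan_int_rate", "interest_burden"]
categorical_cols = ["loan_intent"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: fill gaps with the most common value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

features = preprocessor.fit_transform(df)
```

Bundling the steps in a ColumnTransformer means the same transformations learned during training can be reapplied to new applicants at prediction time.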

Project Architecture

Here's how all these tools work together:

  1. Data Collection: Download the Credit Risk Dataset from Kaggle
  2. Model Training: Use Python, Scikit-learn, and Pandas to train a Random Forest model
  3. Model Saving: Use Joblib to save the trained model
  4. Web Development: Use Flask to create the web application
  5. Frontend: Create HTML/CSS interface for user interaction
  6. Version Control: Store code on GitHub
  7. Deployment: Deploy to Render using Gunicorn
  8. Result: Live web application accessible to users!
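
To show how steps 3 and 4 connect, here is a hedged sketch of a Flask app that loads a saved model and serves predictions. The create_app helper, the /predict route, the JSON field names, and the tiny stand-in model are all hypothetical; the real project may be organized differently.

```python
import os
import tempfile

import joblib
import pandas as pd
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestClassifier


def create_app(model_path):
    """Build a Flask app that serves predictions from a saved model."""
    app = Flask(__name__)
    # Load the model once at startup, not on every request.
    model = joblib.load(model_path)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json()
        # One applicant becomes a one-row DataFrame for the model.
        row = pd.DataFrame([payload])
        prediction = int(model.predict(row)[0])
        return jsonify(loan_status=prediction)

    return app


# Demo: train and save a stand-in model on made-up data, then query the app.
X = pd.DataFrame({
    "person_income": [30000, 85000, 42000, 120000],
    "loan_amnt": [12000, 5000, 15000, 8000],
})
stand_in = RandomForestClassifier(n_estimators=10, random_state=0)
stand_in.fit(X, [1, 0, 1, 0])
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(stand_in, model_path)

# Flask's test client sends a request without starting a server.
client = create_app(model_path).test_client()
resp = client.post("/predict", json={"person_income": 50000, "loan_amnt": 9000})
```

In this sketch the JSON keys must match the column names the model was trained on, which is why the user input is organized into a DataFrame before prediction.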