MLOps Zoomcamp (Week 4)

Model Deployment

Introduction

After weeks or even months of experimentation, we finally have a model that performs well. So now what? What do we do with this model? We could publish a paper about our work, and we might even do that if our objective were to discover new state-of-the-art models. But that's not the case here. We need to create value for businesses or people through this model. How do we do that? We make the model available to the users who are going to benefit from it. This is the main point of the whole project: creating value.

We need to deploy the model, either as a service or by providing its output for decision making, so that the targeted end users can benefit from it. The end users can be business analysts or executives who use the model's predictions to make better business decisions. They can also be ordinary people; for instance, people might interact with an app that gets its predictions from the model. So there are many scenarios and use cases for our models.

The process of taking a trained model and making its predictions available to users and other systems is called deployment. We have reached the step in the machine learning lifecycle where we need to deploy the model for end users. But before we dive into deployment, we need to ask whether the end users need the predictions right away or whether they can wait for a certain period of time. Let's understand this through an example. Business analysts and executives may only need predictions weekly or monthly, so that they can make decisions for the next week or month. In this case, the model runs at a certain interval, say weekly or monthly, and the predictions are made available accordingly. This is referred to as batch deployment. People interacting with applications, on the other hand, need predictions right away. This is referred to as online deployment: the model service is listening for data or events all the time and provides predictions continuously. Online deployment is further divided into two parts:

  1. Web Service
  2. Streaming

In a web service, the data from the application is sent to the backend and from there to the model, and the predictions are returned. Here the application acts as a producer, producing the data, and the model acts as a consumer, consuming the data. There is a one-to-one connection between client and server when deploying as a web service. In streaming, the data from the app is sent to the backend, and this backend acts as a producer that passes the data to multiple consumers (different models for different purposes), which return their predictions. A rough sketch of what the consumer side of this pattern can look like is shown below.
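
The exact tools differ in practice; as a minimal, illustrative sketch of the consumer side of this streaming pattern, the snippet below uses the kafka-python client. The topic name, broker address, and score() function are hypothetical placeholders, not something prescribed by this week's material.

import json

from kafka import KafkaConsumer  # assumes the kafka-python package and a running broker

def score(event):
    # placeholder: featurize the event and apply the trained model
    return 0.0

# the backend acts as the producer, publishing events to a topic; each model
# service subscribes as a consumer and produces its own predictions
consumer = KafkaConsumer(
    "ride-events",                       # hypothetical topic name
    bootstrap_servers="localhost:9092",  # hypothetical broker address
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    prediction = score(message.value)
    print(prediction)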

Web Service

A web service is either

  • a service offered by an electronic device to another electronic device, communicating with each other via the Internet, or
  • a server running on a computer device, listening for requests at a particular port over a network, serving web documents (HTML, JSON, images).

Source: Wikipedia

With that definition from Wikipedia, the concept of deploying a model as a web service becomes quite clear. The trained model runs on a server; we send the data in an HTTP request and get the predictions back in the HTTP response. The data is shared in JSON format, as it is the most common format for exchanging data over the web.

This week, we learned how to deploy a trained model as a web service with Flask. Flask is a Python framework for web development. Flask is easier to use than Django, though it lacks some of the functionality Django provides. For our purposes we won't need that extra functionality, so we will stick with Flask for our model deployment. We already have a script for getting predictions; we will modify it a bit so that we can deploy the model by running the script. Have a look at the code below.

     from flask import Flask, request, jsonify

     def get_features(data):
         # placeholder: build the model features from the raw request data
         features = data
         return features

     def predict(features):
         # placeholder: apply the trained model to the features
         prediction = 10.0
         return prediction

     app = Flask("duration-prediction")

     @app.route("/predict", methods=["POST"])
     def predict_endpoint():
         data = request.get_json()
         features = get_features(data)
         prediction = predict(features=features)
         result = {"prediction": float(prediction)}
         return jsonify(result)

     if __name__ == "__main__":
         app.run(debug=True, host="0.0.0.0", port=9696)

By decorating the function with @app.route, we turn it into an HTTP endpoint: we can send an HTTP request with data to this endpoint and get predictions back. The data is sent in JSON format. Once this script is running, we can send an HTTP request to the endpoint as shown below.

import requests

data = {"data": 20}

url = "http://127.0.0.1:9696/predict"

response = requests.post(url, json=data)
print(response.json())

Now if we run the above script, it sends a POST request to the endpoint at http://127.0.0.1:9696/predict. In return, the prediction is received as a response object. This response object contains the prediction value in JSON format, which can be accessed using the response.json() method.

Web Service with MLflow

We have already discussed MLflow in week 2, so here we will limit ourselves to the bigger picture of getting models from MLflow for deployment. There are many models available on the MLflow tracking server, and they can be accessed directly from code. This makes our deployment faster, as we can fetch the latest available model from the MLflow server and deploy it directly. We can load the model from the server with the following code.

import mlflow

RUN_ID = "9876r98r994r69rhyhfr93287r68"
MLFLOW_TRACKING_SERVER = "http://127.0.0.1:5000"

# point the client at the tracking server so the run can be resolved
mlflow.set_tracking_uri(MLFLOW_TRACKING_SERVER)

# load the model logged under this run as a generic pyfunc model
logged_model = f"runs:/{RUN_ID}/model"
model = mlflow.pyfunc.load_model(logged_model)

There is a problem with loading the model from the MLflow server: what happens if the server is down? We cannot connect to it, and the models cannot be accessed. For such scenarios, we can load the model from the artifacts store instead of the MLflow server. Most of the time this is remote storage (such as an S3 bucket), but it can be local as well for learning purposes. The following code shows how the model can be accessed from a local artifacts directory. For this, we need to start the MLflow server with the local directory path passed to --default-artifact-root.


import mlflow

RUN_ID = "9876r98r994r69rhyhfr93287r68"

# path layout: <artifact root>/<experiment id>/<run id>/artifacts/model
logged_model = f"./artifacts-default/2/{RUN_ID}/artifacts/model"
model = mlflow.pyfunc.load_model(logged_model)

In this case, we don't need the MLflow server at all. The model is in a local directory that we set up when configuring the MLflow server at start-up. From the code snippet above, it's clear that artifacts-default is the directory containing the artifacts from the MLflow server.
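
Putting the pieces together, here is a rough sketch of serving an MLflow model behind the same kind of Flask endpoint as before. The run ID and artifacts path are placeholders, and the exact input format depends on how the model was logged; a pipeline that includes a DictVectorizer, for example, can usually take a feature dictionary directly.

import mlflow
from flask import Flask, request, jsonify

RUN_ID = "9876r98r994r69rhyhfr93287r68"  # placeholder run id
logged_model = f"./artifacts-default/2/{RUN_ID}/artifacts/model"

# load the model once at startup, not on every request
model = mlflow.pyfunc.load_model(logged_model)

app = Flask("duration-prediction")

@app.route("/predict", methods=["POST"])
def predict_endpoint():
    data = request.get_json()
    # input format depends on how the model was logged (dict, DataFrame, ...)
    prediction = model.predict(data)
    result = {"prediction": float(prediction[0]), "model_run_id": RUN_ID}
    return jsonify(result)

if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=9696)

Returning the run ID alongside the prediction makes it easy to tell later which model version produced a given prediction.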

Batch Deployment

This is also referred to as offline deployment, because the predictions are made available only at certain times and are stored as files or in some kind of data lake or data store. There is no need for the model to be online, waiting and listening for incoming data and events. The deployment is scheduled, and the model outputs predictions at fixed time intervals. This mode of deployment is found frequently in industry. Here, the data comes in at a fixed interval, say weekly or monthly, and when the new data is available the model takes it as input and outputs the predictions.

def get_preds(data, output_file):
    features = get_features(data)               # build features from the new batch of data
    preds = get_predictions(features=features)  # score the batch with the trained model
    preds.to_parquet(output_file, index=False)  # store the predictions as a Parquet file

In the code snippet above, we store the predictions in a Parquet file.
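
For a more complete picture, here is a rough end-to-end sketch of such a batch job. The file paths and run ID are placeholders, and it assumes the model was logged as a pipeline that can score the records of the dataframe directly; the details will differ depending on the feature engineering involved.

import mlflow
import pandas as pd

RUN_ID = "9876r98r994r69rhyhfr93287r68"       # placeholder run id
INPUT_FILE = "./data/new_data.parquet"        # hypothetical input path
OUTPUT_FILE = "./output/predictions.parquet"  # hypothetical output path

def apply_model(input_file, run_id, output_file):
    # read the batch of new data that arrived for this interval
    df = pd.read_parquet(input_file)

    # load the trained model from the local MLflow artifacts for this run
    model = mlflow.pyfunc.load_model(f"./artifacts-default/2/{run_id}/artifacts/model")

    # score the whole batch at once and attach the predictions
    df["prediction"] = model.predict(df.to_dict(orient="records"))

    # store the predictions so downstream users and jobs can pick them up
    df.to_parquet(output_file, index=False)

if __name__ == "__main__":
    apply_model(INPUT_FILE, RUN_ID, OUTPUT_FILE)

Such a script is typically triggered on a schedule, for example by cron or a workflow orchestrator, rather than being kept running all the time.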

That's all for week 4. We learned how to deploy models and how to integrate MLflow so that we can deploy models easily and frequently. If you are interested in MLOps and want to explore the production and operation of machine learning models, and take your Jupyter notebooks into production, then check out the mlops-zoomcamp course @datatalksclub. Happy learning!