Custom Image Creation
This documentation outlines the setup of a custom Docker image creation service. The image must be built for the linux/amd64 architecture. The service is exposed on port 8080 and provides endpoints for serving various types of models, including classical machine learning models, large language models (LLMs) for text generation, and others.
Service Overview
The custom image creation service is configured to listen on port 8080 and provides endpoints for handling different types of model requests. The service supports a wide range of models to cater to diverse machine learning tasks.
Endpoints
Health Check
Model Health Check Endpoint
- Endpoint: /health-check
- Method: GET
- Description: Checks the health and readiness of all models to ensure they are loaded and functioning correctly.
- Response: JSON indicating the health status of each model, with status code 200 if the model is healthy and 504 if it is not. See the example call below.
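As a quick sanity check, you can call the health-check endpoint directly. This is a minimal sketch assuming the service is running locally on port 8080; adjust the URL to match your deployment:

import requests

# Assumed local address; replace with your service URL
response = requests.get("http://localhost:8080/health-check")
print(response.status_code, response.json())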
Inference Endpoint
All endpoints exposed within the Docker image are accessible as inference endpoints from the inference gateway.
Note: The custom endpoint does not have a dedicated UI tab.
Example of Custom Endpoint Call
Below is an example of a Python script to perform inference using the custom endpoint:
import requests

# Replace <model_version_id> with your model's version ID
base_url = "https://inference.develop.openinnovation.ai/models/<model_version_id>/proxy"
api_key = "<api_key>"  # replace with your API key

inference_endpoint = f"{base_url}/custom-endpoint"
payload = {}  # request body expected by your custom endpoint
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}

response = requests.post(inference_endpoint, json=payload, headers=headers)
print(response.status_code, response.json())
Example Implementation
Below is a simple example that serves a classical machine learning model: a FastAPI application wrapping a linear regression model trained on the scikit-learn diabetes dataset:
app.py
from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
import numpy as np

app = FastAPI()

# Load the diabetes dataset
db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    db.data, db.target, test_size=0.2, random_state=42
)

# Create and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

class InputData(BaseModel):
    input: list

@app.post("/v1/models/model:predict")
async def predict(data: InputData):
    # Reshape the flat input into rows matching the number of training features
    input_data = np.array(data.input).reshape(-1, X_train.shape[1])
    prediction = model.predict(input_data)
    return {"prediction": prediction.tolist()}

@app.get("/health-check")
async def health_check():
    return {"status": "healthy", "message": "Model is ready and healthy"}
Docker Image
To run the FastAPI application in a Docker container, use the following Dockerfile:
Dockerfile
Note: Docker images cannot run as the root user, so the app must run as a non-root user (runner) with UID 10000.
# Use an official Python runtime as a parent image
FROM python:3.9-slim

# Create a non-root user with UID 10000
RUN useradd --uid 10000 runner
RUN mkdir /app && chown runner /app

# Set the working directory
WORKDIR /app

# Install dependencies as root so console scripts such as uvicorn land on the system PATH
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt

# Copy the current directory contents into the container at /app
COPY . /app

# Switch to the non-root user before running the application
USER runner

# Expose port 8080 for the application
EXPOSE 8080

# Run the FastAPI application with Uvicorn
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
Requirements
To ensure that the Python environment has all necessary dependencies, the requirements.txt file should include:
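fastapi
uvicorn
scikit-learn
numpy
pydantic

This list follows from the imports in app.py; pydantic is installed automatically with FastAPI but is listed explicitly since app.py imports it directly. Pin versions as appropriate for your environment.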