Custom Image Creation
This documentation outlines the setup of a custom Docker image creation service. The image must be built for the linux/amd64 architecture. The service is exposed on port 8080 and provides endpoints for serving various types of models, including classical machine learning models, large language models (LLMs) for text generation, and others.
Service Overview
The custom image creation service is configured to listen on port 8080 and provides endpoints for handling different types of model requests. The service supports a wide range of models to cater to diverse machine learning tasks.
Endpoints
Health Check
Model Health Check Endpoint
- Endpoint: /health-check
- Method: GET
- Description: Checks the health and readiness of all models to ensure they are loaded and functioning correctly.
- Response: JSON indicating the health status of each model, with status code 200 if the model is healthy and 504 if it is not. See the example call below.
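As a quick sanity check, you can call the health-check endpoint directly. This is a minimal sketch assuming the service is running locally on port 8080; adjust the URL to match your deployment:

import requests

# Assumed local address; replace with your service URL
response = requests.get("http://localhost:8080/health-check")
print(response.status_code, response.json())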
Inference Endpoint
All endpoints exposed within the Docker image are accessible as inference endpoints from the inference gateway.
Note: The custom endpoint does not have a dedicated UI tab.
Example of Custom Endpoint Call
Below is an example of a Python script to perform inference using the custom endpoint:
import requests

# Replace <model_version_id> with your model's version ID
base_url = "https://inference.develop.openinnovation.ai/models/<model_version_id>/proxy"
api_key = "<api_key>"  # replace with your API key

inference_endpoint = f"{base_url}/custom-endpoint"
payload = {}  # request body expected by your custom endpoint
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}

response = requests.post(inference_endpoint, json=payload, headers=headers)
print(response.status_code, response.json())
Example Implementation
Below is a simple example that serves a classical machine learning model: a FastAPI application wrapping a linear regression model trained on the scikit-learn diabetes dataset:
app.py
from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
import numpy as np

app = FastAPI()

# Load the diabetes dataset
db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    db.data, db.target, test_size=0.2, random_state=42
)

# Create and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

class InputData(BaseModel):
    input: list

@app.post("/v1/models/model:predict")
async def predict(data: InputData):
    # Reshape the flat input into rows matching the number of training features
    input_data = np.array(data.input).reshape(-1, X_train.shape[1])
    prediction = model.predict(input_data)
    return {"prediction": prediction.tolist()}

@app.get("/health-check")
async def health_check():
    return {"status": "healthy", "message": "Model is ready and healthy"}
Docker Image
To run the FastAPI application in a Docker container, use the following Dockerfile:
Dockerfile
Note: Docker images cannot run as the root user, so the app must run as a non-root user (runner) with UID 10000.
# Use an official Python runtime as a parent image
FROM python:3.9-slim

# Create a non-root user with UID 10000
RUN useradd --uid 10000 runner
RUN mkdir /app && chown runner /app

# Set the working directory
WORKDIR /app

# Install dependencies as root so console scripts such as uvicorn land on the system PATH
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt

# Copy the current directory contents into the container at /app
COPY . /app

# Switch to the non-root user before running the application
USER runner

# Expose port 8080 for the application
EXPOSE 8080

# Run the FastAPI application with Uvicorn
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
Requirements
To ensure that the Python environment has all necessary dependencies, the requirements.txt file should include:
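fastapi
uvicorn
scikit-learn
numpy
pydantic

This list follows from the imports in app.py; pydantic is installed automatically with FastAPI but is listed explicitly since app.py imports it directly. Pin versions as appropriate for your environment.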