Integrating Hugging Face Model Endpoints with Kubernetes for Real-Time Inference
- Use Kubernetes to orchestrate self-hosted Hugging Face model inference endpoints for real-time data processing and analytics applications.
- Achieve high availability and redundancy for NLP services by deploying multiple instances across a Kubernetes cluster.
Environment Configuration
- Set up a Kubernetes cluster using providers like GKE, EKS, or AKS for robust infrastructure management.
- Install `kubectl`, Helm, and the Hugging Face CLI to interact with Kubernetes and Hugging Face's services.
Creating and Exporting Models
- Use the Hugging Face `transformers` library to train the model and prepare it for deployment.
from transformers import BertTokenizer, BertForSequenceClassification

# Load the tokenizer and a pretrained checkpoint; fine-tune on task data before deployment as needed.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
- Ensure the model and its inference script are containerized with Docker for seamless deployment in a Kubernetes environment.
FROM python:3.8
WORKDIR /app
COPY . /app
# transformers needs a deep learning backend such as PyTorch to run inference
RUN pip install --no-cache-dir transformers torch
EXPOSE 8080
CMD ["python", "inference.py"]
Deploying Hugging Face Models on Kubernetes
- Use Kubernetes Deployments to manage load balancing and rolling updates for the Hugging Face model endpoint.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hf-model
spec:
  replicas: 5
  selector:
    matchLabels:
      app: hf-model
  template:
    metadata:
      labels:
        app: hf-model
    spec:
      containers:
        - name: hf-model
          image: your-docker-repo/hf-model:latest
          ports:
            - containerPort: 8080
- Configure a Kubernetes Service to make the Hugging Face model endpoint accessible to external applications.
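A minimal sketch of such a Service is shown below; the `LoadBalancer` type and external port are assumptions and should be adapted to your cluster's networking setup, while `targetPort` must match the `containerPort` declared in the Deployment above.
apiVersion: v1
kind: Service
metadata:
  name: hf-model
spec:
  type: LoadBalancer    # assumption: expose via a cloud load balancer; use ClusterIP plus an Ingress if preferred
  selector:
    app: hf-model       # matches the pod labels in the Deployment above
  ports:
    - port: 80          # port exposed by the Service
      targetPort: 8080  # containerPort of the model-serving container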
Ensuring Scalability and Resilience
- Implement a Horizontal Pod Autoscaler (HPA) to dynamically scale the model-serving pods based on CPU and memory utilization; a minimal manifest is sketched after this list.
- Deploy Prometheus and Grafana for continuous monitoring of system performance, resource utilization, and application metrics.
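The following is a minimal `autoscaling/v2` HPA sketch targeting the Deployment above; the replica bounds and utilization targets are illustrative assumptions, and utilization-based scaling only works if the container declares resource requests in the Deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hf-model
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hf-model
  minReplicas: 2        # illustrative lower bound for availability
  maxReplicas: 10       # illustrative upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70% of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # scale out when average memory exceeds 80% of requests
After applying the manifest, `kubectl get hpa` shows the current and target utilization so you can verify that scaling reacts to load.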