This article shows you how to set up Foundry Local as an extension on your Azure Kubernetes Service (AKS) cluster enabled by Azure Arc. Use the Azure CLI to deploy Foundry Local as an extension on your Azure Arc-enabled Kubernetes cluster. Helm is also a supported deployment option, and installation instructions are provided during preview access onboarding.
Important
- Foundry Local is available in preview. Preview releases provide early access to features that are in active development.
- Features, approaches, and processes can change or have limited capabilities before general availability (GA).
Prerequisites
Before you begin, make sure you have:
- Access to Foundry Local preview: Foundry Local on Azure Local is available by request during preview. Submit an access request at aka.ms/FoundryLocalAzure_PreviewRequest. After approval, you'll receive guidance on next steps for deployment.
- A Kubernetes cluster (version 1.29 or later) connected to Azure Arc. For more information, see Azure Arc–enabled Kubernetes.
- Your Azure Arc-enabled Kubernetes cluster is located in a supported region. For available regions, see Supported regions.
- An app registration for enablement of authorization and authentication. See Configure authentication for Foundry Local enabled by Azure Arc.
- kubectl installed and configured for your cluster.
- Helm installed.
- For external endpoints: an NGINX ingress controller, such as NGINX-Ingress.
- (Optional) A namespace strategy if you plan to deploy models outside the default foundry-local-operator namespace. Namespace configuration must be set during installation. For more information, see Namespace configuration for model deployments.
Important
The open-source Ingress-NGINX project is scheduled for retirement in March 2026. Microsoft currently supports NGINX annotations, and the solution is tested with AKS's managed NGINX ingress controller.
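Before you continue, you can quickly confirm the core prerequisites from your workstation. This is an optional sketch; `<your_arc_cluster_name>` and `<resource_group>` are placeholders for your own values.

```shell
# Cluster is reachable and runs Kubernetes 1.29 or later.
kubectl version

# Helm and the Azure CLI are installed.
helm version
az version

# The cluster is connected to Azure Arc and reports "Connected".
az connectedk8s show \
  --name <your_arc_cluster_name> \
  --resource-group <resource_group> \
  --query connectivityStatus -o tsv
```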
GPU prerequisites
If you plan to run GPU workloads, also make sure:
- NVIDIA GPU nodes are available in your cluster with CUDA drivers installed on the nodes.
- The Kubernetes device plugin for NVIDIA is configured so the cluster can schedule GPU workloads.
For more information, see NVIDIA GPU Operator.
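To sanity-check the GPU prerequisites, you can verify that nodes advertise the `nvidia.com/gpu` resource, which is how the NVIDIA device plugin exposes GPUs to the scheduler. The exact pod labels and namespace for the device plugin depend on how you installed it, so the node-level check below is the most portable:

```shell
# Each GPU node should report a nonzero nvidia.com/gpu capacity
# once the device plugin is running.
kubectl get nodes \
  -o custom-columns='NAME:.metadata.name,GPUS:.status.capacity.nvidia\.com/gpu'
```

If the GPUS column is empty (`<none>`) on a node you expect to have GPUs, check the device plugin or GPU Operator pods on that node.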
Step 1: Install cert-manager and trust-manager
Foundry Local on Azure Local requires cert-manager and trust-manager for automated certificate management.
Use the Azure CLI to create the cert-manager extension on your cluster:
az k8s-extension create \
--cluster-name <your_arc_cluster_name> \
--name "azure-cert-manager" \
--resource-group <resource_group_of_the_arc_cluster> \
--cluster-type connectedClusters \
--extension-type Microsoft.CertManagement \
--scope cluster \
--release-train stable \
--config config.enableGatewayAPI=true \
--config cert-manager.crds.keep=true \
--config trust-manager.defaultPackage.enabled=false \
--config trust-manager.secretTargets.enabled=true \
--config trust-manager.secretTargets.authorizedSecretsAll=true
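Before moving on, you can confirm that the extension finished provisioning and that its pods came up. The namespace the extension deploys into isn't fixed here, so the pod check below searches all namespaces:

```shell
# The extension's provisioning state should report "Succeeded".
az k8s-extension show \
  --cluster-name <your_arc_cluster_name> \
  --resource-group <resource_group_of_the_arc_cluster> \
  --cluster-type connectedClusters \
  --name azure-cert-manager \
  --query provisioningState -o tsv

# cert-manager and trust-manager pods should be Running.
kubectl get pods -A | grep -E 'cert-manager|trust-manager'
```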
Step 2: Install the inference operator
Use the Azure CLI to deploy the inference operator extension:
az k8s-extension create \
--resource-group <resource_group_of_the_arc_cluster> \
--cluster-name <arc_cluster_name> \
--name "inference-operator" \
--extension-type Microsoft.Foundry \
--scope cluster \
--release-namespace "foundry-local-operator" \
--cluster-type connectedClusters \
--auto-upgrade-minor-version true \
--release-train stable \
--config entraAuth.tenantId="<azure_tenant_id>" \
--config entraAuth.clientId="<the_client_id_of_the_app_registration>"
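Extension creation can take several minutes. You can check the provisioning state of the inference operator the same way as for cert-manager:

```shell
# Wait for provisioningState to report "Succeeded".
az k8s-extension show \
  --resource-group <resource_group_of_the_arc_cluster> \
  --cluster-name <arc_cluster_name> \
  --cluster-type connectedClusters \
  --name inference-operator \
  --query provisioningState -o tsv
```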
Additional installation parameters
You can configure the following optional parameters during inference operator installation:
| Parameter | Description |
|---|---|
| `entraAuth.enabled` | Boolean. When enabled, the Entra Auth SDK sidecar and msi-adapter sidecar are injected into inference pods for JWT validation and ARM RBAC authorization. When disabled, the `entraAuth.tenantId` and `entraAuth.clientId` parameters are optional. Default: `true`. For more information, see Configure authentication for Foundry Local enabled by Azure Arc. |
| `watch.namespaces` | Array of strings. Configure this parameter if you want the operator to manage resources across multiple namespaces. By default, the operator manages the `foundry-local-operator` namespace where models and inference workloads are deployed. Pass the values in the installation command as `--config watch.namespaces[0]="NS1" --config watch.namespaces[1]="NS2"`. For more information, see Namespace configuration for model deployments. |
Step 3: Verify the operator
Verify that the inference operator extension is installed and that all pods are running. Use the following commands to check the operator status:
kubectl get pods -n foundry-local-operator
kubectl get crd | grep foundry
Wait until all pods show a Running status before you proceed.
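Instead of polling manually, you can block until the operator pods are ready. This assumes all operator pods land in the default `foundry-local-operator` namespace:

```shell
# Block until all pods in the namespace are Ready, with a 5-minute timeout.
kubectl wait --for=condition=Ready pods --all \
  -n foundry-local-operator --timeout=300s
```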
Troubleshoot your deployment
Use the following commands to troubleshoot issues with your deployment.
Check ModelDeployment status and events:
kubectl describe mdep <name>
Check operator logs:
kubectl logs -f deployment/inference-operator -n foundry-local-operator
Check pod status:
kubectl get pods -l app.kubernetes.io/managed-by=inference-operator
kubectl describe pod <pod-name>
kubectl logs <pod-name>
List all resources created by a deployment:
kubectl get deploy,svc,ing -l foundry.azure.com/deployment=<name>
Check the catalog ConfigMap:
kubectl get configmap foundry-local-catalog -n foundry-local-operator -o yaml
Verify a Model CR exists:
kubectl get models
kubectl describe model <name>
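If you want a condensed view of a ModelDeployment's health, you can print just its status conditions. This sketch assumes the CRD follows the common Kubernetes convention of exposing a `.status.conditions` array, and uses the `mdep` short name shown above:

```shell
# Print each condition as "Type=Status", one per line (hypothetical
# assumption: the CRD populates .status.conditions).
kubectl get mdep <name> \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
```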