This article shows you how to set up Foundry Local as an extension on your Azure Kubernetes Service (AKS) cluster enabled by Azure Arc. Use the Azure CLI to deploy Foundry Local as an extension on your Azure Arc-enabled Kubernetes cluster. Helm is also a supported deployment option, and installation instructions are provided during preview access onboarding.
Important
- Foundry Local is available in preview. Preview releases provide early access to features that are in active development.
- Features, approaches, and processes can change or have limited capabilities before general availability (GA).
Prerequisites
Before you begin, make sure you have:
- Access to Foundry Local preview: Foundry Local on Azure Local is available by request during preview. Submit an access request at aka.ms/FoundryLocalAzure_PreviewRequest. After approval, you'll receive guidance on next steps for deployment.
- A Kubernetes cluster (version 1.29 or later) connected to Azure Arc. For more information, see Azure Arc–enabled Kubernetes.
- Your Azure Arc-enabled Kubernetes cluster is located in a supported region. For available regions, see Supported regions.
- An app registration for enablement of authorization and authentication. See Configure authentication for Foundry Local enabled by Azure Arc.
- kubectl installed and configured for your cluster.
- Helm installed.
- For external endpoints: an NGINX ingress controller, such as NGINX-Ingress.
- (Optional) A namespace strategy if you plan to deploy models outside the default foundry-local-operator namespace. Namespace configuration must be set during installation. For more information, see Namespace configuration for model deployments.
Important
The open-source Ingress-NGINX project is scheduled for retirement in March 2026. Microsoft currently supports NGINX annotations, and the solution is tested with AKS's managed NGINX ingress controller.
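Before you continue, you can quickly confirm the core prerequisites from your workstation. This is an optional sketch; `<your_arc_cluster_name>` and `<resource_group>` are placeholders for your own values.

```shell
# Cluster is reachable and runs Kubernetes 1.29 or later.
kubectl version

# Helm and the Azure CLI are installed.
helm version
az version

# The cluster is connected to Azure Arc and reports "Connected".
az connectedk8s show \
  --name <your_arc_cluster_name> \
  --resource-group <resource_group> \
  --query connectivityStatus -o tsv
```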
GPU prerequisites
If you plan to run GPU workloads, also make sure:
- NVIDIA GPU nodes are available in your cluster with CUDA drivers installed on the nodes.
- The Kubernetes device plugin for NVIDIA is configured so the cluster can schedule GPU workloads.
For more information, see NVIDIA GPU Operator.
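To sanity-check the GPU prerequisites, you can verify that nodes advertise the `nvidia.com/gpu` resource, which is how the NVIDIA device plugin exposes GPUs to the scheduler. The exact pod labels and namespace for the device plugin depend on how you installed it, so the node-level check below is the most portable:

```shell
# Each GPU node should report a nonzero nvidia.com/gpu capacity
# once the device plugin is running.
kubectl get nodes \
  -o custom-columns='NAME:.metadata.name,GPUS:.status.capacity.nvidia\.com/gpu'
```

If the GPUS column is empty (`<none>`) on a node you expect to have GPUs, check the device plugin or GPU Operator pods on that node.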
Step 1: Install cert-manager and trust-manager
Foundry Local on Azure Local requires cert-manager and trust-manager for automated certificate management.
Use the Azure CLI to create the cert-manager extension on your cluster:
az k8s-extension create \
--cluster-name <your_arc_cluster_name> \
--name "azure-cert-manager" \
--resource-group <resource_group_of_the_arc_cluster> \
--cluster-type connectedClusters \
--extension-type Microsoft.CertManagement \
--scope cluster \
--release-train stable \
--config config.enableGatewayAPI=true \
--config cert-manager.crds.keep=true \
--config trust-manager.defaultPackage.enabled=false \
--config trust-manager.secretTargets.enabled=true \
--config trust-manager.secretTargets.authorizedSecretsAll=true
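Before moving on, you can confirm that the extension finished provisioning and that its pods came up. The namespace the extension deploys into isn't fixed here, so the pod check below searches all namespaces:

```shell
# The extension's provisioning state should report "Succeeded".
az k8s-extension show \
  --cluster-name <your_arc_cluster_name> \
  --resource-group <resource_group_of_the_arc_cluster> \
  --cluster-type connectedClusters \
  --name azure-cert-manager \
  --query provisioningState -o tsv

# cert-manager and trust-manager pods should be Running.
kubectl get pods -A | grep -E 'cert-manager|trust-manager'
```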
Step 2: Install the inference operator
Use the Azure CLI to deploy the inference operator extension:
az k8s-extension create \
--resource-group <resource_group_of_the_arc_cluster> \
--cluster-name <arc_cluster_name> \
--name "inference-operator" \
--extension-type Microsoft.Foundry \
--scope cluster \
--release-namespace "foundry-local-operator" \
--cluster-type connectedClusters \
--auto-upgrade-minor-version true \
--release-train stable \
--config entraAuth.tenantId="<azure_tenant_id>" \
--config entraAuth.clientId="<the_client_id_of_the_app_registration>"
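Extension creation can take several minutes. You can check the provisioning state of the inference operator the same way as for cert-manager:

```shell
# Wait for provisioningState to report "Succeeded".
az k8s-extension show \
  --resource-group <resource_group_of_the_arc_cluster> \
  --cluster-name <arc_cluster_name> \
  --cluster-type connectedClusters \
  --name inference-operator \
  --query provisioningState -o tsv
```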
Additional installation parameters
You can configure the following optional parameters during inference operator installation:
| Parameter | Description |
|---|---|
| `entraAuth.enabled` | Boolean. When enabled, the Entra Auth SDK sidecar and msi-adapter sidecar are injected into inference pods for JWT validation and ARM RBAC authorization. When disabled, the `entraAuth.tenantId` and `entraAuth.clientId` parameters are optional. Default: `true`. For more information, see Configure authentication for Foundry Local enabled by Azure Arc. |
| `watch.namespaces` | Array of strings. Configure this parameter if you want the operator to manage resources across multiple namespaces. By default, the operator manages the `foundry-local-operator` namespace where models and inference workloads are deployed. Pass the values in the installation command as `--config watch.namespaces[0]="NS1" --config watch.namespaces[1]="NS2"`. For more information, see Namespace configuration for model deployments. |
Step 3: Verify the operator
Verify that the inference operator extension is installed and that all pods are running. Use the following commands to check the operator status:
kubectl get pods -n foundry-local-operator
kubectl get crd | grep foundry
Wait until all pods show a Running status before you proceed.
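Instead of polling manually, you can block until the operator pods are ready. This assumes all operator pods land in the default `foundry-local-operator` namespace:

```shell
# Block until all pods in the namespace are Ready, with a 5-minute timeout.
kubectl wait --for=condition=Ready pods --all \
  -n foundry-local-operator --timeout=300s
```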
Troubleshoot your deployment
Use the following commands to troubleshoot issues with your deployment.
Check ModelDeployment status and events:
kubectl describe mdep <name>
Check operator logs:
kubectl logs -f deployment/inference-operator -n foundry-local-operator
Check pod status:
kubectl get pods -l app.kubernetes.io/managed-by=inference-operator
kubectl describe pod <pod-name>
kubectl logs <pod-name>
List all resources created by a deployment:
kubectl get deploy,svc,ing -l foundry.azure.com/deployment=<name>
Check the catalog ConfigMap:
kubectl get configmap foundry-local-catalog -n foundry-local-operator -o yaml
Verify a Model CR exists:
kubectl get models
kubectl describe model <name>
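If you want a condensed view of a ModelDeployment's health, you can print just its status conditions. This sketch assumes the CRD follows the common Kubernetes convention of exposing a `.status.conditions` array, and uses the `mdep` short name shown above:

```shell
# Print each condition as "Type=Status", one per line (hypothetical
# assumption: the CRD populates .status.conditions).
kubectl get mdep <name> \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
```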