The behavior matches an SMB/TCP connectivity problem on the path between the on‑prem file servers and the Azure Windows Server 2022 VMs, not an issue with Azure VM storage or SMB server functionality.
Key points from the scenario and context:
- SMB transfers only fail when the Azure VM acts as SMB client over ExpressRoute to on‑prem shares and the payload is large (>15 GB). The same VM can receive an 80‑GB file from on‑prem when acting as server, which validates that the VM’s storage and SMB server stack are healthy.
- Wireshark shows a TCP duplicate ACK storm followed by a connection reset at ~19% completion, which is consistent with packet loss or congestion on the network path.
- The SMB client logs explicitly mention that frequent disconnects can be caused by misconfigured RDMA over Converged Ethernet (RoCE) networks, especially missing Priority Flow Control (PFC), which leads to packet loss and poor performance.
- The context notes that SMB performance is closely tied to network performance and that packet loss will trigger TCP congestion control and throttling, which can severely impact large transfers.
Given this, the most likely root cause is a network‑side misconfiguration or limitation on the ExpressRoute/on‑prem side, particularly around RDMA/RoCE or network offload features, rather than a defect in Windows Server 2022 SMB.
Recommended actions (focused on the network path and SMB/TCP stack):
- Validate RDMA/RoCE configuration end‑to‑end
- Confirm whether the on‑prem path to ExpressRoute uses RDMA over Converged Ethernet (RoCE) NICs or switches.
- If RoCE is in use, ensure Priority Flow Control (PFC) is configured consistently on all hosts, switches, and routers on that path, as the SMB log guidance states that missing PFC causes packet loss, frequent disconnects, and poor performance.
- If PFC cannot be guaranteed end‑to‑end, consider disabling RDMA for this path and using standard TCP/Ethernet instead.
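The RDMA checks above can be sketched roughly as follows on any Windows host on the path that has RDMA-capable NICs. This is a minimal sketch, not a definitive procedure: it assumes the NetAdapter and SMB cmdlets that ship with Windows Server are available, that the Data Center Bridging feature is installed (needed for Get-NetQosFlowControl), and it uses "Ethernet" as a placeholder adapter name.

```powershell
# List adapters and whether RDMA is enabled on each one.
Get-NetAdapterRdma | Format-Table Name, InterfaceDescription, Enabled

# Show which interfaces the SMB client treats as RSS- or RDMA-capable.
Get-SmbClientNetworkInterface

# Show the PFC state per priority (requires the Data Center Bridging feature).
Get-NetQosFlowControl

# If PFC cannot be guaranteed end to end, turn RDMA off on the adapter.
# "Ethernet" is a placeholder; substitute the real adapter name.
Disable-NetAdapterRdma -Name "Ethernet"
```

If no adapter reports RDMA as enabled, the RoCE/PFC explanation is unlikely to apply to that host, and the packet-loss checks become the priority.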
- Check for packet loss and MTU issues on ExpressRoute
- Use network monitoring on both on‑prem and Azure sides to look for drops, errors, or retransmits on the interfaces carrying the SMB traffic.
- Verify that MTU settings are consistent along the path; mismatched MTU can cause fragmentation and loss, which will manifest as duplicate ACK storms and resets.
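One way to gather this evidence from the Windows side is sketched below. It assumes ICMP is permitted along the path (it is often filtered) and uses fileserver01 as a placeholder for the on-prem file server; neither detail comes from the original scenario.

```powershell
# Per-adapter discard and error counters. Compare values before and
# after a failing copy; counters that rise during the copy suggest loss.
Get-NetAdapterStatistics |
    Format-Table Name, ReceivedDiscardedPackets, ReceivedPacketErrors,
                 OutboundDiscardedPackets, OutboundPacketErrors

# TCP statistics, including retransmitted segments.
netstat -s -p tcp

# Probe the path MTU: -f sets Don't Fragment, -l sets the ICMP payload size.
# 1472 bytes of payload + 28 bytes of ICMP/IP headers = a 1500-byte packet.
# "fileserver01" is a placeholder host name.
ping -f -l 1472 fileserver01
```

If the 1472-byte probe reports that the packet needs to be fragmented, lower the -l value until the ping succeeds; the largest working payload plus 28 bytes approximates the usable MTU on the path.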
- Ensure SMB network offload features are correctly configured
- The guidance for slow SMB transfers emphasizes that network offloading technologies such as Receive Side Scaling (RSS), Large Send Offload (LSO), Receive Segment Coalescing (RSC), and checksum offloads should remain enabled because they improve throughput and reduce CPU usage.
- Re‑enable any offloads that were disabled for testing (for example, LSO and RSC) and instead focus on fixing the underlying packet‑loss issue.
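The current offload state can be inspected, and LSO and RSC re-enabled, with the built-in NetAdapter cmdlets. The sketch below assumes an adapter named "Ethernet", which is a placeholder.

```powershell
# Inspect the current offload state on all adapters.
Get-NetAdapterRss
Get-NetAdapterLso
Get-NetAdapterRsc
Get-NetAdapterChecksumOffload

# Re-enable offloads that were turned off for testing.
# "Ethernet" is a placeholder adapter name.
Enable-NetAdapterLso -Name "Ethernet"
Enable-NetAdapterRsc -Name "Ethernet"
```

Changing these settings can briefly restart the adapter, so it is safer to run them outside of an active transfer.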
- Verify SMB client configuration for large transfers
- On the Azure Windows Server 2022 VMs, ensure that bandwidth throttling is disabled and large MTU is enabled for SMB:
Set-SmbClientConfiguration -EnableBandwidthThrottling 0 -EnableLargeMtu 1
- Confirm that SMB Multichannel is enabled and functioning, as it can improve resilience and performance when multiple NICs or RSS/RDMA are available.
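To confirm that these client settings took effect and that Multichannel is actually in use, something like the following can be run on the Azure VM while a connection to the on-prem share is open. It is a sketch using the standard SMB cmdlets and only reads state.

```powershell
# Show the client settings relevant to large transfers.
Get-SmbClientConfiguration |
    Select-Object EnableBandwidthThrottling, EnableLargeMtu, EnableMultiChannel

# List active Multichannel connections.
# Empty output means no Multichannel connection is currently established.
Get-SmbMultichannelConnection
```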
- Check for security features impacting RDMA
- If SMB Signing or SMB Encryption is enabled on the path, be aware that, according to the context, SMB stops using RDMA direct data placement and falls back to a less efficient send/receive mode with smaller MTU, which can reduce performance and increase fragmentation.
- Evaluate whether signing/encryption is required for this specific path; if they are mandatory, plan for the reduced performance and ensure the network can handle the additional overhead without loss.
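Whether signing or encryption is in effect for a given connection can be checked from the SMB client. The sketch below assumes a session to the on-prem share is already open on the Azure VM.

```powershell
# Per-connection view: negotiated dialect, and whether the session
# is encrypted or signed.
Get-SmbConnection |
    Select-Object ServerName, ShareName, Dialect, Encrypted, Signed

# Client-side signing policy.
Get-SmbClientConfiguration |
    Select-Object RequireSecuritySignature, EnableSecuritySignature
```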
- Isolate by testing alternative paths
- Since the issue reproduces across subscriptions but only for SMB client traffic over ExpressRoute, test:
- A smaller test file that just exceeds the failure threshold (for example, 20 GB) to confirm the size dependency.
- A different network path (VPN or another ExpressRoute circuit) if available, to see whether the problem is specific to a particular circuit or on‑prem network segment.
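A repeatable size test might look like the sketch below. The local folder C:\Temp, the share \\fileserver01\share, and the file name are placeholders that do not come from the original scenario, and the example writes to the share; swap source and destination to test the read direction instead.

```powershell
# Create a 20 GB test file (20 * 1024^3 = 21474836480 bytes).
# C:\Temp must already exist; the path is a placeholder.
fsutil file createnew C:\Temp\smbtest-20GB.bin 21474836480

# Copy it to the on-prem share.
# /J uses unbuffered I/O (intended for large files), /R:0 disables
# retries so a reset surfaces immediately, /LOG writes a log file.
robocopy C:\Temp \\fileserver01\share smbtest-20GB.bin /J /R:0 /LOG:C:\Temp\smbtest.log
```

Running the same copy over each candidate path while capturing a trace makes it easier to compare where, and whether, the duplicate ACK storm appears.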
By focusing on eliminating packet loss and misconfiguration in the RDMA/RoCE and ExpressRoute path, and by ensuring SMB client offloads and large MTU are correctly configured, the duplicate ACK storms and connection resets during large SMB transfers should be resolved.
References:
- Slow SMB files transfer speed
- SMB features in Windows and Windows Server
- Reduced networking performance after you enable SMB Encryption or SMB Signing in Windows Server 2016 and Windows Server 2019
- Troubleshoot SMB guidance
- Troubleshoot Azure Files performance issues
- Storage Migration Service frequently asked questions (FAQ)