By James Cook
•
July 24, 2024
Azure Stack HCI 23H2 – Notes from the battlefield

Azure Stack HCI is Microsoft’s blueprint for a hyperconverged infrastructure, running on-premises, using validated hardware and a modified version of the Windows Server Core operating system (you also get a cool Azure logo when it boots). The obvious goal from MS with Stack is to bridge the gap for the many use cases where there is still an on-prem requirement, whilst making use of investment in skills and foundations in Azure.

The previous version of Stack HCI (22H2) was very much an on-prem deployment, with lots of PowerShell and the option to then onboard the cluster into Azure Arc for some hybrid management and monitoring capabilities.

Introducing 23H2

23H2 is the latest generally available version of Azure Stack HCI, released in February 2024. This version introduces significant changes: a cloud-based deployment method from the Azure portal, cloud-based updates, and monitoring and Arc management of workloads on the cluster from the outset (you can deploy Azure VMs and other workloads from the Azure portal).

Azure Stack HCI is a great option for hybrid environments and where there is an architectural reason to run a workload on-prem. Microsoft have also opened Stack HCI up to run other workloads which were traditionally restricted to Azure Public only, such as Azure Virtual Desktop and Azure SQL Managed Instance. My view: for anyone facing the prospect of a VMware or Citrix renewal, Stack HCI has become hugely relevant and a potential game changer.

Techy Nodes

With 23H2 introducing so many changes only months ago at the time of writing, and with the potential for so many variables in the on-prem environment, I’ll buy a coffee for anyone who gets a smooth, hardware-based deployment on the first run!

Below are some notes which might help others out there. They come from our experience of deploying a two-node, switchless Stack HCI 23H2 cluster for a customer on Dell AX-750 hardware.
We pencilled in 5 days; I don’t want to admit how long it actually took! I have stopped short of sharing all the steps taken or our complete build notes. There are many blogs out there which misled us during troubleshooting because the official Microsoft advice, guidance and documentation is changing so rapidly, and I don’t want to add to that.

Prep

To steal a phrase from an MS engineer, “deploying 23H2 is like painting your house, 90% of the work is in the prep”.

The documentation could be clearer in my opinion. Take the time to understand your deployment scenario (there are many: switchless, switched, number of nodes etc.), and take the time to read and re-read the pre-reqs for your scenario. Do not rush ahead to hitting go on the deployment hoping errors will steer you nicely towards any pre-reqs you’ve missed. They are more likely to lead you down week-long rabbit holes.

Prerequisites to deploy Azure Stack HCI, version 23H2 - Azure Stack HCI | Microsoft Learn

Active Directory Prep

One of the first prep steps involves running a PowerShell command to create the AD OU and deployment account for the cluster. This command creates the OU and then quickly blocks inheritance on the new OU.

Tip - We’ve seen this cause an issue if the script is run in a different AD site than the PDC, because the new OU may not have replicated by the time the script tries to block inheritance on it. Run the script on a terminal in the same site as the PDC if possible. If it fails due to not being able to block inheritance, go make a brew and run it again; there is no need to delete the objects the first attempt created.

Host Prep

Stack HCI runs on Server Core: no GUI, minimal options for clicking to find things!

Tip - Microsoft love to set a US keyboard, so this is the first thing we change in our scenario (being based in the UK):

    Set-WinDefaultInputMethodOverride -InputTip "0809:00000809"
    Set-ItemProperty 'HKCU:\Keyboard Layout\Preload' -Name 1 -Value '00000809'

Host Networking

Read and re-read the documentation on this one.
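With no GUI on Server Core, a read-only inventory of the adapters is a reasonable companion to that reading. The sketch below is my own assumption about what is worth looking at, not part of the official guidance: it lists each physical NIC with its current VLAN ID and driver details, then the RDMA state, using the built-in NetAdapter cmdlets. It changes nothing on the host.

```powershell
# Read-only inventory of the physical NICs; makes no changes to the host.
Get-NetAdapter -Physical |
    Sort-Object Name |
    Format-Table Name, InterfaceDescription, Status, LinkSpeed, VlanID,
                 DriverProvider, DriverVersionString -AutoSize

# RDMA state as the operating system sees it, per adapter.
Get-NetAdapterRdma | Format-Table Name, Enabled -AutoSize
```

Running it on every node and comparing the output side by side is a quick way to spot a NIC that differs from its twin before the deployment finds it for you.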
Network considerations for cloud deployment for Azure Stack HCI, version 23H2 - Azure Stack HCI | Microsoft Learn

Tip - In our scenario, the customer environment used a specific VLAN for management, which caused issues and left the deployment stuck at “Configuring host networking”. If you are defining a management VLAN, you should set this on your management NICs prior to deployment using the Set-NetAdapter command (its -VlanID parameter).

It’s also advisable to rename your NICs to a friendly name, as these NIC names get passed through to the Azure portal for selection during deployment.

    Rename-NetAdapter -Name "original nic name" -NewName "MANAGEMENT01"

Tip - Ensure the right features, such as RDMA, are enabled for your network adapters in the host BIOS.

Tip - Ensure your NICs are using the latest supported vendor drivers (not Microsoft drivers!).

Tip - Do not set a VLAN ID on your storage NICs; the deployment handles this for you.

Arc Registration

Register your Azure Stack HCI servers with Azure Arc and assign permissions for deployment - Azure Stack HCI | Microsoft Learn

The next step is to onboard your hosts to Azure Arc. This allows them to be selected to be part of a Stack HCI portal deployment.

Tip - Azure Stack HCI is only available in a limited number of regions. This is purely for management purposes, and not being in your local region is not a massive issue. In our scenario, our landing zone config for the customer restricted deployments to UK regions, so we onboarded the nodes to UK South, which made them unavailable for selection in a Stack HCI cluster. At the time of deployment Stack HCI was only available in West Europe, and the nodes have to be onboarded to Arc in the same region. Tear it out, start again! Onboard your Stack nodes to a supported region for Azure Stack HCI!

Tip - Following onboarding to Arc, don’t rush to kick off the deployment right away.
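One way to confirm the nodes are actually ready is to query the extension state from a management workstation rather than guessing. This is a minimal sketch, assuming the Az.ConnectedMachine module is installed and you have already signed in with Connect-AzAccount. The resource group and node names are hypothetical placeholders. Which extensions must be present is for the MS documentation to say, so the script only reports anything that has not reached a Succeeded state.

```powershell
# Hypothetical placeholders: replace with your own resource group and Arc machine names.
$resourceGroup = 'rg-stackhci-example'
$nodes         = 'NODE01', 'NODE02'

foreach ($node in $nodes) {
    # Every extension Arc currently reports for this machine.
    $extensions = @(Get-AzConnectedMachineExtension -ResourceGroupName $resourceGroup -MachineName $node)

    # Anything still installing, updating or failed.
    $notReady = @($extensions | Where-Object { $_.ProvisioningState -ne 'Succeeded' })

    if ($extensions.Count -eq 0) {
        Write-Warning "${node}: no extensions reported yet, wait and re-check."
    }
    elseif ($notReady.Count -gt 0) {
        Write-Warning "${node}: still waiting on $($notReady.Name -join ', ')"
    }
    else {
        Write-Output "${node}: all $($extensions.Count) extensions report Succeeded."
    }
}
```

A clean result only tells you that the extensions present so far are healthy. Compare the names against the list in the MS documentation before moving on.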
Several extensions need to be installed and healthy before you can continue; even after the onboarding PowerShell on each node has finished, the extensions take a while to drop in. The exact set may change, so refer to the MS documentation!

Portal / ARM Deployment

Once you have prepped, checked, prepped some more and onboarded to Arc, you are ready to kick off a deployment via the portal or an ARM template:

Deploy an Azure Stack HCI system using the Azure portal - Azure Stack HCI | Microsoft Learn

If all the prep and pre-reqs have been done, all that should be required now is to be careful when entering the required environment information, and to monitor the deployment.

Tip - The deployment has a “Validation” stage which deploys a number of prep objects before the actual deployment. One of these is an enterprise app registration. When we did our first deployment in March / April 2024, there was no mention of the permissions required for this in the documentation. If the Azure user initiating the deployment is not a Global Administrator (GA), assign them the Application Administrator role to avoid errors with the validation stage and enterprise app registration.

Closing Notes

At this point you are in the hands of the deployment. Quoting another MS engineer: “It either goes smoothly or is a nightmare, there’s rarely an in-between!”

There are plenty more things that could have been mentioned in this blog; I did want to keep it short and knew that wouldn’t happen. One of the core issues which made this first 23H2 deployment particularly painful was the customer’s Active Directory environment. Instability with AD and DNS will, without a doubt, trip up a Stack deployment.

Whilst some of the above may come across as negative, it’s not meant to be. As with any tech, especially a newly released process, there are teething problems, and I’m sure it will get smoother over time. I do feel for the MS support teams looking after Stack HCI; in a greenfield environment I imagine it deploys fine every time.
There are so many issues outside of the deployment itself which can cause it to go wrong; it must be a nightmare to support while trying to avoid “scope creep”.

The benefits when it’s all done and you’re sat back with a cup of tea? A truly hybrid infrastructure which allows you to manage your on-prem and Azure workloads with the same tooling (including infrastructure as code), deploy cloud-first workloads on-prem, and monitor / manage your environment using tools such as Azure Monitor, Policy, Update Centre and Defender for Cloud. Now that is quite cool!