This is truly the article I would have LOVED to find when we first got the DHCP settings in place for our Lync Phone Edition (LPE) devices and other Lync phones. We were going crazy trying to figure out why the LPE devices were fine right after being tethered to a PC, but were not once someone had logged in and out of them while disconnected and then rebooted them. It is also the article I sort of promised to write when I was raving about a certain switch.
The symptoms: Test-CsPhoneBootstrap works flawlessly. Other Lync phones can log on with extension and PIN. Your Lync Phone Edition device (in our case, the Polycom CX3000) will cheerfully log on with the extension and PIN if you’ve first logged it in while tethered to a PC via the PC’s Lync client. If you’ve logged out of the device and powered it down, though, the very same extension and PIN gets you “An account matching this phone number cannot be found. Please contact your support team” after a very quick flash of another error, “Account used is not authorized, please contact your support team”. I did what another admin did and took a video on my phone, then replayed it really slowly: the time from entering the PIN to getting the final failure message was less than 4 seconds, so the slow replay was the only way to see the first failure message that briefly flashed on the screen.
Some WireSharking showed that when the device was connected to the PC, or had just been, it was getting two certificates during the handshake before certificate provisioning: the server certificate for the Lync pool and the certificate of the intermediate CA that issued it. When it was disconnected, logged out and restarted, it was getting only the server certificate, and the failure message appeared on the device’s screen seconds after. (Hat tip and a virtual case of tasty Bavarian beer to Drago Totev of Unify Square for pointing this out!)
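If you don't have a packet capture handy, the same question (how many certificates does the server actually send?) can usually be answered from the "Certificate chain" section that `openssl s_client -connect <host>:<port> -showcerts` prints. Below is a minimal Python sketch, not something from our actual troubleshooting, that parses saved output of that command and reports whether only the leaf certificate was presented. The host name and CA name in the sample are hypothetical placeholders, and the parsing assumes the usual `N s:` / `i:` line layout of that section.

```python
import re

# Lines printed under "Certificate chain", e.g. " 0 s:CN = pool.example.com"
SUBJECT_LINE = re.compile(r"^\s*(\d+)\s+s:(.*)$")
ISSUER_LINE = re.compile(r"^\s*i:(.*)$")


def parse_presented_chain(s_client_output):
    """Return a list of (depth, subject, issuer) tuples, one per certificate sent."""
    chain = []
    for line in s_client_output.splitlines():
        subject = SUBJECT_LINE.match(line)
        if subject:
            chain.append([int(subject.group(1)), subject.group(2).strip(), None])
            continue
        issuer = ISSUER_LINE.match(line)
        # Attach the issuer to the most recent subject that has none yet.
        if issuer and chain and chain[-1][2] is None:
            chain[-1][2] = issuer.group(1).strip()
    return [tuple(entry) for entry in chain]


def sends_only_leaf(chain):
    """True when a single certificate was presented, with no intermediates."""
    return len(chain) == 1


# Hypothetical capture from a load balancer that sends the server certificate only.
sample = """\
Certificate chain
 0 s:CN = lyncpool.example.com
   i:CN = Example Issuing CA
"""
chain = parse_presented_chain(sample)
print(chain, sends_only_leaf(chain))
```

Running this against output captured via the load balancer and again via a Front End server directly should show the difference described above: one entry in the first case, two in the second.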
Apparently, LPE is not quite clever enough to find the intermediate CA certificate itself, though it gets the root CA certificate from Active Directory. By contrast, the AudioCodes and snom phones we were testing at the same time do fine once they get DHCP Option 43 and 120 when they expect to: during the “DHCP Inform” round, after they’ve gotten their IP addresses. Either they are a bit more resourceful than the LPE devices, or a bit more (too?) trusting… will let you know if I ever find out.
When we changed the pool’s DNS entry to point directly at the IP of one of the Front End servers instead of the IP on the Cisco ACE HLB (thanks again for that suggestion, Drago), the LPE devices got both the server and intermediate certificates during the handshake, and LPE phones were happily logging in with extension and PIN even after a hard factory reset (4 and 6 held down while rebooting – this WILL scrub your latest firmware update!). Once we set the DNS entry for our Lync pool back to the IP address on the HLB, we got the same login failures as before. Note: just changing the DHCP template option 43 to use one of the Lync Front End server names instead of the pool name will NOT work – the certificate provisioning service uses the pool name (I tried that.)
So now it was clear: somehow, we had to get the Cisco HLB to give out the intermediate certificate along with the server certificate when making the handshake with the LPE devices. I tried exporting the whole chain as P7B (no private key) and PFX (with private key), then using OpenSSL to make the PEM files the Cisco HLB knew how to consume. The Cisco HLB recognized the certificates, but was still only giving out the server certificate. For reasons that are obvious, Cisco is not falling all over themselves to tell you how to make Lync work with one of their older HLBs ;)
The answer: a certificate chain group associated with the interface, instead of just the server certificate.
Since I’m not going to pretend that I know anything about Cisco stuff, here’s a link to the manual, and you can get your network guy (or gal) to figure out what it means for you and your certs: http://www.cisco.com/c/en/us/td/docs/interfaces_modules/services_modules/ace/vA5_1_0/command/reference/ACE_cr/chaingrp.html
With that now in place, WireShark showed that the LPE devices were receiving two copies of the server certificate, along with one each of the intermediate CA cert and the root CA cert. I have a feeling that the Cisco ACE is passing along both the chain group and the server certificate, and that we probably could have gotten away with just having the intermediate CA cert in the chain group, or the server cert and the intermediate CA cert at most, but hey, it works, and I don’t really feel like hassling my network guy any more about this!
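A quick way to confirm a duplicate like this without eyeballing a capture is to fingerprint every certificate the server sends and count repeats. The following is a small Python sketch under stated assumptions: the input is PEM text (for example, the certificate blocks saved from `openssl s_client -showcerts` output), and the demo data is made of placeholder bytes wrapped in PEM armour, not real certificates. The helper names are made up for illustration.

```python
import base64
import hashlib
import re
from collections import Counter

PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def fingerprints(pem_text):
    """Return the SHA-256 fingerprint of each certificate, in the order sent."""
    prints = []
    for body in PEM_BLOCK.findall(pem_text):
        der = base64.b64decode("".join(body.split()))  # strip line breaks first
        prints.append(hashlib.sha256(der).hexdigest())
    return prints


def duplicated(prints):
    """Return the fingerprints that occur more than once, with their counts."""
    return {fp: n for fp, n in Counter(prints).items() if n > 1}


def fake_pem(payload):
    """Wrap placeholder bytes in PEM armour (a stand-in, NOT a real certificate)."""
    b64 = base64.b64encode(payload).decode("ascii")
    return f"-----BEGIN CERTIFICATE-----\n{b64}\n-----END CERTIFICATE-----\n"


# Mimics what we saw: server cert twice, then intermediate, then root.
capture = "".join(
    fake_pem(p) for p in (b"server", b"server", b"intermediate", b"root")
)
prints = fingerprints(capture)
print(len(prints), "certificates,", len(duplicated(prints)), "sent more than once")
```

If the duplicate count drops to zero after trimming the chain group, that would support the guess that the server certificate does not need to be in it.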
Even though we have DNS load balancing in place for SIP traffic, the configuration of the HLB is still critical for phone authentication, because the certificate provisioning service is a web service and so goes through the HLB. If we did not have the intermediate CA (totally NOT recommended – your root CA should be turned off, disconnected, locked up somewhere and only taken out to make new intermediate CAs that actually do your issuing), we most likely would not have had this issue, since the server certificate would chain straight to the root CA certificate that the devices already get from Active Directory.
If you are having issues like this and are using a Cisco ACE for load-balancing, first, check your configuration against the details in this post by Andrew Travis. Then, see about a certificate chain group instead of just the server certificate.
Dummies Clever Lync Admins some other time when I’m in a screenshots mood.