How to troubleshoot 802.1X client connectivity issues?

Summary

Troubleshooting an enterprise WLAN that uses 802.1X port-based network access control can be challenging. It mainly involves understanding the authentication flow and identifying where it breaks. This guide will help you troubleshoot common 802.1X authentication issues on the various Ruckus solutions that host 802.1X wireless networks.

Question

How to troubleshoot 802.1X client connectivity issues?

Customer Environment

Virtual SmartZone (vSZ). SmartZone-144 (SZ-144). SmartZone-100 (SZ-100). SmartZone-300 (SZ-300). ZoneDirector-1200 (ZD-1200). Ruckus One (R1). Ruckus Zone Director. Ruckus Unleashed.

Symptoms

  • Newly configured 802.1X/ Radius WLAN does not work.
    1. Clients unable to connect to newly configured 802.1X WLAN
    2. AAA test does not work
    3. Clients connect but do not have internet access after they pass AAA authentication
  • 802.1x WLAN that was working suddenly stopped working
    1. With changes made
    2. Without any significant changes made
  • 802.1x WLAN performance issues
    1. Certain types of clients can connect, and certain ones do not
    2. Clients drop connections randomly
    3. Throughput issues on 802.1x/Radius WLAN
    4. Clients show an insecure-network or certificate warning when connecting to an 802.1x WLAN

Troubleshooting Steps

This article starts with the introduction and explanation of the protocol and the various network nodes that participate in 802.1x.
It also explains how the feature works on different Ruckus Wireless solutions and how we can troubleshoot common issues. 
This includes Ruckus Smart Zone, Ruckus One, Ruckus Unleashed / Zone Director. 

Introduction to 802.1x and sample network topology and endpoints

Before we start troubleshooting, we need to identify the nodes / network devices involved. 
The following is a generalized network diagram for a typical 802.1x WLAN implementation. 
The three main network components of an 802.1X WLAN are:
• Supplicant (End user device)
• Authenticator (Mostly the Wireless Access Points/Switches or Wireless Controller)
• Authentication Server (Ex: Windows NPS, Cloudpath, FreeRadius, and other radius servers)
User-added image

The wireless client is called the Supplicant.

  • The UE (user equipment, the client device) uses the EAP protocol to communicate with the AP over the wireless link
  • This involves a series of steps where the client and the server validate each other's identity
  • Common flavors of EAP are:
    • EAP-PEAP / EAP-MSCHAP / EAP-MSCHAPv2
    • EAP-TLS
    • EAP-TTLS
    • EAP-SIM
    • EAP-AKA
  • The client and the server negotiate the type/flavor of EAP; in most scenarios the Ruckus Controller/AP only acts as the mediator

The Access point or the wireless controller (based on the deployment type) is the Authenticator. 

  • In the RUCKUS Zone Director architecture, the Zone Director is the authenticator
  • In the RUCKUS Smart Zone architecture, the Smart Zone controller acts as the authenticator when the Radius mode is set to Proxy. When the Radius mode is set to Non-Proxy, the access point is the authenticator
  • RUCKUS Cloud follows the same rules as RUCKUS SmartZone
  • In RUCKUS Unleashed, the master access point is the authenticator
  • The authenticator talks EAP with the UE and talks the Radius protocol with the authentication server
  • The authenticator receives the EAP frames from the UE and relays them to the authentication server inside Radius packets (the EAP-Message attribute)
  • An authenticator is also called a Radius client

The Authentication Server is the 802.1x server (Network Policy Server / Network Policy Manager).

  • RUCKUS Cloudpath can act as an Authentication Server and can integrate with all RUCKUS wireless and wired product lines
  • OpenRadius based Radius Servers
  • Microsoft Network Policy Server (NPS) + AD
  • FortiNAC, Cisco ISE and Aruba ClearPass are examples of Network Policy Managers
  • The authentication server communicates with the AP/Controller using the Radius protocol

 

The Radius Server 

Troubleshooting an 802.1x WLAN starts at the Radius server.
  • Note down the following details:
    • The MAC address of the wireless UE/client attempting the authentication
    • A rough timestamp of the failures
    • If this is a test device, replicate the failure multiple times
    • The MAC and IP address of the authenticator
      • If it’s the AP that terminates EAP, note down the AP’s MAC and IP address
      • If it’s the controller, note down the controller’s MAC and IP address
  • Each authentication server has logging mechanisms to debug a failure scenario. For example, on a Windows NPS server, look in the Event Viewer for the MAC address of the client in question (the Calling-Station-Id)
  • References for commonly used servers are attached in the references section
  • If no failure logs are available, it is either because the Radius client secret configured on the two sides does not match, or because the Radius client (the authenticator) cannot reach the Radius server on the Radius UDP port
  • Perform a Wireshark / tcpdump capture on the Radius server, using the AP’s or controller’s IP address as a filter
    • If an incoming Radius packet (default UDP port 1812) is seen, this confirms that Radius traffic from the authenticator is reaching the server
    • If no packet is seen, investigate the Radius configuration on the Ruckus Controller / AP
    • Validate UDP network connectivity between the authenticator and the authentication server
  • If no Radius requests are received at the Radius server, or if the server is sending Radius Access-Accept messages to the Ruckus authenticator and clients are still unable to connect, proceed to troubleshoot the authenticator (the Ruckus wireless solution)
Reference: RADIUS Server Concepts
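
The UDP reachability and shared-secret checks above can also be exercised with a small probe, run from a machine whose IP address is registered as a Radius client on the server. The following is a minimal sketch and not a Ruckus tool: it builds a PAP Access-Request as described in RFC 2865, adds a Message-Authenticator attribute, and reports whether any reply comes back. The function names, the "probe" user name and password, and the 3-second timeout are hypothetical placeholders, and the sketch handles IPv4 only.

```python
import hashlib
import hmac
import os
import socket
import struct


def _attr(attr_type, value):
    """Encode one Radius attribute as Type, Length, Value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def hide_password(password, secret, request_auth):
    """RFC 2865 User-Password hiding: XOR each 16-byte block with MD5(secret + previous block)."""
    blocks = max(1, (len(password) + 15) // 16)
    padded = password.ljust(16 * blocks, b"\x00")
    hidden, previous = b"", request_auth
    for i in range(0, len(padded), 16):
        digest = hashlib.md5(secret + previous).digest()
        previous = bytes(a ^ b for a, b in zip(padded[i:i + 16], digest))
        hidden += previous
    return hidden


def build_access_request(user, password, secret, packet_id=1, request_auth=None):
    """Build an Access-Request (code 1) with User-Name, User-Password and Message-Authenticator."""
    if request_auth is None:
        request_auth = os.urandom(16)
    attrs = _attr(1, user.encode())                                             # User-Name
    attrs += _attr(2, hide_password(password.encode(), secret, request_auth))   # User-Password
    attrs += _attr(80, bytes(16))                                               # Message-Authenticator, zeroed for now
    header = struct.pack("!BBH", 1, packet_id, 20 + len(attrs)) + request_auth
    # HMAC-MD5 over the whole packet with the Message-Authenticator value set to zeros
    mac = hmac.new(secret, header + attrs, hashlib.md5).digest()
    return header + attrs[:-16] + mac


def probe(server, secret, user="probe", password="probe", port=1812, timeout=3.0):
    """Send one Access-Request and describe the outcome."""
    packet = build_access_request(user, password, secret)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, port))
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "no reply: check the UDP path, the Radius client entry and the shared secret"
    names = {2: "Access-Accept", 3: "Access-Reject", 11: "Access-Challenge"}
    return names.get(reply[0], "unexpected code %d" % reply[0])
```

A call such as probe("192.0.2.10", b"shared-secret") (both values are placeholders) returns the reply type. Because many servers have PAP disabled, an Access-Reject is still a useful result: it shows that the request reached the server and that the server answered it. A timeout points back at UDP connectivity, the Radius client entry, or the shared secret.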

Troubleshooting tips and scenarios for various Ruckus Wireless Solutions: 

Ruckus Smart Zone 

  • Ruckus Smart Zone controller can work in two modes when it comes to Radius Authentication 
Figure 2: Radius Authentication Service Modes
  • Proxy
    • The Ruckus Smart Zone controller runs a service called the RadiusProxy which is based on OpenRadius 
    • All radius requests are sent from the controller’s management IP address 
    • Access points encapsulate authentication requests inside the AP – Controller SSH tunnel.
  • Non-Proxy 
    • The access points terminate EAP and send Radius requests to the Authentication server

Reference for 802.1X configuration of Smart Zone: SZ 6.1.2- Wlan Management Guide

  • Based on the mode selected, you can debug the problem on the respective node
  • Use the RadiusProxy logs in Proxy mode and the AP support logs in Non-Proxy mode
  • If the mode is Proxy, you can enable debugging on the module in question via the Application Logs (Monitor > Troubleshooting and Diagnostics > Application Logs)
  • Click on Warning, which should show a drop-down option, and change this to Debug
  • Replicate the failure using the wireless client a couple of times, then download the “radiusd.log” file by clicking on the number right next to the Debug status
  • This file is compressed and can be opened using software such as 7-Zip. Once extracted, the file can be opened using a text editor like Notepad++
Figure 3: Setting the RadiusProxy module to debug and downloading the radiusd.log

Reference for downloading application logs: Working with Application Logs
  • Search the file for the client’s MAC address, the AP’s MAC address, or the server IP address, and validate that the Radius requests are being sent out from the controller
  • A sample of a successful authentication looks like the following:
Received Access-Request length 358
[Tue Jul 09 2024 14:23:13:394][CP][RADIUS][DBG][TID=1645209344][src/main/process.c:698]
    Radius Packet Header:
    Code: 1
    Id: 44
    Length: 358
    Authenticator: 0x91dec57d946c95bc7035d3dcb6ad5c2c
    User-Name = "test@byod.cloudpath.net"
    NAS-IP-Address = 192.168.29.229
    NAS-Identifier = "2C-AB-46-A3-46-A2"
    Called-Station-Id = "2C-AB-46-A3-46-A2:Secure"
    NAS-Port-Type = Wireless-802.11
    Service-Type = Framed-User
    NAS-Port = 1
    Calling-Station-Id = "42-6C-F7-F8-30-9F"
    Connect-Info = "CONNECT 802.11"
    Acct-Session-Id = "668D47CF-2346A000"
    Acct-Multi-Session-Id = "0711D9805AF08709"
    WLAN-Pairwise-Cipher = 1027076
    WLAN-Group-Cipher = 1027076
    WLAN-AKM-Suite = 1027073
    Ruckus-SSID = "Secure"
    Ruckus-BSSID = 0x2cab46a346a2
    Ruckus-Wlan-Id = 688
    Ruckus-Sta-Vlan-Id = 1
    Ruckus-SCG-CBlade-IP = 134.242.136.121
    Ruckus-Domain-Name = "Cecil"
    Ruckus-Zone-Name = "home"
    Ruckus-Wlan-Name = "Secure"
    EAP-Message = 0x0254001c01746573744062796f642e636c6f7564706174682e6e6574
    Chargeable-User-Identity = 0x00

[Tue Jul 09 2024 14:23:40:245][CP][RADIUS][DBG][TID=1300178688][src/main/process.c:612]
Received Access-Accept Id 42 from 72.18.151.76:12474 to 10.9.182.15:47178 length 208

[Tue Jul 09 2024 14:23:40:246][CP][RADIUS][DBG][FID=1,ueMac=42:6C:F7:F8:30:9F,TID=1300178688][wsg_rad_utils.c:1282]
User-Name ([email protected]) received in access-accept, overide in all subscriber context
  • The above strings are parameters that can be checked against a packet capture taken on the server or against the server logs
  • Sample log snippets for a failed Radius request (bad username / password / auth type) look like the following:
[Tue Jul 09 2024 14:41:14:763][CP][RADIUS][DBG][TID=1065346816][src/modules/rlm_realm/rlm_realm.c:179]
No '@' in User-Name = "test", looking up realm NULL

[Tue Jul 09 2024 14:41:22:224][CP][RADIUS][DBG][TID=1291785984][src/main/process.c:612]
Received Access-Reject Id 138 from 72.18.151.76:12474 to 10.9.182.15:47178 length 48

[Tue Jul 09 2024 14:41:22:224][CP][RADIUS][ERR][FID=1,ueMac=98:B3:79:3A:6F:0E,TID=1291785984][wsg_rad.c:1968]
Recvd Access-Reject from AAA Name:[training.cloudpath.net_auth] for UE MAC:[98-B3-79-3A-6F-0E]

[Tue Jul 09 2024 14:41:22:224][CP][RADIUS][DBG][TID=1291785984][src/main/process.c:739]

        SRC-IP: 72.18.151.76
        SRC-PORT: 12474
        DST-IP: 10.9.182.15
        DST-PORT: 47178
        Radius Packet Header:
        Code: 3
        Id: 138
        Length: 48
        Authenticator: 0x2d070fa04d47bc146ddadbbb15385d16
        EAP-Message = 0x04080004
        Message-Authenticator = 0xf931944e0018e455905be128a2df9671
        Proxy-State = 0x3831
  • The above logs are also present in the Snapshot logs bundle, which can be shared with Ruckus Support for further insight
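
The EAP-Message attribute in these logs carries the raw EAP packet, so decoding it shows what the client and the server actually exchanged. The sketch below is an illustration only, assuming the hex strings are copied verbatim from radiusd.log. The name decode_eap and the lookup tables are hypothetical helpers, and only a few EAP types are listed. It parses the 4-byte EAP header (code, identifier, length) defined in RFC 3748 and, for Identity messages, the identity string.

```python
import struct

EAP_CODES = {1: "Request", 2: "Response", 3: "Success", 4: "Failure"}
# Partial table of EAP method types; unknown values are reported as numbers.
EAP_TYPES = {1: "Identity", 13: "EAP-TLS", 21: "EAP-TTLS", 25: "PEAP"}


def decode_eap(hex_value):
    """Decode the EAP header (and identity, if present) from an EAP-Message hex string."""
    text = hex_value.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    raw = bytes.fromhex(text)
    if len(raw) < 4:
        raise ValueError("an EAP packet is at least 4 bytes long")
    code, identifier, length = struct.unpack("!BBH", raw[:4])
    info = {"code": EAP_CODES.get(code, str(code)), "id": identifier, "length": length}
    # Only Request and Response packets carry a Type field after the header.
    if code in (1, 2) and length > 4:
        eap_type = raw[4]
        info["type"] = EAP_TYPES.get(eap_type, str(eap_type))
        if eap_type == 1:
            info["identity"] = raw[5:length].decode("utf-8", "replace")
    return info
```

Applied to the two EAP-Message values in the samples above, the first decodes as an EAP Response/Identity carrying the user name the client presented, and the second (0x04080004) as an EAP Failure, which is what accompanies the Access-Reject.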

The Ruckus SmartZone controller includes an integrated visual troubleshooting tool that can graphically display the aforementioned logs.

  • This can be enabled and viewed by navigating to Monitor > Troubleshooting And Diagnostics > Troubleshooting, choosing “Client Connection” as the type, and entering the test client MAC address
Figure 4: Troubleshooting tool built into the SZ
Figure 5: A working client screenshot from the troubleshooting tool
Figure 6: A Radius reject / non-working client screenshot from the troubleshooting tool

Reference for using the troubleshooting tool for client connections: Troubleshooting Client Connections

Troubleshooting tips:

  • Most current operating systems can randomize the device MAC address for privacy reasons. Always disable this as a troubleshooting step, as it makes the process easier and the logging more productive.
    • Apple devices call this feature Private Wi-Fi Address
    • Android devices call this MAC Randomization
    • Windows devices call this feature Random Hardware Addresses
  • Try to choose a smaller AP list in the Select AP’s tab to improve the responsiveness of the tool.
  • Disable LTE/4G/5G / cellular connections / mobile data on devices when troubleshooting to prevent the device from using an alternate uplink to the internet. This can greatly reduce the time taken to connect to the WLAN.
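
A quick way to tell whether a MAC address seen in the logs is likely a randomized one is to check the locally administered bit (the 0x02 bit of the first octet), which randomized addresses set. The helper below is a small sketch with a hypothetical function name. A set bit only indicates a locally assigned address, so treat the result as a hint rather than proof.

```python
def is_locally_administered(mac):
    """Return True when the locally administered bit of the first octet is set."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)
```

For the sample logs in this article, 42-6C-F7-F8-30-9F has the bit set, while 98:B3:79:3A:6F:0E and 44:03:2c:bd:33:32 do not.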

Ruckus Analytics/ Ruckus AI

Ruckus Analytics is yet another tool that can provide insights/ stats on Radius client performance

  • Using either the client MAC address or the access point MAC address, we can trace events and find out where the Radius workflow breaks
    • https://ruckus.cloud/ai > Clients > Locate the client using its MAC address
  • In the example below, the client was attempting to authenticate against an 802.1x WLAN and was using the wrong credentials
    • Similar outputs are also seen when clients use the wrong auth type
Figure 7: Ruckus AI example for a client using bad credentials to authenticate against an 802.1x WLAN
Ruckus Smart Zone APs with non-proxy mode authentication

  • The AP support log collected immediately after an authentication failure can contain information about the Radius transaction
  • Sample logs for a working client:
Jul 17 14:50:53 Cecil-LTE_AP daemon.info hostapd: wlan34: STA 44:03:2c:bd:33:32 IEEE 802.11: wlan34: IEEE 802.11: Using EAPOL case, username = [email protected], original = [email protected]
Jul 17 14:50:53 Cecil-LTE_AP daemon.info hostapd: @@206,clientAuthorization,"apMac"="2c:ab:46:23:46:a0","clientMac"="44:03:2c:bd:33:32","ssid"="Secure","bssid"="2c:ab:46:a3:46:a2","userId"="","wlanId"="688","iface"="wlan34","tenantUUID"="839f87c6-d116-497e-afce-aa8157abd30c","apName"="Cecil-LTE_AP","apGps"="8.88757,76.60933","userName"="[email protected]","vlanId"="1","radio"="a/n/ac","encryption"="WPA2-AES","band"="5g"
Jul 17 14:50:53 Cecil-LTE_AP daemon.info hostapd: wlan34: STA 44:03:2c:bd:33:32 IEEE 802.1X: wlan34: IEEE 802.1X: authenticated - EAP type: 13 (unknown)
  • The Smart Zone troubleshooting tool can also provide helpful information, as explained above for the proxy mode example
  • The same applies to RuckusAI
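
In the sample above, hostapd reports the EAP method only as a number ("EAP type: 13 (unknown)"). These numbers are IANA-assigned EAP method types, so they can be mapped back to names. The following sketch assumes log lines in the format shown above. The function name and the regular expression are illustrative, and the table lists only the methods mentioned in this article.

```python
import re

# IANA EAP method type numbers for the EAP flavors discussed in this article.
EAP_TYPE_NAMES = {
    13: "EAP-TLS",
    18: "EAP-SIM",
    21: "EAP-TTLS",
    23: "EAP-AKA",
    25: "PEAP",
    26: "EAP-MSCHAPv2",
}

AUTH_RE = re.compile(
    r"STA (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) IEEE 802\.1X: "
    r".*authenticated - EAP type: (?P<type>\d+)",
    re.IGNORECASE,
)


def successful_auths(lines):
    """Yield (client MAC, EAP method name) for each successful 802.1X authentication line."""
    for line in lines:
        match = AUTH_RE.search(line)
        if match:
            number = int(match.group("type"))
            yield match.group("mac").lower(), EAP_TYPE_NAMES.get(number, "type %d" % number)
```

Run over the AP support log, this lists which clients authenticated and with which method. For the sample line above it reports EAP-TLS for client 44:03:2c:bd:33:32.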

Ruckus Cloud / RuckusOne

  • Just like Ruckus Smart Zone, RuckusOne also supports both proxy and non proxy modes of authentication for 802.1X 
  • The modes are selected on the AAA Settings of the WLAN as shown in the below example: Proxy Service
Figure 8: 802.1X Proxy/Non-Proxy configuration setting
  • When using proxy mode, the Radius requests are generated from device.ruckus.cloud / the cloud controller’s public interface
    • The Radius server must therefore be reachable from the internet, for example through a public IP address with the selected Radius port NAT’d to the server
  • Non-Proxy mode is used when the Radius server is only reachable via the LAN; the Radius requests are then generated from the IP addresses of the access points
  • Troubleshooting is done mainly using the AP support logs and Ruckus Analytics built into RuckusOne
  • Radius failures are auto-detected and highlighted to the admin by RuckusOne
  • https://ruckus.cloud/ > AI Assurance > Incidents
  • RuckusAI built into RuckusOne can assist with most known scenarios involving Radius
  • To invoke the troubleshooting features, go to Clients > Client List (X) and use the search box labelled “Search for connected and historical clients”
Figure 9: Ruckus Analytics client debugging using client MAC address
  • If a Radius reject message is seen, start troubleshooting the Radius server
Figure 10: Locating the client under historical clients to navigate to the troubleshooting page and review connection issues
Figure 11: Troubleshooting tab, which can be used to review the client interaction on the Radius WLAN
  • The scope for live debugging directly on the access point / cloud controller is limited for a RuckusOne admin
  • Access to AP SSH credentials is limited to the support team
  • Access to the Radius module is also limited to the support team
  • The admin can still review the AP support log files from the AP
  • Expect a worst-case delay of 30 to 60 seconds before client data shows up in Ruckus AI

Ruckus Unleashed/Zone director:

  • Ruckus Unleashed works very similarly to the Zone Director, so this section applies to both
  • The Ruckus Unleashed Master AP (or the Zone Director) interacts with the Radius server
  • The configuration guide can be located here: 802.1 Unleashed Configuration
  • Ruckus Unleashed has an inbuilt debug tool similar to those in Smart Zone and Ruckus One
    • This can be invoked from Admin & Services > Administration > Diagnostics > Client Troubleshooting; locate the Client Connection Logs section
    • This gives a rough idea of the reason for the disconnect
    • The logs can be exported and shared with Ruckus TAC or imported later for review
User-added image
  • Common issues: 
    • Radius Rejects
      • Check the radius server logs for the client transaction 
      • Check client credentials / certificates / EAP type 
    • No response or Radius time out
      • Validate UDP connectivity for the Radius Port in question between the master AP and the Radius server 
        • Check the UDP port configured on the Radius Server 
      • Validate Radius Shared Secret and reconfigure if necessary 
      • Take packet captures on the AP uplink or on the radius server side 
        • Look for packets sourced from the Unleashed Master AP
Reference for the troubleshooting tool: Using the unleashed troubleshooting tool
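
When a packet capture contains both the Access-Request and the server's reply, the shared secret can be checked offline. RFC 2865 defines the Response Authenticator as an MD5 hash over the reply (with the Request Authenticator in place of the Response Authenticator) followed by the shared secret. The sketch below assumes the raw Radius payload of the reply and the 16-byte Request Authenticator have been exported from the capture as bytes; the function name is hypothetical.

```python
import hashlib
import hmac


def response_matches_secret(response, request_authenticator, secret):
    """Return True if the reply's Response Authenticator was computed with this secret."""
    if len(response) < 20 or len(request_authenticator) != 16:
        raise ValueError("need a full Radius reply and a 16-byte Request Authenticator")
    # Trust the Length field in the header and ignore any trailing padding.
    length = int.from_bytes(response[2:4], "big")
    response = response[:length]
    # MD5(Code + ID + Length + RequestAuth + Attributes + Secret)
    expected = hashlib.md5(
        response[:4] + request_authenticator + response[20:] + secret
    ).digest()
    return hmac.compare_digest(expected, response[4:20])
```

If the function returns False for the secret configured on the Master AP, the two sides are most likely using different secrets, and the secret should be re-entered on both.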

Resolution

Common issues and resolution for those scenarios

  • Newly configured 802.1X/ Radius WLAN does not work.
    • Clients unable to connect to newly configured 802.1X WLAN
      • Make sure that the Radius server has a valid CA-signed certificate from a well-known CA such as GoDaddy, Comodo or VeriSign
      • Make sure the EAP method / flavor of 802.1x configured on the Radius server matches the one configured / selected on the client
      • Newer Android / iOS clients have privacy features that send a bogus / anonymous user name. Please check with the device manufacturer for instructions to disable this
    • AAA test does not work
      • The inbuilt AAA test feature is based on OpenRadius PAP / CHAP
      • Most modern servers have PAP/CHAP disabled
    • Clients connect but do not have internet access after they pass AAA authentication
      • Check the client VLAN assigned post authentication / pre authentication
      • Check client IP options like DNS / Subnet
  • 802.1x WLAN that was working suddenly stopped working
    • With changes made
      • Try to revert the changes made and test if the system is working
    • Without any significant changes made
      • Check the status of the AAA Radius server certificate; it could have expired
      • Check whether connectivity between the nodes that participate in the authentication workflow has broken
      • Check whether there were any changes outside of both Ruckus and the Radius server
  • 802.1x WLAN performance issues
    • Certain types of clients can connect, and certain ones do not
      • Each device vendor has its quirks and features. Make sure all advanced features are disabled
    • Clients drop connections randomly
      • Troubleshoot RF issues
    • Throughput issues on 802.1x/Radius WLAN
      • Validate client loads on each AP
      • Validate whether a AAA policy that limits bandwidth is pushed via the Radius server
      • Validate and troubleshoot RF
    • Clients show insecure or warning when connecting to a 802.1x WLAN
      • Check the AAA server certificate
      • Make sure it is signed by a known CA
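
An expired server certificate is easy to rule out once the certificate's notAfter date is known, for example from the server's certificate store or from the output of openssl x509 -noout -enddate. The sketch below only does the date arithmetic, using ssl.cert_time_to_seconds from the Python standard library. The function names and the 30-day warning threshold are arbitrary assumptions.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter date, e.g. "Jan  5 09:34:43 2018 GMT"."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def expiry_status(not_after, warn_days=30, now=None):
    """Classify a certificate as expired, expiring soon, or ok."""
    days = days_until_expiry(not_after, now)
    if days < 0:
        return "expired"
    if days < warn_days:
        return "expiring soon"
    return "ok"
```

A result of "expired" for the Radius server certificate would explain a WLAN that stopped working without any configuration change.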

In each of the above scenarios, we need to capture data on each node to narrow down where the problem is.
Logs and data to collect when opening a support case for an 802.1X failure:
  • Collect the AP support logs shortly after the issue replicates
  • Collect the client MAC address (with MAC randomization disabled) and a rough timestamp of when the issue occurred
  • Smart Zone Snapshot logs with the radius service in debug mode 
  • Enable Support access to RuckusOne / RuckusAI via the Administration tab 

Article Number:
000014389

Updated:
October 09, 2024 01:25 PM

Tags:
Performance, Configuration, Troubleshooting, ZoneDirector, SmartCell Gateway, SmartCell AP, Cloud Services
