SmartZone node not able to join the Cluster

Summary

Common causes of node join failures include misconfigured network settings, a firmware version mismatch, incompatible form factors, mismatched interface configuration, mismatched IP versions, and mismatched VM resource levels.

Question

Why is a new SmartZone node not able to join the Cluster?

Customer Environment

Any SmartZone (Physical or Virtual)

Root Cause

Common root causes of node join failures include misconfigured network settings, a firmware version mismatch, incompatible form factors, mismatched interface configuration, mismatched IP versions, and mismatched VM resource levels.

Symptoms

SmartZone node not able to join Cluster

Troubleshooting Steps


Cluster join flow

Follow these troubleshooting steps if a SmartZone node fails to join a Cluster:
  1. Verify that the Leader node and the new node that is trying to join the Cluster can reach each other. You can run a ping test from the CLI of each node.
  2. Verify that the new node is running the same firmware version as the Cluster nodes.
  3. Ensure that the new node is of the same form factor as the Cluster nodes: a hardware SmartZone such as the SZ100/SZ144 can form a Cluster only with other hardware SmartZones, and a Virtual SmartZone can form a Cluster only with other Virtual SmartZones.
  4. For Virtual SmartZone, ensure that the connected nodes and the new node are the same edition, i.e. High Scale (vSZ-H) or Essentials (vSZ-E).
  5. Verify that the interface configuration of the new node matches that of the Cluster nodes. For example, a Virtual SmartZone High Scale (vSZ-H) node must use the same single-interface or three-interface mode as the connected nodes. Similarly, a hardware SZ must match the single port group or two port group configuration of the connected nodes. If there is a mismatch, the GUI shows the error “Error on checking cluster port group setting. Please make sure the port group setting are compatible”. You can use the command "show interface" on the SmartZone CLI to check the interface configuration.
  6. Verify that the IP version (IPv4, IPv6, or Dual) matches between the nodes.
  7. Verify that the VM resource level matches between the nodes. If there is a mismatch, you might get the error “Error on checking node resource plan. The cluster requires all resource plan to be the same”. Refer to this article on how to resolve this error: https://community.ruckuswireless.com/t5/RUCKUS-Self-Help/Error-on-checking-node-resource-plan-The-cluster-requires-all/td-p/80262
  8. Ensure that the latency between the SZ nodes is within the supported limit; excessive latency causes issues when joining the Cluster. The latency requirement is listed in the Release Notes for the firmware version in use, under the section “Cluster Network Requirements”.
  9. Make sure that all current members of the Cluster are in connected status and that their services are online. You can verify this by logging into the CLI of each Cluster node and running the following commands:
show service
show cluster-state

“show service” should show all services in online status.
“show cluster-state” should show none of the following keywords in “system-state” or “cluster-state”: Out of Service, maintenance, crash, suspend, NetworkPartitionSuspected.
  10. If you encounter the error “Error on first time initialization Process”, factory-reset the new node once and then try to join again. Make sure you factory-reset only the new node, not an active member of the Cluster.
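
The matching requirements in steps 2 to 7 can be tracked as a simple checklist. The following Python sketch is only an illustration of that comparison and is not a RUCKUS tool or API: the NodeProfile record, its field names, and all sample values are hypothetical, and the values would have to be filled in by hand from what you observe on each node.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class NodeProfile:
    """Hypothetical record of the properties that must match across nodes."""
    firmware: str        # step 2: firmware version string
    form_factor: str     # step 3: "hardware" or "virtual"
    edition: str         # step 4: "High Scale" or "Essentials" (virtual only)
    interface_mode: str  # step 5: interface / port group mode
    ip_version: str      # step 6: "IPv4", "IPv6" or "Dual"
    resource_level: str  # step 7: VM resource level (virtual only)


def find_mismatches(cluster: NodeProfile, new_node: NodeProfile) -> list[str]:
    """Return one message per property where the new node differs from the cluster."""
    problems = []
    for f in fields(NodeProfile):
        expected = getattr(cluster, f.name)
        actual = getattr(new_node, f.name)
        if expected != actual:
            problems.append(
                f"{f.name}: cluster has {expected!r}, new node has {actual!r}"
            )
    return problems


# Made-up sample values, for illustration only.
cluster = NodeProfile("6.1.0", "virtual", "High Scale", "three", "IPv4", "level-a")
candidate = NodeProfile("6.1.0", "virtual", "High Scale", "single", "IPv4", "level-a")

for message in find_mismatches(cluster, candidate):
    print(message)
```

An empty result means the recorded properties agree; it does not cover reachability, latency, or service health, which still need to be checked on the nodes themselves.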

Resolution


If the node still fails to join after all of the above troubleshooting steps, collect the following logs and open a support case with RUCKUS:
  1. On the current leader node, change the logging level of “Web” and “Configurer” to debug, try to join the node again, and then collect the Snapshot log. Refer to the following article on how to download Snapshot logs: https://docs.commscope.com/bundle/sz-610-adminguide-sz300vsz/page/GUID-41CD8D53-221C-4215-9519-75A0AD8FFE86.html
     Note: Once the logs are downloaded, make sure the logging level is changed back to “Warning”; leaving it at debug can overwhelm the node's resources.
  2. Collect the “show service” and “show cluster-state” output from all the connected nodes.
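
Before attaching the collected output, it can help to scan it for the problem keywords listed in the troubleshooting steps. The following Python sketch assumes the “show cluster-state” output of each node has been saved as plain text. The function name and the sample strings are hypothetical placeholders and do not reproduce the real CLI output format.

```python
# Keywords from the troubleshooting steps that indicate an unhealthy node.
BAD_KEYWORDS = [
    "Out of Service",
    "maintenance",
    "crash",
    "suspend",
    "NetworkPartitionSuspected",
]


def find_bad_states(cluster_state_output: str) -> list[str]:
    """Return the problem keywords found (case-insensitively) in saved output."""
    text = cluster_state_output.lower()
    return [kw for kw in BAD_KEYWORDS if kw.lower() in text]


# Placeholder text standing in for saved output; NOT the real CLI format.
saved_outputs = {
    "node-1": "system-state: In Service\ncluster-state: In Service",
    "node-2": "system-state: Out of Service\ncluster-state: In Service",
}

for node, output in saved_outputs.items():
    hits = find_bad_states(output)
    print(node, "OK" if not hits else "check: " + ", ".join(hits))
```

A plain substring match is used here for simplicity, so a keyword appearing anywhere in the text is flagged; treat a hit as a prompt to read the full output, not as a diagnosis.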

Article Number:
000014441

Updated:
October 09, 2024 01:20 PM

Tags:
Configuration, Installation, Troubleshooting, SmartCell Gateway, Ruckus Support Services
