The Question is:
The system is running on a cluster with two AS4100s and one AS800 as quorum. We
experienced a reboot of all the servers at nearly the same time.
The analysis pointed to a network failure.
Is it expected that all servers reboot when they experience a
network failure? Is there any setting that permits cluster members to be
out of contact with each other for a certain time before a reboot is
required? Is there a better setup that can be implemented to ensure that
at least one member remains alive in any network failure?
Thanks and Regards,
Zul
The Answer is :
The OpenVMS Wizard will assume the configuration involves two
AlphaServer 4100 series systems and an AlphaServer 800 series
system, but there is no cluster topology provided.
You will need a redundant communications path if you want to
survive a network communications failure, obviously.
In general, a cluster with only network communications links does
not need and usually should not have a quorum disk (or a so-called
quorum host). This implies that VOTES and/or EXPECTED_VOTES are set
incorrectly (please see the OpenVMS FAQ), or that there is a storage
path or a secondary communications path that has not been mentioned.
(So-called creative settings of VOTES and EXPECTED_VOTES can and do
lead to corruptions and crashes, usually during cluster formation or
cluster transition.)
With a proper cluster configuration and assuming a failed network
communications path, exactly one host should survive -- this is the
host with the vote. (With only network paths and no shared storage,
there should be only one voting node.) With redundant communications
in a cluster configuration, multiple hosts can and should survive.
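
As a rough illustration of the arithmetic behind this, the following
Python sketch applies the documented OpenVMS quorum calculation,
(EXPECTED_VOTES + 2) / 2 with the result truncated, to a cluster in
which a total network failure isolates every host. The node names,
the vote assignments and the helper functions are hypothetical and
are not taken from the configuration in question; the sketch also
ignores the dynamic quorum adjustments a running cluster makes.

```python
def quorum(expected_votes: int) -> int:
    """Quorum from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def survivors(votes: dict[str, int], expected_votes: int,
              partitions: list[set[str]]) -> list[set[str]]:
    """Return the partitions whose combined votes still meet quorum."""
    needed = quorum(expected_votes)
    return [p for p in partitions if sum(votes[n] for n in p) >= needed]


# Hypothetical node names; a total network failure isolates every host.
isolated = [{"AS4100A"}, {"AS4100B"}, {"AS800"}]

# Assumed setup 1: one vote per host, EXPECTED_VOTES=3.
# Quorum is 2, so no isolated host retains quorum.
all_voting = {"AS4100A": 1, "AS4100B": 1, "AS800": 1}
print(survivors(all_voting, 3, isolated))

# Assumed setup 2: a single voting host, EXPECTED_VOTES=1.
# Quorum is 1, so only the host holding the vote continues.
one_voter = {"AS4100A": 1, "AS4100B": 0, "AS800": 0}
print(survivors(one_voter, 1, isolated))
```

The second setup corresponds to the single-voting-node arrangement
described above: the host holding the vote keeps quorum on its own,
and the non-voting hosts do not.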
Your OpenVMS version is in need of an upgrade to a more recent and
preferably supported release.