The Question is:
I have 40 Alpha500 workstations which boot off and live as total satellites off
an ES40 in the computer room (i.e. no disks of value locally). After a
network failure many of the systems were rebooting, but even though they got
the MOP download okay and MSCP saw the disks, they stopped because a
satellite, not the main node, started handling the Cluster Join request and
locked up all the nodes. Even
after I killed that one and everything booted okay, I notice the state
transition approvals move to the most recently booted system instead of
completely being controlled by the
server. How can I make the server handle all activity so that no workstation
is of ANY consequence or value in any activity needed for booting or joining
the cluster? [Note: Software support was stumped and said it just was a
shared activity... I hope there
actually is something to control this.] VOTES on all workstations is set to
zero as is LOCKDIRWT; VAXCLUSTER=2. Thanx.
The Answer is:
The connection manager transition coordinator is a lightweight task,
and one that may not fall on the particular host you might expect or
might believe you want. The coordinator selection process is not
documented and not particularly controllable. (The details and the
selection algorithm can change, and have changed, as well.)
The coordinator simply sequences all cluster member nodes through the
internal transition as required, and this coordination activity only
occurs when a transition is underway.
Additionally, the coordinator is not a dedicated node; the task can
potentially move. If the node currently acting as coordinator fails, the
task will (obviously) move.
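As a rough illustration of the points above, here is a toy model in Python.
It is not OpenVMS code and does not reflect the real connection manager.
Because the actual selection algorithm is undocumented, the selection rule
(`newest`), the phase names, and the node names are all hypothetical
placeholders. The sketch only shows that the coordinator merely sequences
every member through a transition, and that the role lands on another member
once the previous holder is gone.

```python
# Toy model only: not OpenVMS code. The real coordinator selection
# algorithm is undocumented; the `pick` rule is an arbitrary placeholder,
# and the phase names are invented for illustration.

def run_transition(members, pick, phases=("propose", "approve", "commit")):
    """Choose a coordinator with `pick`, then step every member through each phase."""
    coordinator = pick(members)
    # The coordinator does no heavy work: it only sequences the members.
    log = [(phase, node) for phase in phases for node in members]
    return coordinator, log

def surviving(members, failed):
    """Return the membership list without the failed node."""
    return [m for m in members if m != failed]

members = ["SERVER", "WS01", "WS02"]   # hypothetical node names
newest = lambda nodes: nodes[-1]       # placeholder rule, not the real one

coord, log = run_transition(members, newest)

# If the coordinator fails, the next transition runs under another member.
coord2, log2 = run_transition(surviving(members, coord), newest)
```

Nothing outside a transition depends on which node was chosen: the
coordinator holds no role between transitions, so in this model the choice
of node only affects who does the sequencing.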