A core group is a physical grouping of JVMs in a cell that are candidates to host singleton services. It can contain stand-alone servers, cluster members, Node Agents, or the Deployment Manager; each of these runs in a JVM.
Each JVM process can be a member of only one core group. Naturally, all members of a cluster must belong to the same core group. At runtime, the core group and policy configurations are matched together to form high availability groups.
A set of JVMs can work together as a group to host a highly available service. All JVMs with the potential to host the service join the group when they start. If the singleton is scoped to a WebSphere cluster, such as a Transaction Manager or a messaging engine, then all members of the cluster are part of such a group of JVMs that can host the service. WLM information for clusters is available to all members of the core group in a peer-to-peer fashion.
A core group cannot extend beyond a cell, or overlap with other core groups.
Most importantly, all JVMs in a core group must be able to open connections and send heartbeat messages to each other.
Core groups in the same cell or from different cells can be federated to share WLM information using the core group bridge service. In a large-scale implementation with clusters spanning multiple geographies, you can use a variety of transport types for different core groups and link them together with the core group bridge service to form flexible topologies.
After the membership of the core group stabilizes at runtime, certain members are elected to act as coordinators for the core group. A core group coordinator is responsible for managing the high availability groups within a core group. The following aspects are managed by the core group coordinator:
Maintaining all group information including the group name, group members and the policy of the group.
Keeping track of the states of group members as they start, stop, or fail and communicating that to every member.
Assigning singleton services to group members and handling failover of services based on core group policies.
By default, the HAManager elects the lexically lowest named group member as the coordinator. Because acting as a coordinator consumes additional resources in the JVM, you may wish to override the default election mechanism by providing your own list of preferred coordinator servers in the WebSphere Administrative Console.
You can do this by selecting Servers -> Core groups -> Core group settings -> -> Preferred coordinator servers. Specifying just one server in the list does not make it a single point of failure: the HAManager simply gives that server a higher priority than the others in the core group rather than an exclusive right to be the coordinator. Consider carefully which JVMs should become preferred coordinators. Each JVM logs whether or not it has become an active coordinator, as shown in Example 9-1 and Example 9-2.
Example 9-1 Message for a JVM becoming an active coordinator
[10/27/04 17:24:46:234 EDT] 00000013 CoordinatorIm I HMGR0206I: The Coordinator is an Active Coordinator for core group DefaultCoreGroup.
Example 9-2 Message for a JVM joining a view not as a coordinator
[10/27/04 18:49:03:016 EDT] 0000000a CoordinatorIm I HMGR0228I: The Coordinator is not an Active Coordinator for core group DefaultCoreGroup.
The preferred coordinator list can be changed dynamically. When a new coordinator is elected, the retiring coordinator writes a message as in Example 9-3 to its SystemOut.log.
Example 9-3 Message for an active coordinator retiring from the role
[10/27/04 17:28:51:766 EDT] 00000013 CoordinatorIm I HMGR0207I: The Coordinator was previously an Active Coordinator for core group DefaultCoreGroup but has lost leadership.
When a JVM process with the active coordinator is no longer active (because it is stopped or crashes), the HAManager elects the first inactive server in the preferred coordinator servers list. If there is none available, it will simply elect the lexically lowest named inactive server. If there are fewer JVMs running than the number of coordinators in the core group settings, then all running JVMs are used as coordinators.
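The election rules described above can be sketched as follows. This is an illustrative model only, not WebSphere's actual implementation; the function and server names are hypothetical:

```python
def elect_coordinators(running_jvms, preferred, num_coordinators):
    """Illustrative sketch of the coordinator election rules:
    available preferred servers win first (in list order), then the
    lexically lowest named remaining JVMs fill any open slots."""
    chosen = [s for s in preferred if s in running_jvms]
    chosen += sorted(set(running_jvms) - set(chosen))
    # If fewer JVMs are running than the configured number of
    # coordinators, all running JVMs act as coordinators.
    return chosen[:min(num_coordinators, len(running_jvms))]

# With no preferred servers, the lexically lowest JVM wins:
print(elect_coordinators(["Ejb2a", "Ejb1", "Web1"], [], 1))        # ['Ejb1']
# A preferred server takes priority while it is running:
print(elect_coordinators(["Ejb2a", "Ejb1", "Web1"], ["Web1"], 1))  # ['Web1']
```

Note that a preferred server only outranks the others; if it is down, the election falls back to the lexical rule exactly as the text describes.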
The newly elected coordinator initiates a state rebuild, sending a message to all JVMs in the core group to report their states. This is the most CPU-intensive operation of a coordinator. Multicast is the ideal transport type for this operation. See Multicast for more information.
How many coordinators do I need?
Most medium-scale core groups only need one coordinator. The following are possible reasons for increasing the number of coordinators:
Heavy heap usage found in the verbose garbage collection log of the JVM acting as the active coordinator.
High CPU usage when a newly elected coordinator becomes active.
Both of these conditions are only a problem under the following circumstances:
There are thousands of WebSphere clusters deployed in the core group.
There are thousands of JMS destinations deployed in the core group.
A WebSphere Extended Deployment application that uses partitioning has more than 5,000 partitions.
For the vast majority of customers, it is not necessary to use more than a single coordinator. Normally, you use a preferred coordinator server to pin the coordinator to a server that does not typically start and stop; if that server fails, the coordinator moves to the lexically lowest named JVM.
The underlying message transport of HAManager is a reliable publish/subscribe messaging service, known as Distribution and Consistency Services (DCS). A buffer is created to hold unprocessed incoming messages and outgoing messages that have not been acknowledged.
The default buffer size is 10 MB, chosen to keep the memory footprint small. Because the buffer is shared with DRS for HTTP session replication and stateful session bean state replication, you may wish to increase the buffer size if your WebSphere environment has high replication demands. When an application server runs low on transport buffer space, an HMGR0503I message is written to the SystemOut.log of the JVM logs. This is not an error message; it is simply an informational message indicating that a congestion event has occurred in the transport buffer. Messages that are not sent during the congestion are retried later or sent in a batch to use the transport buffer more efficiently.
The ideal setting is very dependent on load, but during some internal benchmarks we used buffer sizes of 80 MB for a cluster that was processing 20,000 HTTP requests per second, where each request resulted in 10 KB of session state to replicate. This is an extreme that very few customers would see in practice, but it gives an idea of the scale involved.
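As a rough sanity check on that benchmark (illustrative arithmetic only, using the figures quoted above):

```python
# Back-of-the-envelope replication bandwidth from the benchmark figures.
requests_per_sec = 20_000
session_state_kb = 10

replication_mb_per_sec = requests_per_sec * session_state_kb / 1024
print(f"{replication_mb_per_sec:.0f} MB/s of session state to replicate")

# With roughly 200 MB/s flowing through, an 80 MB buffer absorbs well
# under half a second of traffic.  The buffer smooths bursts rather than
# eliminating backpressure, so HMGR0503I congestion events remain possible.
```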
Example 9-4 Informational message when the transport buffer is running low
HMGR0503I: The message path of the internal data stack named is highly congested.
To change the transport buffer size, open the WebSphere Administrative Console and click Servers -> Application servers -> -> Core group service (under Additional Properties). See Figure 9-3 for the configuration panel.
Distribution and Consistency Services (DCS)
Distribution and Consistency Services (DCS) provides the underlying group services framework for the HAManager, so that each application server process knows the health and status of JVMs and singleton services. It essentially provides view-synchronous services to the HAManager. DCS itself uses RMM (Reliable Multicast Messaging) as its reliable publish/subscribe message framework. RMM is an ultra high speed publish/subscribe system that WebSphere uses internally for its core group communication fabric as well as for DRS traffic.
Core group policy
A core group policy determines how many and which members of a high availability group are activated to accept work at any point of time. Each service or group can have its own policy. A single policy manages a set of high availability groups (HA groups) using a matching algorithm. A high availability group must be managed by exactly one policy.
Policies can be added, deleted or edited while the core group is running. These changes take effect immediately. There is no need to restart any JVMs in the core group for a policy change to take effect.
To create or edit a core group policy, click Servers -> Core groups -> Core group settings -> -> Policies -> New or .
There are five types of core group policies available:
All active policy
M of N policy
No operation policy
One of N policy
Static policy
One of N policy
Under this policy, only one server at a time activates the singleton service. If a failure occurs, the HAManager starts the service on another server. When using this policy, make sure that all external resources are available to all high availability group members at all times. For example, if database access is required for messaging, all members should have the remote database cataloged. If there are transaction logs, they should be placed on Network Attached Storage (NAS) that is available to all members. This is the recommended policy for systems that require automatic failover and do not use external high availability software.
The one-of-N policy has the following additional options to cater for different usage scenarios:
Preferred servers
You can specify an ordered list of servers that the HAManager observes when choosing where to run a singleton service.
Quorum
Leave this checkbox unchecked. This option is only needed for WebSphere Extended Deployment customers who use partitioning and hardware to enforce quorums. If you are using the WebSphere Partition Facility and the appropriate hardware, contact IBM support for help configuring this setting correctly.
Fail back
When enabled, a singleton service is moved back to a preferred server when one becomes available. One example is the Transaction Manager: when the failed server that hosted the Transaction Manager comes online again, it should re-acquire the Transaction Manager service, because Transaction Manager failover only caters for recovery processing.
Preferred servers only
This option restricts a singleton service to run only on servers in the preferred servers list.
Two default one-of-N policies are defined for the DefaultCoreGroup: Clustered TM Policy for the high availability of a Transaction Manager and Default SIBus Policy for protecting the Service Integration Bus (messaging) services.
Important: The default policies should never be edited, changed or deleted. They can be overridden by new policies that have a more specific matchset.
No operation policy
Under this policy, the HAManager never activates a singleton service on its own. It is primarily intended for use with external clustering software, such as IBM HACMP. The external software controls where to activate a singleton service by invoking operations on the HAManager MBean.
Typically, this mode is used when the overall system infrastructure dictates that the singleton service has dependencies on resources managed outside WebSphere. For example, Transaction Manager logs may be placed on a Journaled File System (JFS) residing on a SAN disk. Only one server can mount a JFS file system at a time, even though it is on a shared disk.
Recovery time is significantly reduced compared with the cold standby model of previous versions of WebSphere Application Server, since all JVMs are already running before a failover event. The expensive JVM startup is avoided during the critical failover window.
Static policy
This policy should be used when you want the singleton service to run on one specific high availability group member. If that member is not online, the singleton service does not run; it will not fail over automatically, and manual intervention is required. The fixed member can be changed on the fly without restarting WebSphere. This option is useful when failover is undesirable: if the server fails, the service can be moved to another server by updating the server name in the policy and saving the change.
Every singleton service is managed by a high availability group to which a policy is assigned at runtime. The assignment is done by comparing the match criteria of the set of available policies against the properties of the high availability groups. The policy with the strongest match is assigned to the HA group. You can edit a policy's match criteria by clicking Servers -> Core groups -> Core group settings -> -> Policies -> -> Match criteria.
Match criteria for messaging engines
Name                        Value                          Match targets
type                        WSAF_SIB                       All messaging engines
WSAF_SIB_MESSAGING_ENGINE   Name of your messaging engine  One particular messaging engine
WSAF_SIB_BUS                Name of your bus               All messaging engines in a bus
IBM_hc                      Name of your cluster           All messaging engines in a cluster
Match criteria for Transaction Managers
Name     Value             Match targets
type     WAS_TRANSACTIONS  All Transaction Managers
IBM_hc   Cluster name      All Transaction Managers in a cluster
GN_PS    Home server name  One particular Transaction Manager
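The "strongest match" assignment described above can be sketched with a toy matcher. This is a hypothetical model of the algorithm, not the real HAManager API; the policy and group data are invented for illustration:

```python
def strongest_match(policies, group_props):
    """A policy matches when every name/value pair in its match
    criteria appears in the HA group's properties; among matching
    policies, the one with the most criteria (the most specific
    matchset) wins.  Illustrative model only."""
    best = None
    for name, criteria in policies.items():
        if all(group_props.get(k) == v for k, v in criteria.items()):
            if best is None or len(criteria) > len(policies[best]):
                best = name
    return best

# A messaging engine's HA group properties (hypothetical values):
group = {"type": "WSAF_SIB", "WSAF_SIB_BUS": "MyBus", "IBM_hc": "Cluster1"}
policies = {
    "Default SIBus Policy": {"type": "WSAF_SIB"},
    "MyBus Policy": {"type": "WSAF_SIB", "WSAF_SIB_BUS": "MyBus"},
}
print(strongest_match(policies, group))  # 'MyBus Policy' - more specific
```

This is why a default policy can be overridden without editing it: a new policy with a more specific matchset simply wins the match.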
A transport type is the type of network communication a core group uses to communicate with its members. There are three types of transports available:
• Multicast
• Unicast
• Channel Framework
No matter which transport type is used, there is always a socket between each pair of JVMs for point-to-point messages and failure detection. For example, if you have a total of eight JVMs in your core group, then every JVM will have seven sockets to others.
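Because every pair of JVMs keeps a socket open regardless of transport type, the socket count grows quadratically with core group size. A quick sketch (illustrative arithmetic only):

```python
def point_to_point_sockets(n_jvms):
    """Each pair of JVMs keeps one socket open, so each JVM holds
    n-1 sockets and the core group holds n*(n-1)/2 in total."""
    per_jvm = n_jvms - 1
    total = n_jvms * (n_jvms - 1) // 2
    return per_jvm, total

print(point_to_point_sockets(8))   # (7, 28)  - the eight-JVM example above
print(point_to_point_sockets(70))  # (69, 2415) - a large core group
```

The quadratic growth of this socket mesh is one reason operating-system tuning (such as ephemeral port limits) becomes relevant when many JVMs run on a single machine.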
Multicast: Multicast is a high performance protocol and the HAManager is designed to perform best in the multicast mode especially when using very large core groups. Typical scenarios for multicast are:
• There are over 70 JVMs in the core group
• Many JVMs are created on a small number of large SMPs
Publishing a message in this mode is efficient as the publish only has to transmit once. Only a fixed number of threads is used independent of the number of JVMs in a core group. However, consider the following factors to decide if multicast is suitable for your environment:
• Multicast typically requires all JVMs in the core group to be on the same subnet (TTL can be tuned, please contact IBM support for details).
• All members in a core group receive multicast messages. JVMs waste CPU cycles on discarding messages that are not intended for them.
Unicast: Under this transport mode, communication between JVMs is performed over the direct TCP sockets between each pair of JVMs. The unicast mode has the following advantages and disadvantages:
• Unicast is WAN friendly. There is no limitation to localize JVMs on a single subnet.
• Publishing a message that is intended for only a small number of servers is more efficient than multicast. Servers that have no interest in the message do not waste CPU cycles discarding it.
• Only a fixed number of threads are used regardless of the number of JVMs.
• Publishing a message to a large number of servers is more expensive considering each message is sent once per destination JVM.
Channel framework: This default transport mode has similar pros and cons to unicast. It is more flexible than unicast in that a core group using this transport is associated with a channel chain, and a chain can use HTTP tunneling, SSL, or HTTPS. The performance of channel framework is around 50% lower than unicast for tasks such as HTTP session replication. It is a trade-off between performance and flexibility for different environments. SSL and HTTP tunneling are only available with the channel framework transport.
If the transport type in a core group is changed, all JVMs in that group must be restarted. The following is the recommended procedure for changing the transport type:
1. Stop all cluster members and Node Agents in the core group.
2. Modify the transport type using the WebSphere Administrative Console by clicking Servers -> Core groups -> Core group settings -> .
3. Change the Transport type setting on the Configuration tab.
4. Perform a manual synchronization using the command line utility syncNode.
5. Start all Node Agents.
6. Start all cluster members and application servers.
High availability group
High availability groups are dynamic components created within a core group. Each group represents a highly available singleton service, and the active members of a group are ready to host the service at any time.
To view a list of high availability groups, click Servers -> Core groups -> Core group settings -> . Then select the Runtime tab. Specify a match criterion for a list of specific high availability group(s) or an asterisk (*) as a wildcard to get a complete list of groups. For example, specifying type=WAS_TRANSACTIONS results in a list of Transaction Manager high availability groups.
When you click Show groups, a list of high availability groups that match the criteria you specified is displayed as shown in Figure 9-7. Each high availability group is displayed along with its associated policy.
Select any high availability group and a list of group members is displayed as shown in Figure 9-8. Only running JVMs are displayed in the group member list. From here you can manage the group members by activating, deactivating, enabling or disabling them.
Discovery of core group members
A JVM that is starting in a core group goes through three stages before joining the group:
1. Not connected: The JVM has not established network connectivity with the other group members. It sends a single announcement message if the multicast transport mode is used, or one message to each member of the group if unicast is used. It sends multiple messages in unicast because it does not know which other members have started.
2. Connected: The JVM has already opened a stream to all current members of the installed view. The coordinator will consider this JVM as a candidate to join the view. A view is the set of online JVMs ready for running singleton services.
3. In a view: The JVM is a full participant in a core group at this stage. The view is updated and installed in all members.
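The three stages above can be modeled as a tiny state machine (an illustrative sketch with hypothetical event names, not actual DCS internals):

```python
# Illustrative model of the three discovery stages a starting JVM
# passes through; the event names are invented for this sketch.
NOT_CONNECTED, CONNECTED, IN_VIEW = "not connected", "connected", "in a view"

TRANSITIONS = {
    # Streams opened to all current members of the installed view:
    (NOT_CONNECTED, "streams_opened"): CONNECTED,
    # The coordinator includes the JVM when the next view is installed:
    (CONNECTED, "view_installed"): IN_VIEW,
}

def next_stage(stage, event):
    return TRANSITIONS.get((stage, event), stage)

stage = NOT_CONNECTED
for event in ["streams_opened", "view_installed"]:
    stage = next_stage(stage, event)
print(stage)  # in a view
```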
When a new view is installed, message HMGR0218I is displayed in the SystemOut.log file of each JVM in the group indicating how many JVMs are currently a member of the view.
Example 9-5 Message HMGR0218I for a new view being installed
[10/27/04 18:52:20:781 EDT] 00000011 CoordinatorIm I HMGR0218I: A new core group view has been installed. The view identifier is (16:0.dmCell\app1Node\Ejb1). The number of members in the new view is 5.
JVMs in the current view constantly try to discover others that are not in the view. Each in-view JVM periodically tries to open sockets to JVMs that are not in the current view. This process continues until all JVMs in the core group are in the view.
Attention: When running a large number of JVMs on a single box, the FIN_WAIT parameter of the operating system may need to be tuned down to prevent running out of ephemeral ports. Please consult your OS documentation for details.
The HAManager monitors JVMs of core group members and initiates failover when necessary. It uses two methods to detect a process failure:
• Active failure detection
• TCP KEEP_ALIVE
Active failure detection
A JVM is marked as failed if its heartbeat signals to its core group peers are lost for a specified interval. DCS sends heartbeats between every JVM pair in a view. With the default settings, heartbeats are sent every 10 seconds, and 20 consecutive heartbeats must be lost before a JVM is marked as a suspect and a failover is initiated. The default failure detection time is therefore 200 seconds.
This setting is very high and should be modified by most customers in a production environment. A setting of 10-30 seconds is normally recommended for a well tuned cell.
When a JVM failure is detected, the JVM is “suspected” by the others in the view, which can be seen in the SystemOut.log. View installation after a failure is fast in order to achieve quick recovery, whereas view installations triggered by JVM startups are deliberately slower; otherwise, there would be frequent view installations when several JVMs start together.
Heartbeat delivery can be delayed due to a number of commonly-seen system problems:
Swapping: When a system is swapping, the JVM can get paged out, and heartbeat signals are not sent or received in time.
Thread scheduling thrashing: Java is not a real-time environment. When many runnable threads accumulate in a system, each thread suffers a long delay before being scheduled, so the threads of a JVM may not be scheduled to process heartbeat signals in a timely fashion. This scheduling problem also affects the applications on that system, as their response times become unacceptable. Therefore, systems must be tuned to avoid CPU starvation and heavy paging.
Any of the above problems can cause instability in your high availability environment. After tuning the system not to suffer from swapping or thread thrashing, the heartbeat interval can be lowered to increase the sensitivity of failure detection.
Changing the frequency of active failure detection
Name                          Description                                                                    Default value
IBM_CS_FD_PERIOD_SECS         The interval between heartbeats, in seconds.                                   10
IBM_CS_FD_CONSECUTIVE_MISSED  The number of consecutive missed heartbeats that marks a server as a suspect.  20
Heartbeating is always enabled regardless of the message transport type for the HAManager.
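The failure detection time implied by the two properties above is simply their product. A small sketch (the tuned values shown are hypothetical examples in the recommended 10-30 second range):

```python
def detection_time_secs(fd_period_secs=10, consecutive_missed=20):
    """Active failure detection fires only after CONSECUTIVE_MISSED
    heartbeats, sent FD_PERIOD_SECS apart, have all been lost."""
    return fd_period_secs * consecutive_missed

print(detection_time_secs())       # 200 - the default, usually far too slow
print(detection_time_secs(3, 10))  # 30  - hypothetical tuned values
```

When lowering these values, remember the earlier caveat: a system that swaps or thrashes can miss heartbeats without having failed, so tune the operating system first and then shorten the detection time.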
Example 9-6 SystemOut.log
[11/8/04 15:34:26:489 EST] 00000017 RmmPtpGroup W DCSV1111W: DCS Stack DefaultCoreGroup at Member dmCell\app2Node\Ejb2b: Suspected another member because the outgoing connection from the other member was closed. Suspected members is dmCell\app2Node\Ejb2a. DCS logical channel is View|Ptp.
[11/8/04 15:34:26:547 EST] 00000017 DiscoveryRmmP W DCSV1111W: DCS Stack DefaultCoreGroup at Member dmCell\app2Node\Ejb2b: Suspected another member because the outgoing connection from the other member was closed. Suspected members is dmCell\app2Node\Ejb2a. DCS logical channel is Connected|Ptp.
[11/8/04 15:34:26:694 EST] 00000017 RmmPtpGroup W DCSV1111W: DCS Stack DefaultCoreGroup.DynamicCache at Member dmCell\app2Node\Ejb2b: Suspected another member because the outgoing connection from the other member was closed. Suspected members is dmCell\app2Node\Ejb2a. DCS logical channel is View|Ptp.
[11/8/04 15:34:26:886 EST] 00000017 DiscoveryRmmP W DCSV1111W: DCS Stack DefaultCoreGroup.DynamicCache at Member dmCell\app2Node\Ejb2b: Suspected another member because the outgoing connection from the other member was closed. Suspected members is dmCell\app2Node\Ejb2a. DCS logical channel is Connected|Ptp.
[11/8/04 15:34:27:344 EST] 00000017 DiscoveryServ W DCSV1111W: DCS Stack DefaultCoreGroup at Member dmCell\app2Node\Ejb2b: Suspected another member because the outgoing connection from the other member was closed. Suspected members is dmCell\app2Node\Ejb2a. DCS logical channel is Discovery|Ptp.
[11/8/04 15:34:27:831 EST] 00000017 VSync I DCSV2004I: DCS Stack DefaultCoreGroup at Member dmCell\app2Node\Ejb2b: The synchronization procedure completed successfully. The View Identifier is (139:0.dmCell\app1Node\Ejb1). The internal details are [0 0 0 0 0 0 0 0 0].
[11/8/04 15:34:28:151 EST] 00000017 VSync I DCSV2004I: DCS Stack DefaultCoreGroup.DynamicCache at Member dmCell\app2Node\Ejb2b: The synchronization procedure completed successfully. The View Identifier is (110:0.dmCell\app1Node\Ejb1). The internal details are [0 0 0 0 0 0].
[11/8/04 15:34:28:300 EST] 00000019 CoordinatorIm I HMGR0228I: The Coordinator is not an Active Coordinator for core group DefaultCoreGroup.
[11/8/04 15:34:28:425 EST] 00000019 CoordinatorIm I HMGR0218I: A new core group view has been installed. The view identifier is (140:0.dmCell\app1Node\Ejb1). The number of members in the new view is 8.
[11/8/04 15:34:28:448 EST] 00000019 CoreGroupMemb I DCSV8050I: DCS Stack DefaultCoreGroup at Member dmCell\app2Node\Ejb2b: New view installed, identifier (140:0.dmCell\app1Node\Ejb1), view size is 8 (AV=8, CD=8, CN=8, DF=11)
[11/8/04 15:34:28:701 EST] 00000017 ViewReceiver I DCSV1033I: DCS Stack DefaultCoreGroup at Member dmCell\app2Node\Ejb2b: Confirmed all new view members in view identifier (140:0.dmCell\app1Node\Ejb1). View channel type is View|Ptp.
[11/8/04 15:34:29:297 EST] 00000049 RecoveryDirec A WTRN0100E: Performing recovery processing for a peer WebSphere server (FileFailureScope: dmCell\app2Node\Ejb2a [-1788920684])
[11/8/04 15:34:29:324 EST] 00000049 RecoveryDirec A WTRN0100E: All persistant services have been directed to perform recovery processing for a peer WebSphere server (FileFailureScope: dmCell\app2Node\Ejb2a [-1788920684])
[11/8/04 15:34:29:601 EST] 00000049 RecoveryDirec A WTRN0100E: All persistant services have been directed to perform recovery processing for a peer WebSphere server (FileFailureScope: dmCell\app2Node\Ejb2a [-1788920684])
[11/8/04 15:34:29:771 EST] 0000004a RecoveryManag A WTRN0028I: Transaction service recovering 0 transactions.
[11/8/04 15:34:29:668 EST] 0000001c DataStackMemb I DCSV8050I: DCS Stack DefaultCoreGroup.DynamicCache at Member dmCell\app2Node\Ejb2b: New view installed, identifier (111:0.dmCell\app1Node\Ejb1), view size is 5 (AV=5, CD=5, CN=5, DF=6)
[11/8/04 15:34:29:895 EST] 0000001c RoleMember I DCSV8052I: DCS Stack DefaultCoreGroup.DynamicCache at Member dmCell\app2Node\Ejb2b: Defined set changed. Removed: [dmCell\app2Node\Ejb2a].
[11/8/04 15:34:31:500 EST] 00000019 HAManagerImpl I HMGR0123I: A GroupUpdate message was received for a group that does not exist. The group name is drs_agent_id=Ejb2a\baseCache01663\1,drs_inst_id=1100098001663,drs_inst_name=baseCache,drs_mode=0,policy=DefaultNOOPPolicy.
[11/8/04 15:34:31:561 EST] 00000017 ViewReceiver I DCSV1033I: DCS Stack DefaultCoreGroup.DynamicCache at Member dmCell\app2Node\Ejb2b: Confirmed all new view members in view identifier (111:0.dmCell\app1Node\Ejb1). View channel type is View|Ptp.
If a socket between two peers is closed, the side receiving the closed-socket exception signals its peers that the other JVM is to be regarded as failed. This means that if a JVM panics or exits, the failure is detected as quickly as the TCP implementation allows. If the failure is caused by a power failure or a network failure, the socket is closed only after the period defined by the KEEP_ALIVE interval of the operating system. This is normally a long time and should be tuned to a more realistic value in any WebSphere system. A long KEEP_ALIVE interval can cause many undesirable behaviors in a highly available WebSphere environment when systems (including database systems) fail.
This failure detection method, however, is less prone to CPU or memory starvation from swapping or thrashing. Together, the two failure detectors offer a very reliable failure detection mechanism.
Attention: The TCP KEEP_ALIVE value is a network setting of your operating system. Changing its value may have side-effects to other processes running in your system. Refer to Connection Timeout setting for details on how to overwrite this value in a WebSphere environment.
Transaction Manager high availability
The WebSphere transaction service writes to its transaction logs when WebSphere handles global transactions that involve two or more resources. Transaction logs are stored on disk and are used to recover in-flight transactions after system crashes or process failures. As the transactions are recovered, any database locks related to them are released. If recovery is delayed, not only can the transactions not be completed, but database access may also be impaired because of unreleased locks.
In the event of a server failure, the transaction service of the failed application server is out of service. Also, in-flight transactions that have not been committed may leave locks in the database, blocking surviving servers from gaining access to the locked records.