ocf_heartbeat_ethmonitor - Man Page
Monitors network interfaces
Synopsis
ethmonitor [start | stop | status | monitor | meta-data | validate-all]
Description
Monitor the vitality of a local network interface.
You may set up this RA as a clone resource to monitor network interfaces that share the same interface name on different nodes. Monitoring does not depend on the IP address or the network on which an interface is configured. You may use this RA to move resources away from a node that has a faulty interface, or to prevent resources from moving to such a node. This gives you independent control of the resources without involving cluster intercommunication, but it requires your nodes to have more than one network interface.
The resource configuration requires a monitor operation, because the monitor does the main part of the work. In addition to the resource configuration, you need to configure some location constraints, based on a CIB attribute value. The name of the attribute value is configured in the 'name' option of this RA.
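Example constraint configuration using crmsh (the rule refers to an attribute called ethmonitor, so it assumes the 'name' option is set to that value):
location loc_connected_node my_resource_grp \
  rule $id="rule_loc_connected_node" -INF: ethmonitor eq 0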
Example constraint configuration using pcs. Only allow 'my_resource' to run on nodes where the eth0 ethernet device is available:
pcs constraint location my_resource rule score=-INFINITY ethmonitor-eth0 ne 1
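The attribute name in the pcs rule above, ethmonitor-eth0, matches the default documented for the 'name' parameter. The following is a minimal sketch of how that default can be derived; the variable names are illustrative only and are not taken from the agent's source.

```shell
#!/bin/sh
# Illustrative variables (assumptions, not the agent's own names).
interface="eth0"   # value of the 'interface' parameter
name=""            # value of the 'name' parameter; empty means "not set"

# Fall back to "ethmonitor-<interface>" when 'name' is not set.
attr="${name:-ethmonitor-${interface}}"
echo "$attr"       # prints: ethmonitor-eth0

# On a running cluster node the attribute could then be queried with
# Pacemaker's attrd_updater, for example:
#   attrd_updater -Q -n "$attr"
```

Whatever name results from this is the one that location constraint rules must reference.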
The ethmonitor runs up to 3 different tests, in order, to check the interface vitality:
1. call ip to see if the link status is up (if the link is down -> error)
2. call ip and watch the RX counter (if packets arrive within a certain time -> success)
3. call arping to check whether any of the IPs found in the local ARP cache answers an ARP REQUEST (one answer -> success)
If none of these tests succeeds, an error is returned.
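The test order described above can be sketched as a small shell function. This is an illustration under stated assumptions, not the agent's actual implementation: the probe_* helpers and check_interface are hypothetical names, the RX counter is read from sysfs rather than parsed from ip output, and the fixed timeouts stand in for the pktcnt_timeout and arping_* parameters.

```shell
#!/bin/sh
# Sketch only: the probe_* helpers are hypothetical stand-ins for the
# agent's internal checks.

# Test 1: does ip(8) report a carrier on the link?
probe_link() {
    ip -o link show dev "$1" 2>/dev/null | grep -q 'LOWER_UP'
}

# Test 2: does the RX packet counter grow within $2 seconds?
# Assumption: the counter is read from sysfs instead of ip output.
probe_rx() {
    f="/sys/class/net/$1/statistics/rx_packets"
    before=$(cat "$f" 2>/dev/null) || return 1
    sleep "$2"
    after=$(cat "$f" 2>/dev/null) || return 1
    [ "$after" -gt "$before" ]
}

# Test 3: does any IPv4 neighbour from the ARP cache answer an ARP REQUEST?
probe_arp() {
    for addr in $(ip -4 neigh show dev "$1" 2>/dev/null | awk '{print $1}'); do
        arping -q -c 1 -w 1 -I "$1" "$addr" >/dev/null 2>&1 && return 0
    done
    return 1
}

# Run the tests in the documented order.
check_interface() {
    probe_link "$1" || return 1    # link down -> error
    probe_rx "$1" 5 && return 0    # packets arrived -> success
    probe_arp "$1" && return 0     # one answer -> success
    return 1                       # no test succeeded -> error
}
```

A caller could use it as `check_interface eth0 && echo up || echo down`. Only the link test can fail the check on its own; the later tests are tried only while no earlier one has reported success.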
Supported Parameters
- interface
The name of the network interface which should be monitored (e.g. eth0).
(unique, required, string, no default)
- name
The name of the CIB attribute to set. This is the name to be used in the constraints. Defaults to "ethmonitor-'interface_name'".
(unique, optional, string, no default)
- multiplier
Multiplier for the value of the CIB attribute specified in parameter name.
(optional, integer, default 1)
- repeat_count
Specify how many times the interface will be checked before the status is set to failed. You need to set the timeout of the monitoring operation to at least repeat_count * repeat_interval.
(optional, integer, default 5)
- repeat_interval
Specify how long to wait, in seconds, between repeated checks.
(optional, integer, default 10)
- pktcnt_timeout
Timeout for the RX packet counter. Stop listening for packet counter changes after the given number of seconds.
(optional, integer, default 5)
- arping_count
Number of ARP REQUEST packets to send for every IP. Usually one ARP REQUEST (arping) is sent.
(optional, integer, default 1)
- arping_timeout
Time in seconds to wait for ARP REQUESTs (all packets of arping_count). This limits the time spent on ARP requests, so that requests can be sent to more than one node without running into the monitor operation timeout.
(optional, integer, default 1)
- arping_cache_entries
Maximum number of IPs from ARP cache list to check for ARP REQUEST (arping) answers. Newest entries are tried first.
(optional, integer, default 5)
- infiniband_device
For interfaces that are infiniband devices.
(optional, string, no default)
- infiniband_port
For infiniband devices, this is the port to monitor.
(optional, integer, no default)
- link_status_only
Only report success based on link status. Do not perform RX counter or arping related connectivity tests.
(optional, boolean, default false)
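As a worked example of the timeout rule stated for repeat_count, the following sketch computes the minimum monitor timeout from the documented default values. The variable names simply mirror the parameter names; min_timeout is an illustrative name.

```shell
#!/bin/sh
# Documented defaults of the two parameters.
repeat_count=5
repeat_interval=10

# The monitor operation timeout must be at least repeat_count * repeat_interval.
min_timeout=$((repeat_count * repeat_interval))
echo "minimum monitor timeout: ${min_timeout}s"   # prints: minimum monitor timeout: 50s
```

With the defaults the result is 50 seconds, which is below the 60s timeout used for the monitor operation in the examples in this page. Raising either parameter requires raising the monitor timeout accordingly.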
Supported Actions
This resource agent supports the following actions (operations):
- start
Starts the resource. Suggested minimum timeout: 60s.
- stop
Stops the resource. Suggested minimum timeout: 20s.
- status
Performs a status check. Suggested minimum timeout: 60s. Suggested interval: 10s.
- monitor
Performs a detailed status check. Suggested minimum timeout: 60s. Suggested interval: 10s.
- meta-data
Retrieves resource agent metadata (internal use only). Suggested minimum timeout: 5s.
- validate-all
Performs a validation of the resource configuration. Suggested minimum timeout: 20s.
Example CRM Shell
The following is an example configuration for an ethmonitor resource using the crm(8) shell:
primitive p_ethmonitor ocf:heartbeat:ethmonitor \
  params \
    interface=string \
  op monitor depth="0" timeout="60s" interval="10s"
Example PCS
The following is an example configuration for an ethmonitor resource using pcs(8):
pcs resource create p_ethmonitor ocf:heartbeat:ethmonitor \
  interface=string \
  op monitor OCF_CHECK_LEVEL="0" timeout="60s" interval="10s"
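The description suggests running this RA as a clone and pairing it with a location constraint. The sketch below combines both steps with pcs. It is an assumption-laden illustration: the interface name eth0, the resource name my_resource, and the run and setup_ethmonitor helpers are chosen for the example, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Dry run by default: print the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

iface="eth0"        # assumed interface name
rsc="my_resource"   # resource name reused from the constraint example

setup_ethmonitor() {
    # Create the monitor as a clone so that it runs on every node.
    run pcs resource create p_ethmonitor ocf:heartbeat:ethmonitor \
        "interface=$iface" \
        op monitor timeout=60s interval=10s \
        clone
    # Keep the resource off nodes where the attribute is not 1.
    run pcs constraint location "$rsc" rule score=-INFINITY \
        "ethmonitor-$iface" ne 1
}

setup_ethmonitor
```

Reviewing the printed commands first, and then rerunning with DRY_RUN=0 on a cluster node, keeps the example safe to try outside a cluster.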
See Also
Author
ClusterLabs contributors (see the resource agent source for information about individual authors)