ethfabricanalysis - Man Page
Name
ethfabricanalysis
Performs analysis of the fabric.
Syntax
ethfabricanalysis [-b|-e] [-s] [-d dir] [-c file] [-E file] [-p planes] [-T topology_inputs] [-f host_files]
Options
- --help
Produces full help text.
- -b
Runs in baseline mode, which records the current fabric state as the baseline. The default is compare/check mode.
- -e
Evaluates fabric health only. The default is compare/check mode.
- -s
Saves history of failures (errors/differences).
- -d dir
Specifies the top-level directory for saving the baseline and the history of failed checks. The default is /var/usr/lib/eth-tools/analysis.
- -c file
Specifies the error thresholds configuration file. The default is /etc/eth-tools/ethmon.conf.
- -E file
Specifies the Ethernet Mgt configuration file. The default is /etc/eth-tools/mgt_config.xml.
- -p planes
Specifies the fabric planes, separated by spaces. The default is the first enabled plane defined in the Mgt config file. The value 'ALL' uses all enabled planes.
- -f host_files
Specifies the hosts files, separated by spaces. Each file overrides the hosts file defined in the Mgt config file for the corresponding plane. The value 'DEFAULT' uses the HostFile defined in the Mgt config file for that plane.
- -T topology_inputs
Specifies the topology input filenames, separated by spaces. See Details and ethreport for more information.
Example
ethfabricanalysis
ethfabricanalysis -p 'p1 p2' -f 'hosts1 DEFAULT'
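The second example pairs each plane with the hosts file in the same position: p1 uses hosts1, and p2 keeps the hosts file from the Mgt config file. The Python sketch below models that pairing, together with the default and 'ALL' plane selection described under -p. It is a hypothetical illustration, not ethfabricanalysis code: the function names, the enabled-plane list, the hosts file names, and the rule that both lists must have the same length are assumptions.

```python
import shlex
from typing import Dict, List, Optional


def resolve_planes(planes_arg: Optional[str], enabled_planes: List[str]) -> List[str]:
    """Model of -p: default, 'ALL', or an explicit space-separated list."""
    if planes_arg is None:
        return enabled_planes[:1]  # default: first enabled plane
    if planes_arg == "ALL":
        return list(enabled_planes)  # every enabled plane
    return shlex.split(planes_arg)


def resolve_host_files(planes: List[str], host_files_arg: str,
                       config_host_files: Dict[str, str]) -> Dict[str, str]:
    """Model of -f: pair hosts files with planes by position (assumed)."""
    host_files = shlex.split(host_files_arg)
    if len(host_files) != len(planes):
        # Assumption: one entry is required per plane.
        raise ValueError("expected one hosts file per plane")
    resolved = {}
    for plane, host_file in zip(planes, host_files):
        # 'DEFAULT' keeps the plane's hosts file from the Mgt config file.
        resolved[plane] = config_host_files[plane] if host_file == "DEFAULT" else host_file
    return resolved


# Mirrors: ethfabricanalysis -p 'p1 p2' -f 'hosts1 DEFAULT'
planes = resolve_planes("p1 p2", ["p1", "p2", "p3"])
print(resolve_host_files(planes, "hosts1 DEFAULT", {"p1": "hosts_p1", "p2": "hosts_p2"}))
# prints {'p1': 'hosts1', 'p2': 'hosts_p2'}
```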
The fabric analysis tool checks the following:
- Fabric links (both internal to switch and external cables)
- Fabric components (nodes, links, systems, and their configuration)
- Fabric error counters and link speed mismatches
NOTE: The comparison covers all components on the fabric. Therefore, an operation such as shutting down a server removes that server from the fabric, and ethfabricanalysis flags its absence as a fabric change or failure.
Environment Variables
The following environment variables are also used by this command:
- FF_ANALYSIS_DIR
Top-level directory for baselines and failed health checks.
Details
You can specify the topology_input file to be used with one of the following methods:
- On the command line using the -T option.
- Using the TopologyFile specified in the Ethernet Mgt config file.
If the specified file does not exist, no topology_input file is used.
For more information on topology_input, refer to ethreport.
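The selection rules above can be sketched in Python as follows. This is a hypothetical model: it assumes that a -T value, when given, takes precedence over the TopologyFile from the Ethernet Mgt config file, and the function name is illustrative only.

```python
import os
from typing import Optional


def resolve_topology_input(cli_value: Optional[str],
                           config_topology_file: Optional[str]) -> Optional[str]:
    """Hypothetical model of topology_input selection.

    Assumes a -T value, when given, takes precedence over the TopologyFile
    from the Ethernet Mgt config file.
    """
    candidate = cli_value or config_topology_file
    if candidate and os.path.exists(candidate):
        return candidate
    # No usable file: the analysis runs without a topology_input file.
    return None


print(resolve_topology_input("/nonexistent/topology.xml", None))  # prints None
```

Note that in this model a missing -T file does not fall back to the config file; it simply results in no topology_input, matching the rule stated above.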
By default, the error analysis includes counters and slow links (that is, links running below enabled speeds). You can change this using the FF_FABRIC_HEALTH configuration parameter in ethfastfabric.conf. This parameter specifies the ethreport options and reports to be used for the health analysis.
When a topology_input file is used, it can also be useful to extend FF_FABRIC_HEALTH to include fabric topology verification options such as -o verifylinks.
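As a rough illustration of what extending FF_FABRIC_HEALTH means, the following Python sketch treats the parameter as a string of ethreport options and appends -o verifylinks only if that report is not already requested. It assumes the default value matches the ethreport -s -o errors -o slowlinks options listed under Fabric Items Also Checked During Health Check; check the value actually set in your ethfastfabric.conf. The helper names are hypothetical.

```python
import shlex
from typing import List

# Assumed default: the ethreport options listed for the health check in this page.
DEFAULT_FF_FABRIC_HEALTH = "-s -o errors -o slowlinks"


def reports_in(options: str) -> List[str]:
    """Return the report names passed with -o in an ethreport option string."""
    tokens = shlex.split(options)
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-o"]


def with_report(options: str, report: str) -> str:
    """Append '-o <report>' unless that report is already requested."""
    if report in reports_in(options):
        return options
    return f"{options} -o {report}"


print(with_report(DEFAULT_FF_FABRIC_HEALTH, "verifylinks"))
# prints: -s -o errors -o slowlinks -o verifylinks
```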
The thresholds for counter analysis default to /etc/eth-tools/ethmon.conf. However, you can use the -c option to specify an alternate thresholds configuration file. For example, the ethmon.si.conf file can be used to check for any non-zero values in the signal integrity counters.
The names of all files generated by ethfabricanalysis begin with fabric.
The ethfabricanalysis tool generates files such as the following within FF_ANALYSIS_DIR:
Health Check
- latest/fabric.<plane_name>.errors: stdout of ethreport for errors encountered during fabric error analysis.
- latest/fabric.<plane_name>.errors.stderr: stderr of ethreport during fabric error analysis.
Baseline
During a baseline run, the following files are also created under FF_ANALYSIS_DIR:
- baseline/fabric.<plane_name>.snapshot.xml: ethreport snapshot of the complete fabric components and configuration.
- baseline/fabric.<plane_name>.comps: ethreport summary of fabric components and basic configuration.
- baseline/fabric.<plane_name>.links: ethreport summary of internal and external links.
Full Analysis
- latest/fabric.<plane_name>.snapshot.xml: ethreport snapshot of the complete fabric components and configuration.
- latest/fabric.<plane_name>.snapshot.stderr: stderr of ethreport during the snapshot.
- latest/fabric.<plane_name>.errors: stdout of ethreport for errors encountered during fabric error analysis.
- latest/fabric.<plane_name>.errors.stderr: stderr of ethreport during fabric error analysis.
- latest/fabric.<plane_name>.comps: stdout of ethreport for fabric components and configuration.
- latest/fabric.<plane_name>.comps.stderr: stderr of ethreport for fabric components.
- latest/fabric.<plane_name>.comps.diff: diff of baseline and latest fabric components.
- latest/fabric.<plane_name>.links: stdout of ethreport summary of internal and external links.
- latest/fabric.<plane_name>.links.stderr: stderr of ethreport summary of internal and external links.
- latest/fabric.<plane_name>.links.diff: diff of baseline and latest fabric internal and external links.
- latest/fabric.<plane_name>.links.changes.stderr: stderr of ethreport comparison of links.
- latest/fabric.<plane_name>.links.changes: ethreport comparison of links against the baseline. This is typically easier to read than the links.diff file and contains the same information.
- latest/fabric.<plane_name>.comps.changes.stderr: stderr of ethreport comparison of components.
- latest/fabric.<plane_name>.comps.changes: ethreport comparison of components against the baseline. This is typically easier to read than the comps.diff file and contains the same information.
The .diff and .changes files are only created if differences are detected.
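Because .changes files only appear when differences are found, a script can check for them to see whether a run detected baseline changes. The following Python sketch assumes the analysis directory is taken from FF_ANALYSIS_DIR, falling back to the default documented for -d; it does not model the -d option, and the function names and plane name are hypothetical.

```python
import os
from pathlib import Path
from typing import List

# Default documented for the -d option.
DEFAULT_ANALYSIS_DIR = "/var/usr/lib/eth-tools/analysis"


def analysis_dir() -> Path:
    """Assumed lookup: FF_ANALYSIS_DIR if set, else the documented default."""
    return Path(os.environ.get("FF_ANALYSIS_DIR", DEFAULT_ANALYSIS_DIR))


def changed_reports(plane: str, base: Path) -> List[str]:
    """Return which baseline comparisons (links, comps) reported differences.

    Relies on the documented behavior that .changes files are only
    created when differences are detected.
    """
    latest = base / "latest"
    return [report for report in ("links", "comps")
            if (latest / f"fabric.{plane}.{report}.changes").exists()]


if __name__ == "__main__":
    plane = "p1"  # hypothetical plane name
    changed = changed_reports(plane, analysis_dir())
    print("differences in: " + ", ".join(changed) if changed else "no baseline differences")
```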
If the -s option is used and failures are detected, the files related to the failed checks are also copied to a time-stamped directory under FF_ANALYSIS_DIR.
Fabric Items Checked Against the Baseline
Based on ethreport -o links:
- Unconnected/down/missing cables
- Added/moved cables
- Changes in link width and speed
- Changes to IfAddr values in the fabric (for example, replacement of NIC or switch hardware)
- Adding/Removing Nodes (NIC, Virtual NICs, Virtual Switches, Physical Switches, Physical Switch internal switching cards (leaf/spine))
- Changes to server or switch names
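Conceptually, the cable checks above compare two sets of links. The Python sketch below is a simplified, hypothetical model in which each cable is a pair of (node name, port) endpoints; ethreport's actual comparison uses more information than this. In this model a moved cable shows up as one missing link plus one added link.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical model: an endpoint is (node name, port number); a link joins two endpoints.
Endpoint = Tuple[str, int]
Link = Tuple[Endpoint, Endpoint]


def normalize(link: Link) -> Link:
    """Order the endpoints so that A-B and B-A compare equal."""
    a, b = sorted(link)
    return (a, b)


def compare_links(baseline: Iterable[Link], latest: Iterable[Link]) -> Dict[str, List[Link]]:
    base = {normalize(link) for link in baseline}
    current = {normalize(link) for link in latest}
    return {
        "missing": sorted(base - current),  # unconnected, down, or removed cables
        "added": sorted(current - base),    # new cables, or the new position of a moved one
    }


baseline = [(("sw1", 1), ("host1", 1)), (("sw1", 2), ("host2", 1))]
latest = [(("host1", 1), ("sw1", 1)), (("sw1", 3), ("host2", 1))]  # host2 moved to port 3
print(compare_links(baseline, latest))
# prints {'missing': [(('host2', 1), ('sw1', 2))], 'added': [(('host2', 1), ('sw1', 3))]}
```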
Based on ethreport -o comps:
- Overlap with items from links report
- Changes in port MTU
- Changes in port speed/width enabled or supported
- Changes in NIC or switch device IDs/revisions/VendorID (for example, ASIC hardware changes)
- Changes in port Capability mask (which features/agents run on port/server)
- Changes to I/O Units (IOUs), I/O Controllers (IOCs), and the services they provide
Fabric Items Also Checked During Health Check
Based on ethreport -s -o errors -o slowlinks:
- Error counters on all Intel(R) Ethernet Fabric ports (NIC, switch external, and switch internal), checked against configurable thresholds.
  - Typically identifies potential fabric errors, such as symbol errors.
  - May also identify transient congestion, depending on the counters that are monitored.
- Link active speed/width compared to the enabled speed/width.
  - Identifies links whose active speed/width is less than the minimum of the enabled speed/width on the two sides of the link.
  - This typically reflects bad cables, bad ports, or poor connections.
  - A side effect is the verification of fabric health.
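The slow-link rule above can be expressed as a small predicate. In this hypothetical Python sketch, each side of the link reports a single enabled speed and width, and the link has one active speed and width; the SpeedWidth type and its Gb/s unit are assumptions for illustration, and real ports may enable several speeds.

```python
from typing import NamedTuple


class SpeedWidth(NamedTuple):
    speed_gbps: int  # hypothetical unit for illustration
    width: int       # number of lanes


def is_slow_link(enabled_a: SpeedWidth, enabled_b: SpeedWidth, active: SpeedWidth) -> bool:
    """Flag a link whose active speed or width is below the minimum enabled on both ends."""
    expected_speed = min(enabled_a.speed_gbps, enabled_b.speed_gbps)
    expected_width = min(enabled_a.width, enabled_b.width)
    return active.speed_gbps < expected_speed or active.width < expected_width


# Both ends enable 100 Gb/s with 4 lanes, but the link came up with only 2 lanes.
print(is_slow_link(SpeedWidth(100, 4), SpeedWidth(100, 4), SpeedWidth(100, 2)))  # prints True
```

Taking the minimum of the two sides means a link between a faster and a slower port is not flagged just for running at the slower port's enabled speed.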