ipmctl-start-diagnostic - Man Page
Starts a diagnostic test
Synopsis
ipmctl start [OPTIONS] -diagnostic [TARGETS]
Description
Starts one or more diagnostic tests on PMem modules. All tests run unless a specific test is named.
Options
- -h, -help
Displays help for the command.
- -ddrt
Used to specify DDRT as the desired transport protocol for the current invocation of ipmctl.
- -smbus
Used to specify SMBUS as the desired transport protocol for the current invocation of ipmctl.
Note: The -ddrt and -smbus options are mutually exclusive and may not be used together.
- -lpmb
Used to specify large transport payload size for the current invocation of ipmctl.
- -spmb
Used to specify small transport payload size for the current invocation of ipmctl.
Note: The -lpmb and -spmb options are mutually exclusive and may not be used together.
- -o (text|nvmxml), -output (text|nvmxml)
Changes the output format. One of: "text" (default) or "nvmxml".
Targets
- -diagnostic [Quick|Config|Security|FW]
Start a specific test by supplying its name. All tests are run by default. One of:
- "Quick" - This test verifies that the PMem module host mailbox is accessible and that basic health indicators can be read and are currently reporting acceptable values.
- "Config" - This test verifies that the BIOS platform configuration matches the installed hardware and that the platform configuration conforms to best known practices.
- "Security" - This test verifies that all PMem modules have a consistent security state. It is a best practice to enable security on all PMem modules rather than just some.
- "FW" - This test verifies that all PMem modules of a given model have consistent FW installed and that other FW-modifiable attributes are set in accordance with best practices.
Note that the test has no means of verifying that the installed FW is the optimal version for a given PMem module model, only that it has been applied consistently across the system.
- -dimm [DimmIDS]
Starts a diagnostic test on specific PMem modules by optionally supplying one or more comma-separated PMem module identifiers. The default is to start the specified tests on all manageable PMem modules. This target is only valid for the Quick diagnostic test.
Examples
Starts all diagnostics.
ipmctl start -diagnostic
Starts the quick check diagnostic on PMem module 0x0001.
ipmctl start -diagnostic Quick -dimm 0x0001
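The rules above (options precede -diagnostic, -ddrt/-smbus and -lpmb/-spmb are mutually exclusive, and -dimm applies only to the Quick test) can be sketched as a small argument builder. This is an illustrative helper, not part of ipmctl: the function name `build_diagnostic_command` and its parameters are hypothetical, and it only assembles the command line without running it.

```python
VALID_TESTS = ("Quick", "Config", "Security", "FW")


def build_diagnostic_command(test=None, dimms=(), transport=None,
                             payload=None, output=None):
    """Return an argv list for `ipmctl start [OPTIONS] -diagnostic [TARGETS]`.

    Hypothetical helper: it mirrors the constraints stated on this page
    and does not invoke ipmctl.
    """
    if test is not None and test not in VALID_TESTS:
        raise ValueError(f"unknown diagnostic test: {test!r}")
    # A single value per pair makes -ddrt/-smbus and -lpmb/-spmb
    # mutually exclusive by construction.
    if transport not in (None, "ddrt", "smbus"):
        raise ValueError("transport must be 'ddrt' or 'smbus'")
    if payload not in (None, "lpmb", "spmb"):
        raise ValueError("payload must be 'lpmb' or 'spmb'")
    if output not in (None, "text", "nvmxml"):
        raise ValueError("output must be 'text' or 'nvmxml'")
    # -dimm is documented as valid only for the Quick test.
    if dimms and test != "Quick":
        raise ValueError("-dimm is only valid for the Quick diagnostic test")

    argv = ["ipmctl", "start"]
    if transport:
        argv.append(f"-{transport}")
    if payload:
        argv.append(f"-{payload}")
    if output:
        argv += ["-output", output]
    argv.append("-diagnostic")
    if test:
        argv.append(test)
    if dimms:
        argv += ["-dimm", ",".join(dimms)]
    return argv
```

For example, `build_diagnostic_command(test="Quick", dimms=["0x0001"])` yields the argument list for the second example above, which could then be passed to `subprocess.run` on a system where ipmctl is installed.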
Limitations
If a PMem module is unmanageable, the Quick test reports the reason, while the Config, Security, and FW tests skip unmanageable PMem modules.
Return Data
Each diagnostic generates one or more log messages. A successful test generates a single log message per PMem module indicating that no errors were found. A failed test might generate multiple log messages, each highlighting a specific error with all the relevant details. Each log message contains the following information:
- Test
The test name along with the overall execution result. One of:
- "Quick"
- "Config"
- "Security"
- "FW"
- State
The collective result state for each test. One of:
- "Ok"
- "Warning"
- "Failed"
- "Aborted"
- Message
The message indicates the status of the test. One of:
- "Ok"
- "Failed"
- SubTestName
The subtest name for the given test.
Test Name | Valid SubTest Names |
Quick |
|
Config |
|
Security |
|
FW |
|
- State
The severity of the error for each sub-test displayed with SubTestName. One of:
- "Ok"
- "Warning"
- "Failed"
- "Aborted"
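How the collective test state is derived from the sub-test states is not spelled out here. A minimal sketch, assuming the collective state is simply the most severe sub-test state and that severity increases in the order Ok, Warning, Failed, Aborted (both are assumptions, not documented ipmctl behavior):

```python
# Assumed severity ordering, least to most severe. This ordering is an
# assumption made for illustration; this page does not define it.
SEVERITY_ORDER = ["Ok", "Warning", "Failed", "Aborted"]


def collective_state(subtest_states):
    """Return the most severe state among the given sub-test states.

    Raises ValueError for an empty sequence or an unknown state name.
    """
    return max(subtest_states, key=SEVERITY_ORDER.index)
```

A script that post-processes diagnostic output could use a helper like this to decide whether any test needs attention, adjusting the ordering if the tool's actual behavior differs.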
Events are generated as a result of invoking the Start Diagnostics command in order to analyze the Intel™ Optane™ PMem module for potential issues.
Diagnostic events may fall into the following categories:
- Quick health diagnostic test event
- Platform configuration diagnostic test event
- Security diagnostic test event
- Firmware consistency and settings diagnostic test event
Each event includes the following pieces of information:
- The severity of the event that occurred. One of:
- Informational (Info)
- Warning (Warning)
- Error (Failed)
- Aborted (Aborted)
- A unique ID of the item (PMem module UUID, DimmID, NamespaceID, RegionID, etc.) the event refers to.
- A detailed description of the event in English.
The following sections list each of the possible events grouped by category of the event.
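In the tables that follow, the event codes fall into distinct ranges per category: 5xx for the quick health check, 6xx for platform configuration, 8xx for security, and 9xx for firmware consistency and settings. A minimal sketch that maps a code to its test name, assuming the hundreds digit alone identifies the category (an inference from the listed codes, not a documented guarantee):

```python
# Mapping inferred from the code ranges listed in the event tables;
# the function name and the use of None for unknown ranges are
# illustrative choices.
CATEGORY_BY_HUNDREDS = {
    5: "Quick",
    6: "Config",
    8: "Security",
    9: "FW",
}


def event_category(code):
    """Return the diagnostic test name for an event code, or None."""
    return CATEGORY_BY_HUNDREDS.get(code // 100)
```

For instance, `event_category(513)` returns "Quick" and `event_category(802)` returns "Security".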
Quick Health Check Events
The quick health check diagnostic verifies that the Intel™ Optane™ PMem module’s host mailboxes are accessible and that basic health indicators can be read and are currently reporting acceptable values.
Table 1. Quick Health Check Events
Code | Severity | Message | Arguments |
500 | Info | The quick health check succeeded. | |
501 | Warning | The quick health check detected that PMem module [1] is not manageable because subsystem vendor ID [2] is not supported. UID: [3] |
|
502 | Warning | The quick health check detected that PMem module [1] is not manageable because subsystem device ID [2] is not supported. UID: [3] |
|
503 | Warning | The quick health check detected that PMem module [1] is not manageable because firmware API version [2] is not supported. UID: [3] |
|
504 | Warning | The quick health check detected that PMem module [1] is reporting a bad health state [2]. UID: [3] |
|
505 | Warning | The quick health check detected that PMem module [1] is reporting a media temperature of [2] C which is above the alarm threshold [3] C. UID: [4] |
|
506 | Warning | The quick health check detected that PMem module [1] is reporting percentage remaining at [2]% which is less than the alarm threshold [3]%. UID: [4] |
|
507 | Warning | The quick health check detected that PMem module [1] is reporting reboot required. UID: [2] |
|
511 | Warning | The quick health check detected that PMem module [1] is reporting a controller temperature of [2] C which is above the alarm threshold [3] C. UID: [4] |
|
513 | Error | The quick health check detected that the boot status register of PMem module [1] is not readable. UID: [2] |
|
514 | Error | The quick health check detected that the firmware on PMem module [1] is reporting that the media is not ready. UID: [2] |
|
515 | Error | The quick health check detected that the firmware on PMem module [1] is reporting an error in the media. UID: [2] |
|
519 | Error | The quick health check detected that PMem module [1] failed to initialize BIOS POST testing. UID: [2] |
|
520 | Error | The quick health check detected that the firmware on PMem module [1] has not initialized successfully. The last known Major:Minor Checkpoint is [2]. UID: [3] |
|
523 | Error | The quick health check detected that PMem module [1] is reporting a viral state. The PMem module is now read-only. UID: [2] |
|
529 | Warning | The quick health check detected that PMem module [1] is reporting that it has no package spares available. UID: [2] |
|
530 | Info | The quick health check detected that the firmware on PMem module [1] experienced an unsafe shutdown before its latest restart. UID: [2] |
|
533 | Error | The quick health check detected that the firmware on PMem module [1] is reporting that the AIT DRAM is not ready. UID: [2] |
|
534 | Error | The quick health check detected that the firmware on PMem module [1] is reporting that the media is disabled. UID: [2] |
|
535 | Error | The quick health check detected that the firmware on PMem module [1] is reporting that the AIT DRAM is disabled. UID: [2] |
|
536 | Error | The quick health check detected that the firmware on PMem module [1] failed to load successfully. UID: [2] |
|
538 | Error | PMem module [1] is reporting that the DDRT IO Init is not complete. UID: [2] |
|
539 | Error | PMem module [1] is reporting that the mailbox interface is not ready. UID: [2] |
|
540 | Error | An internal error caused the quick health check to abort on PMem module [1]. UID: [2] |
|
541 | Error | The quick health check detected that PMem module [1] is busy. UID: [2] |
|
542 | Error | The quick health check detected that the platform FW did not map a region to SPA on PMem module [1]. ACPI NFIT NVDIMM State Flags Error Bit 6 Set. UID: [2] |
|
543 | Error | The quick health check detected that PMem module [1] DDRT Training is not complete/failed. UID: [2] |
|
544 | Error | PMem module [1] is reporting that the DDRT IO Init is not started. UID: [2] |
|
545 | Error | The quick health check detected that the ROM on PMem module [1] has failed to complete initialization, last known Major:Minor Checkpoint is [2]. |
|
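The bracketed numbers in the Message column are positional placeholders for the event's arguments. A minimal sketch of filling such a template, assuming [n] refers to the n-th argument (1-based); the function name `render_message` and the sample values are hypothetical:

```python
import re


def render_message(template, args):
    """Replace each [n] placeholder with the n-th (1-based) argument."""
    def substitute(match):
        index = int(match.group(1)) - 1
        if index < 0 or index >= len(args):
            raise ValueError(f"no argument for placeholder {match.group(0)}")
        return str(args[index])

    return re.sub(r"\[(\d+)\]", substitute, template)


# Template text taken from event 507; the argument values are made up.
template = ("The quick health check detected that PMem module [1] "
            "is reporting reboot required. UID: [2]")
message = render_message(template, ["0x0001", "example-uid"])
```

The same helper works for the templates in the other event tables, since they use the same placeholder notation.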
Platform Configuration Check Events
This diagnostic test group verifies that the BIOS platform configuration matches the installed hardware and the platform configuration conforms to best known practices.
Table 2. Platform Configuration Check Events
Code | Severity | Message | Arguments |
600 | Info | The platform configuration check succeeded. | |
601 | Info | The platform configuration check detected that there are no manageable PMem modules. | |
606 | Info | The platform configuration check detected that PMem module [1] is not configured. UID: [2] |
|
608 | Error | The platform configuration check detected [1] PMem modules installed on the platform with the same serial number [2]. |
|
609 | Info | The platform configuration check detected that PMem module [1] has a goal configuration that has not yet been applied. A system reboot is required for the new configuration to take effect. UID: [2] |
|
618 | Error | The platform configuration check detected that a PMem module with physical ID [1] is present in the system but failed to initialize. UID: [2] |
|
621 | Error | The platform configuration check detected PCD contains invalid data on PMem module [1]. UID: [2] |
|
622 | Error | The platform configuration check was unable to retrieve the namespace information. | |
623 | Warning | The platform configuration check detected that the BIOS settings do not currently allow memory provisioning from this software. | |
624 | Error | The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] because of errors in the goal data. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6]. |
|
625 | Error | The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] because the system has insufficient resources. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6]. |
|
626 | Error | The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] because of a firmware error. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6]. |
|
627 | Error | The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] for an unknown reason. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6]. |
|
628 | Error | The platform configuration check detected that interleave set [1] is broken because the PMem modules were moved [2]. |
|
629 | Error | The platform configuration check detected that the platform does not support ADR and therefore data integrity is not guaranteed on the PMem modules. | |
630 | Error | An internal error caused the platform configuration check to abort. | |
631 | Error | The platform configuration check detected that interleave set [1] is broken because the PMem module with UID: [2] is missing from location (Socket-Die-iMC-Channel-Slot) [3]. |
|
632 | Error | The platform configuration check detected that interleave set [1] is broken because the PMem module with UID: [2] is misplaced. It is currently in location (Socket-Die-iMC-Channel-Slot) [3] and should be moved to (Socket-Die-iMC-Channel-Slot) [4]. |
|
633 | Error | The platform configuration check detected that the BIOS could not fully map memory on PMem module [1] because of an error in current configuration. The detailed status is CCUR table status: [2] [3]. |
|
Security Check Events
The security check diagnostic test group verifies that all Intel™ Optane™ PMem modules have a consistent security state.
Table 3. Security Check Events
Code | Severity | Message | Arguments |
800 | Info | The security check succeeded. | |
801 | Info | The security check detected that there are no manageable PMem modules. | |
802 | Warning | The security check detected that security settings are inconsistent [1]. |
|
804 | Info | The security check detected that security is not supported on all PMem modules. | |
805 | Error | An internal error caused the security check to abort. | |
Firmware Consistency and Settings Check Events
This test group verifies that all PMem modules of a given subsystem device ID have consistent FW installed and other FW modifiable attributes are set in accordance with best practices.
Table 4. Firmware Consistency and Settings Check Events
Code | Severity | Message | Arguments |
900 | Info | The firmware consistency and settings check succeeded. | |
901 | Info | The firmware consistency and settings check detected that there are no manageable PMem modules. | |
902 | Warning | The firmware consistency and settings check detected that firmware version on PMem modules [1] with subsystem device ID [2] is non-optimal, preferred version is [3]. |
|
903 | Warning | The firmware consistency and settings check detected that PMem module [1] is reporting a non-critical media temperature threshold of [2] C which is above the fatal threshold [3] C. UID: [4] |
|
904 | Warning | The firmware consistency and settings check detected that PMem module [1] is reporting a non-critical controller temperature threshold of [2] C which is above the fatal threshold [3] C. UID: [4] |
|
905 | Warning | The firmware consistency and settings check detected that PMem module [1] is reporting a percentage remaining of [2]% which is below the recommended threshold [3]%. UID: [4] |
|
906 | Warning | The firmware consistency and settings check detected that PMem modules have inconsistent viral policy settings. | |
910 | Error | An internal error caused the firmware consistency and settings check to abort. | |
911 | Warning | The firmware consistency and settings check detected that PMem modules have inconsistent first fast refresh settings. | |