LOGARCHIVE - Man Page

Performance Co-Pilot archive formats

Description

Performance Co-Pilot (PCP) archives store historical values about arbitrary metrics recorded from a single host. Archives are machine independent and self-contained - all metric data and metadata required for off-line or off-site analysis is held within an archive.

The format is stable, to allow long-term historical storage and processing by PMAPI(3) client tools. However, some format variants have been introduced over time, and currently Versions 2 and 3 are supported. The mandate is that PCP will provide long-term backwards compatibility, so an archive created by any version of PCP can be read by that version of PCP and all subsequent versions. The exception is Version 1, which was retired in the PCP Version 2.0 release in May 1998.

Archives may be read by most PCP client tools, using the -a/--archive NAME option, or dumped raw by pmlogdump(1). Archives are created primarily by pmlogger(1), however they can also be created using the LOGIMPORT(3) programming interface.

Archives may be merged, analyzed, modified and subsampled using pmlogextract(1), pmlogsummary(1), pmlogrewrite(1) and pmlogreduce(1) respectively. In addition, PCP archives may be examined in sets or grouped together into “archive folios”, which are created and managed by the mkaf(1) and pmafm(1) tools.

An archive consists of several physical files that share a common arbitrary prefix, e.g. myarchive.

myarchive.0, myarchive.1, ...

One or more data volumes containing the metric values and any error codes encountered during metric sampling. These are typically the largest of the files and may grow very rapidly, depending on the selection of metrics logged by pmlogger(1) and the sampling intervals used.

myarchive.meta

Information for PMAPI functions such as pmLookupName(3), pmLookupDesc(3), pmLookupLabels(3) and pmLookupInDom(3). The metadata file may grow sporadically as logged metrics, instance domains and labels vary over time.

myarchive.index

A temporal index, mapping timestamps to byte offsets in the other files.

Common Features

All three types of files have a similar record-based structure, use network byte-order (big-endian) encoding, and use 32-bit fields for the tags and padding within those records. Strings are stored as 8-bit characters without assuming a specific encoding; in practice they are normally ASCII. See also the __pmLog* types in src/include/pcp/libpcp.h.

Record Framing

The volume and .meta files are divided into self-identifying records.

Offset  Length  Name
0       4       N, length of record, in bytes, including this field
4       N-8     record payload, usually starting with a 32-bit record type tag
N-4     4       N, length of record (again)
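As an illustration of the framing, the following minimal C sketch validates one record held in memory: it reads the leading length N, checks that the trailing copy at offset N-4 agrees, and hands back the payload. The names be32 and frame_record are hypothetical helpers (not libpcp functions), and the sketch assumes the whole record has already been read into a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian (network byte order) 32-bit field. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: validate the framed record at the start of buf.
 * On success, sets *payload and *paylen and returns the record length N.
 * Returns 0 if the buffer is too short or the two length fields differ.
 */
static size_t frame_record(const unsigned char *buf, size_t buflen,
                           const unsigned char **payload, size_t *paylen)
{
    if (buflen < 8)
        return 0;
    uint32_t n = be32(buf);             /* leading length at offset 0 */
    if (n < 8 || n > buflen)
        return 0;
    if (be32(buf + n - 4) != n)         /* trailing copy at offset N-4 */
        return 0;
    *payload = buf + 4;                 /* payload spans offsets 4 .. N-5 */
    *paylen = n - 8;
    return n;
}
```

A reader could walk a volume or .meta file by calling such a helper repeatedly and advancing by the returned length; a mismatch between the two length fields is a cheap way to notice truncation or corruption.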

Archive Label

All three types of files begin with an “archive label” header, which identifies the host name, starting timestamp and timezone information; all referring to the host that was the source of the performance data (which may be different to the host where pmlogger(1) was running).

The “archive label” format differs between Version 2 and Version 3, with the latter providing enhanced timestamps (64-bit encoding of the seconds part and nanosecond precision) and some additional fields.

Version 2
Offset  Length  Name
0       4       tag, PM_LOG_MAGIC | PM_LOG_VERS02=0x50052602
4       4       process id (PID) of pmlogger process that wrote file
8       4       archive start time, seconds part (past UNIX epoch)
12      4       archive start time, microseconds part
16      4       current archive volume number (or -1=.meta, -2=.index)
20      64      name of collection host
84      40      time zone string for collection host ($TZ environment variable)
Version 3
Offset  Length  Name
0       4       tag, PM_LOG_MAGIC | PM_LOG_VERS03=0x50052603
4       4       PID of pmlogger process that wrote file
8       8       archive start time, seconds part (past UNIX epoch)
16      4       archive start time, nanoseconds part
20      4       current archive volume number (or -1=.meta, -2=.index)
24      4       archive feature bits
28      4       reserved for future use
32      256     name of collection host
288     256     timezone string for collection host ($TZ environment variable), e.g. AEDT-11
544     256     timezone zoneinfo string for collection host, e.g. :Australia/Melbourne
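A minimal C sketch of telling the two label versions apart from the tag and extracting the start time and volume number. It assumes, as the tables suggest, that offsets are relative to the start of the record payload (just after the leading length field), and that PM_LOG_MAGIC is the tag with its low version byte cleared (0x50052600), which is inferred from the two tag values shown rather than taken from a header. parse_label and struct label_info are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64(const unsigned char *p)
{
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Hypothetical summary of the fields shared by both label versions. */
struct label_info {
    int      version;  /* 2 or 3 */
    int64_t  sec;      /* start time, seconds past the UNIX epoch */
    uint32_t nsec;     /* start time, nanoseconds (V2 microseconds * 1000) */
    int32_t  vol;      /* 0, 1, ... or -1 = .meta, -2 = .index */
};

/* p/len: label payload, starting at the tag. Returns 0, or -1 if not a label. */
static int parse_label(const unsigned char *p, size_t len, struct label_info *li)
{
    if (len < 4)
        return -1;
    uint32_t tag = be32(p);
    if ((tag & 0xffffff00u) != 0x50052600u)   /* assumed PM_LOG_MAGIC */
        return -1;
    li->version = (int)(tag & 0xffu);
    if (li->version == 2 && len >= 124) {     /* 20 + 64 + 40 bytes */
        li->sec  = (int32_t)be32(p + 8);
        li->nsec = be32(p + 12) * 1000u;
        li->vol  = (int32_t)be32(p + 16);
        return 0;
    }
    if (li->version == 3 && len >= 800) {     /* 544 + 256 bytes */
        li->sec  = (int64_t)be64(p + 8);
        li->nsec = be32(p + 16);
        li->vol  = (int32_t)be32(p + 20);
        return 0;
    }
    return -1;
}
```

Normalising both versions to seconds plus nanoseconds lets the rest of a reader ignore the label version once it has been identified.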

The “archive feature bits” are intended to encode possible future extensions or differences to the on-disk structure or the archive semantics. At this stage there are no such features, but if they are introduced at some point in the future, there will be associated PM_LOG_FEATURE_XXX macros added to the <pcp/pmapi.h> header file.

All fields, except for the “current archive volume number”, match for all files in a single PCP archive.

Archive Volume (.0, .1, ...) Records

pmResult

After the archive label record, an archive volume file contains one or more records, each providing metric values corresponding to the pmResult from one pmFetch(3) operation. The record size may vary according to the number of metrics being fetched and the number of instances in the associated instance domains.

For Version 2 the file size is limited to 2GiB, due to storage of 32-bit byte offsets within the temporal index. For Version 3 the file size is limited to 8191PiB, due to storage of signed 64-bit byte offsets within the temporal index.

The pmResult format differs between Version 2 and Version 3, with the latter providing enhanced timestamps (64-bit encoding of the seconds part and nanosecond precision).

Version 2
Offset    Length  Name
0         4       timestamp, seconds part (past UNIX epoch)
4         4       timestamp, microseconds part
8         4       number of metrics with data following
12        M       pmValueSet #0
12+M      N       pmValueSet #1
12+M+N    ...     ...
NOP       X       pmValueBlock #0
NOP+X     Y       pmValueBlock #1
NOP+X+Y   ...     ...
Version 3
Offset    Length  Name
0         8       timestamp, seconds part (past UNIX epoch)
8         4       timestamp, nanoseconds part
12        4       number of metrics with data following
16        M       pmValueSet #0
16+M      N       pmValueSet #1
16+M+N    ...     ...
NOP       X       pmValueBlock #0
NOP+X     Y       pmValueBlock #1
NOP+X+Y   ...     ...

Records with a “number of metrics” equal to zero are “mark records”, and represent interruptions, missing data, or time discontinuities in logging.
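The two header layouts differ only in the width of the seconds field, so a reader can decode both into one structure and note where pmValueSet #0 begins. A minimal C sketch, assuming the version has already been taken from the archive label and that offsets are relative to the record payload; parse_result_hdr, is_mark_record and struct result_hdr are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64(const unsigned char *p)
{
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Hypothetical decoded pmResult header. */
struct result_hdr {
    int64_t  sec;       /* timestamp, seconds past the UNIX epoch */
    uint32_t nsec;      /* nanoseconds (V2 microseconds * 1000) */
    uint32_t numpmid;   /* number of metrics with data following */
    size_t   vset_off;  /* payload offset of pmValueSet #0 */
};

/* p/len: payload of a volume record; version: 2 or 3, from the label. */
static int parse_result_hdr(const unsigned char *p, size_t len, int version,
                            struct result_hdr *h)
{
    if (version == 2 && len >= 12) {
        h->sec = (int32_t)be32(p);
        h->nsec = be32(p + 4) * 1000u;
        h->numpmid = be32(p + 8);
        h->vset_off = 12;
        return 0;
    }
    if (version == 3 && len >= 16) {
        h->sec = (int64_t)be64(p);
        h->nsec = be32(p + 8);
        h->numpmid = be32(p + 12);
        h->vset_off = 16;
        return 0;
    }
    return -1;
}

/* Zero metrics marks an interruption or discontinuity in logging. */
static int is_mark_record(const struct result_hdr *h)
{
    return h->numpmid == 0;
}
```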

pmValueSet

This subrecord represents the values for one metric at one point in time.

Offset  Length  Name
0       4       Performance Metrics Identifier (PMID)
4       4       number of values
8       4       value format, PM_VAL_INSITU=0 or PM_VAL_DPTR=1
12      M       pmValue #0
12+M    N       pmValue #1
12+M+N  ...     ...

The metadata describing metrics is found in the .meta file where the entries are not timestamped, as the metadata is assumed to be unchanging throughout an archive.

pmValue

This subrecord represents one value for one instance of a metric at one point in time. It is a variant type, depending on the parent pmValueSet's value format field. This allows small numbers to be encoded compactly, while retaining flexibility for larger or variable-length data, which is stored later in the pmResult record in a pmValueBlock subrecord.

Offset  Length  Name
0       4       internal instance identifier (or PM_IN_NULL=-1 for singular metrics)
4       4       value (INSITU) or
                offset in pmResult to our pmValueBlock (DPTR)

The metadata describing the instance domain for metrics is found in the .meta file. Since the numeric mappings may change during the lifetime of the logging session, it is important to match up the timestamp of the measurement record with the corresponding instance domain record. That is, the instance domain corresponding to a measurement at time T is the instance domain observation for the metric's instance domain with largest timestamp T' <= T.
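The “largest timestamp T' <= T” rule is an upper-bound binary search over the observations of one instance domain. A minimal C sketch, assuming those observations have already been collected in ascending timestamp order; struct ts and latest_at_or_before are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timestamp: seconds past the UNIX epoch plus nanoseconds. */
struct ts {
    int64_t  sec;
    uint32_t nsec;
};

static int ts_cmp(struct ts a, struct ts b)
{
    if (a.sec != b.sec)
        return a.sec < b.sec ? -1 : 1;
    if (a.nsec != b.nsec)
        return a.nsec < b.nsec ? -1 : 1;
    return 0;
}

/*
 * Index of the observation with the largest timestamp T' <= t, or -1 if
 * every observation is later than t. obs[] must be sorted ascending.
 */
static long latest_at_or_before(const struct ts *obs, size_t n, struct ts t)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ts_cmp(obs[mid], t) <= 0)
            lo = mid + 1;           /* obs[mid] qualifies; look later */
        else
            hi = mid;               /* obs[mid] is too late */
    }
    /* lo is now the number of observations with timestamp <= t */
    return (long)lo - 1;
}
```

A result of -1 means the measurement predates every observation of that instance domain, which a reader would presumably treat as an error.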

pmValueBlock

Instances of this subrecord are placed at the end of the pmResult record, after all the pmValueSet subrecords (as in the pmResult layout above). If (and only if) needed, each is padded at the end to the next 32-bit boundary.

Offset  Length  Name
0       1       value type (same as pmDesc.type)
1       3       4 + N, the length of the subrecord
4       N       bytes that make up the raw value
4+N     0-3     padding (not included in the 4+N length field)

Note that for PM_TYPE_STRING, the length includes an explicit NULL terminator byte. For PM_TYPE_EVENT, the value byte string is further structured. Refer to PMDAEVENTARRAY(3) for more information about how arrays of event records are packed inside a pmResult container.
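The pmValueBlock header packs an 8-bit type and a 24-bit length into one big-endian word, and the length excludes the padding. A minimal C sketch of decoding the header and computing the padded size; vblock_header, vblock_value_len and vblock_padded_size are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded pmValueBlock header word. */
struct vblock_hdr {
    uint8_t  vtype;   /* byte 0: value type, as in pmDesc.type */
    uint32_t vlen;    /* bytes 1-3: 4 + N, header included, padding excluded */
};

static struct vblock_hdr vblock_header(const unsigned char *p)
{
    struct vblock_hdr h;
    h.vtype = p[0];
    h.vlen = ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return h;
}

/* Length N of the raw value bytes that follow the 4-byte header. */
static uint32_t vblock_value_len(struct vblock_hdr h)
{
    return h.vlen - 4;
}

/* Space occupied in the record once padded to the next 32-bit boundary. */
static uint32_t vblock_padded_size(uint32_t vlen)
{
    return (vlen + 3u) & ~3u;
}
```

For example, a string value "hi" with its terminator has N = 3, so the length field holds 7 and the block occupies 8 bytes in the record; a length that is already a multiple of 4 gets no padding.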

Metadata File (.meta) Records

After the archive label record, the metadata file contains interleaved metric description records, timestamped instance domain records, timestamped label records (for context, instance domain and metric labels) and (help) text records. Unlike the data volumes, these records are not forced to 32-bit alignment.

For Version 2 the file size is limited to 2GiB, due to storage of 32-bit byte offsets within the temporal index. For Version 3 the file size is limited to 8191PiB, due to storage of signed 64-bit byte offsets within the temporal index.

See also libpcp/src/logmeta.c.

Metric Descriptions

Instances of this (pmDesc) record provide the description or metadata for each metric appearing in the PCP archive. This metadata includes the metric's PMID, data type, data semantics, instance domain identifier (or PM_INDOM_NULL for singular metrics with only one value) and a set of (1 or more) names.

Offset  Length  Name
0       4       tag, TYPE_DESC=1
4       4       PMID
8       4       data type (PM_TYPE_*)
12      4       instance domain identifier
16      4       metric semantics (PM_SEM_*)
20      4       units: bit-packed pmUnits
24      4       number of alternative names for this PMID
28      4       N: number of bytes in this name
32      N       bytes of the name, no NULL terminator nor padding
32+N    4       N2: number of bytes in next name
36+N    N2      bytes of the name, no NULL terminator nor padding
...     ...     ...
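Because the names are length-prefixed and unpadded, each name's position depends on the lengths of all earlier names. A minimal C sketch that walks the list to return the name at a given index, assuming the record payload (starting at the tag) is in memory; desc_name is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return a pointer to name number idx of a pmDesc payload and store its
 * length in *namelen, or NULL if out of range or malformed. The name is
 * not NULL-terminated, so callers must use *namelen.
 */
static const char *desc_name(const unsigned char *p, size_t len,
                             uint32_t idx, uint32_t *namelen)
{
    if (len < 28 || be32(p) != 1)        /* 1 = TYPE_DESC */
        return NULL;
    if (idx >= be32(p + 24))             /* number of names for this PMID */
        return NULL;
    size_t off = 28;
    for (uint32_t i = 0; ; i++) {
        if (len - off < 4)
            return NULL;
        uint32_t n = be32(p + off);      /* N, N2, ...: length of this name */
        off += 4;
        if (len - off < n)
            return NULL;
        if (i == idx) {
            *namelen = n;
            return (const char *)(p + off);
        }
        off += n;                        /* skip to the next length field */
    }
}
```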

Instance Domains

A set-valued metric is defined over an instance domain, which consists of an instance domain identifier (which will already have appeared in a prior pmDesc record), a count of the number of instances and a map that defines the association between internal instance identifiers (integers) and external instance names (strings).

Because instance domains can change over time, the instance domain also requires a timestamp, and the same instance domain can occur multiple times within the .meta file. The timestamps are used to search for the temporally correct instance domain when decoding pmResult records from the archive data volumes, or answering metadata queries against the instance domain.

The instance domain format differs markedly between Version 2 and Version 3. Version 3 provides enhanced timestamps (64-bit encoding of the seconds part and nanosecond precision) and introduces a new “delta” instance domain format that encodes differences between the previous observation of the instance domain and the current state of the instance domain.

Full Instance Domain - Version 2
Offset    Length  Name
0         4       tag, TYPE_INDOM_V2=2
4         4       timestamp, seconds part (past UNIX epoch)
8         4       timestamp, microseconds part
12        4       instance domain number
16        4       N: number of instances in domain, normally >0
20        4       first instance number
24        4       second instance number (if appropriate)
...       ...     ...
20+4*N    4       first offset into string table (see below)
20+4*N+4  4       second offset into string table (etc.)
...       ...     ...
20+8*N    M       base of string table, containing
                  packed, NULL-terminated instance names
Full Instance Domain - Version 3
Offset    Length  Name
0         4       tag, TYPE_INDOM=5
4         8       timestamp, seconds part (past UNIX epoch)
12        4       timestamp, nanoseconds part
16        4       instance domain number
20        4       N: number of instances in domain, normally >0
24        4       first instance number
28        4       second instance number (if appropriate)
...       ...     ...
24+4*N    4       first offset into string table (see below)
24+4*N+4  4       second offset into string table (etc.)
...       ...     ...
24+8*N    M       base of string table, containing
                  packed, NULL-terminated instance names
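The parallel arrays of instance numbers and string offsets make random access to the i-th instance straightforward. A minimal C sketch for Version 3 records, assuming string offsets are relative to the base of the string table and accepting the -1 “deleted” offset only in delta records; indom_v3_instance is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Fetch instance i of a V3 full or delta indom payload. *name is set to
 * NULL for a deleted instance (string offset -1, delta records only).
 * Returns 0, or -1 if i is out of range or the record is malformed.
 */
static int indom_v3_instance(const unsigned char *p, size_t len, uint32_t i,
                             int32_t *inst, const char **name)
{
    if (len < 24)
        return -1;
    uint32_t tag = be32(p);
    if (tag != 5 && tag != 6)               /* TYPE_INDOM, TYPE_INDOM_DELTA */
        return -1;
    uint32_t n = be32(p + 20);              /* number of instances */
    if (i >= n || (len - 24) / 8 < n)       /* ids + offsets must fit */
        return -1;
    size_t base = 24 + 8 * (size_t)n;       /* start of the string table */
    *inst = (int32_t)be32(p + 24 + 4 * (size_t)i);
    int32_t off = (int32_t)be32(p + 24 + 4 * (size_t)n + 4 * (size_t)i);
    if (off == -1 && tag == 6) {            /* deleted instance, no name */
        *name = NULL;
        return 0;
    }
    if (off < 0 || (size_t)off >= len - base)
        return -1;
    /* the name must be NULL-terminated inside the record */
    if (memchr(p + base + off, '\0', len - base - (size_t)off) == NULL)
        return -1;
    *name = (const char *)(p + base + off);
    return 0;
}
```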

The “delta” instance domain record in Version 3 uses the same physical structure as the “full” instance domain above with the following differences:

  • The tag is TYPE_INDOM_DELTA=6.
  • The “number of instances in domain” field becomes the sum of the number of instances added and the number of instances deleted.
  • Deleted instances are encoded with the string offset set to -1 and there is no corresponding string table entry.
  • Added instances are encoded exactly as in a “full” record.

The “delta” instance domain format is used to provide a more compact on-disk encoding for instance domains that have a large number of instances and are subject to frequent small changes, e.g. the instance domain of process ids, as exported by pmdaproc(1).

A “full” instance domain record replaces the previous observation of that instance domain: prior records are not searched for instance domain metadata queries after this timestamp.

Each instance domain in a Version 3 archive must have an initial “full” instance domain record. Subsequent records for the same instance domain can be the “full” or the “delta” variant. Any instance mentioned in the prior observation of an instance domain that is not mentioned in the “delta” instance domain record is assumed to continue to exist for the current observation of the instance domain.
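Applying a “delta” record to the previous observation can be sketched in C as follows: deleted instances are removed, added instances are appended, and everything not mentioned carries over. This is a minimal sketch with hypothetical names (struct instance, apply_indom_delta); it assumes the delta entries have already been decoded, and treats an “added” instance that is already present as a rename, which the format description does not specify. A linear search keeps it short; a real reader would more likely index instances by identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory instance; name == NULL marks a deletion in a delta. */
struct instance {
    int32_t     id;
    const char *name;
};

/*
 * Apply the entries of a "delta" record to the current observation
 * cur[0 .. *ncur) (capacity cap). Instances not mentioned are kept.
 * Returns 0, or -1 if an added instance does not fit.
 */
static int apply_indom_delta(struct instance *cur, size_t *ncur, size_t cap,
                             const struct instance *delta, size_t ndelta)
{
    for (size_t d = 0; d < ndelta; d++) {
        size_t j = 0;
        while (j < *ncur && cur[j].id != delta[d].id)
            j++;
        if (delta[d].name == NULL) {
            if (j < *ncur)                /* delete: move last entry into slot j */
                cur[j] = cur[--*ncur];
        } else if (j < *ncur) {
            cur[j].name = delta[d].name;  /* assumption: re-add acts as rename */
        } else {
            if (*ncur == cap)
                return -1;
            cur[(*ncur)++] = delta[d];    /* add a new instance */
        }
    }
    return 0;
}
```

Note that deletion by swapping with the last entry does not preserve instance order, which is harmless if lookups are by identifier.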

Labels for Contexts, Instance Domains and Metrics

Instances of this (pmLogLabelSet) record provide sets of label-name:label-value pairs associated with labels of the context, instance domains and individual performance metrics - refer to pmLookupLabels(3) for further details.

Any instance domain identifier will have already been mentioned in a prior pmDesc record.

As new labels can appear during an archiving session, these records are timestamped and must be searched when decoding pmResult records from the archive data volumes. The pmLogLabelSet format differs between Version 2 and Version 3, with the latter providing enhanced timestamps (64-bit encoding of the seconds part and nanosecond precision).

Version 2
Offset              Length  Name
0                   4       tag, TYPE_LABEL_V2=3
4                   4       timestamp, seconds part (past UNIX epoch)
8                   4       timestamp, microseconds part
12                  4       label type (PM_LABEL_* macros)
16                  4       numeric identifier - domain, PMID, etc. or PM_IN_NULL=-1 for context labels
20                  4       N: number of label sets in this record, usually 1 except in the case of instances
24                  4       offset to the start of the JSONB labels string
28                  L1      first labelset array entry (see below)
...                 ...     ...
28+L1+...+L(N-1)    LN      N-th labelset array entry (see below)
...                 ...     ...
28+L1+...+LN        M       concatenated JSONB strings for all labelsets
Version 3
Offset              Length  Name
0                   4       tag, TYPE_LABEL=7
4                   8       timestamp, seconds part (past UNIX epoch)
12                  4       timestamp, nanoseconds part
16                  4       label type (PM_LABEL_* macros)
20                  4       numeric identifier - domain, PMID, etc. or PM_IN_NULL=-1 for context labels
24                  4       N: number of label sets in this record, usually 1 except in the case of instances
28                  4       offset to the start of the JSONB labels string
32                  L1      first labelset array entry (see below)
...                 ...     ...
32+L1+...+L(N-1)    LN      N-th labelset array entry (see below)
...                 ...     ...
32+L1+...+LN        M       concatenated JSONB strings for all labelsets

Records of this form replace the existing labels for a given label type: prior records are not searched for resolving that class of label in measurements after this timestamp.

The individual labelset array entries are variable length, depending on the number of labels present within that set. These entries contain the instance identifiers (in the case of type PM_LABEL_INSTANCES labels), lengths and offsets of each label name and value, and also any flags set for each label.

Offset  Length  Name
0       4       instance identifier (or PM_IN_NULL=-1)
4       4       length of JSONB label string
8       4       N: number of labels in this labelset
12      2       first label name offset
14      1       first label name length
15      1       first label flags (e.g. optionality)
16      2       first label value offset
18      2       first label value length
20      2       second label name offset (if appropriate)
...     ...     ...
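Each labelset array entry is therefore 12 bytes plus 8 bytes per label. A minimal C sketch that decodes the k-th label reference, assuming (per the byte-order convention under Common Features) that the 16-bit fields are big-endian, and that name and value offsets index into this labelset's own JSONB string, as for pmLabel in pmLookupLabels(3); struct label_ref and labelset_label are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical decoded label: offsets and lengths into the JSONB string. */
struct label_ref {
    uint16_t name_off;
    uint8_t  name_len;
    uint8_t  flags;
    uint16_t value_off;
    uint16_t value_len;
};

/* Decode label k of one labelset array entry; returns 0, or -1 if absent. */
static int labelset_label(const unsigned char *ent, size_t entlen, uint32_t k,
                          struct label_ref *l)
{
    if (entlen < 12)
        return -1;
    uint32_t nlabels = be32(ent + 8);
    if (k >= nlabels || (entlen - 12) / 8 <= k)   /* label k must fit */
        return -1;
    const unsigned char *q = ent + 12 + 8 * (size_t)k;
    l->name_off  = be16(q);
    l->name_len  = q[2];
    l->flags     = q[3];
    l->value_off = be16(q + 4);
    l->value_len = be16(q + 6);
    return 0;
}
```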

Help Text

This (pmLogText) record stores help text associated with a metric or an instance domain - as provided by pmLookupText(3) and pmLookupInDomText(3).

The metric identifier and instance domain identifier will have already been mentioned in a prior pmDesc record.

Offset  Length  Name
0       4       tag, TYPE_TEXT=4
4       4       text and identifier type (PM_TEXT_* macros)
8       4       numeric identifier - PMID or instance domain
12      M       help text string, arbitrary text
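The help text record has a fixed 12-byte header followed by the text itself. The table gives no separate length for the text, so this minimal C sketch assumes M is simply the rest of the record payload and does not rely on a terminator; parse_text and struct text_rec are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical decoded help text record. */
struct text_rec {
    uint32_t    type;     /* PM_TEXT_* bits: text and identifier type */
    uint32_t    ident;    /* PMID or instance domain */
    const char *text;     /* not assumed to be NULL-terminated */
    size_t      textlen;
};

/* p/len: record payload starting at the tag. Returns 0, or -1 if not text. */
static int parse_text(const unsigned char *p, size_t len, struct text_rec *t)
{
    if (len < 12 || be32(p) != 4)     /* 4 = TYPE_TEXT */
        return -1;
    t->type = be32(p + 4);
    t->ident = be32(p + 8);
    t->text = (const char *)(p + 12);
    t->textlen = len - 12;            /* assumption: text fills the rest */
    return 0;
}
```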

Index File (.index) Records

After the archive label record, the temporal index file contains a plainly concatenated, unframed group of tuples, which relate timestamps to the byte offsets in the volume and .meta files. These records are fixed size, fixed format, and are not enclosed in the standard length/payload/length wrapper: they take up the entire remainder of the .index file after the archive label record.

The temporal index file provides a rapid way of seeking to a particular point of time within an archive for both the performance metric values and the associated metadata.

See also libpcp/src/logutil.c.

The index format differs between Version 2 and Version 3, with the latter providing enhanced timestamps (64-bit encoding of the seconds part and nanosecond precision) and 64-bit byte offsets.

Version 2
Offset  Length  Name
0       4       timestamp, seconds part (past UNIX epoch)
4       4       timestamp, microseconds part
8       4       archive volume number (0...N)
12      4       byte offset in .meta file
16      4       byte offset in archive volume file
Version 3
Offset  Length  Name
0       8       timestamp, seconds part (past UNIX epoch)
8       4       timestamp, nanoseconds part
12      4       archive volume number (0...N)
16      8       byte offset in .meta file
24      8       byte offset in archive volume file
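Because index entries are fixed-size and unframed (20 bytes in Version 2, 32 bytes in Version 3), entry i can be located directly by multiplication. A minimal C sketch, assuming buf holds the .index bytes that follow the archive label record; index_entry_at and struct index_entry are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64(const unsigned char *p)
{
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Hypothetical decoded temporal index entry. */
struct index_entry {
    int64_t  sec;
    uint32_t nsec;       /* V2 microseconds are scaled to nanoseconds */
    int32_t  vol;        /* archive volume number */
    uint64_t meta_off;   /* byte offset in .meta file */
    uint64_t vol_off;    /* byte offset in archive volume file */
};

/* Decode entry i; returns 0, or -1 if i is past the end or version unknown. */
static int index_entry_at(const unsigned char *buf, size_t len, int version,
                          size_t i, struct index_entry *e)
{
    if (version != 2 && version != 3)
        return -1;
    size_t esz = (version == 3) ? 32 : 20;
    if (len / esz <= i)                  /* only whole entries count */
        return -1;
    const unsigned char *p = buf + i * esz;
    if (version == 3) {
        e->sec = (int64_t)be64(p);
        e->nsec = be32(p + 8);
        e->vol = (int32_t)be32(p + 12);
        e->meta_off = be64(p + 16);
        e->vol_off = be64(p + 24);
    } else {
        e->sec = (int32_t)be32(p);
        e->nsec = be32(p + 4) * 1000u;
        e->vol = (int32_t)be32(p + 8);
        e->meta_off = be32(p + 12);
        e->vol_off = be32(p + 16);
    }
    return 0;
}
```

Combined with a search for the last entry whose timestamp is at or before a target time, this gives a starting point from which a reader can scan forward in the .meta and volume files.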

Since the temporal index is optional, and exists only to speed up time-based random access to metrics and their metadata, index records are emitted only intermittently. An archive reader should not presume any particular rate of index records. However, common events that may trigger a new temporal index record include changes in instance domains, switching over to a new archive volume, and starting or stopping logging. One reliable invariant is that, for each index entry, no .meta or archive volume record that lies physically before the entry's byte offsets has a timestamp later than the entry's timestamp.

Files

Several PCP tools create archives in standard locations:

$HOME/.pcp/pmlogger

default location for the interactive chart recording mode in pmchart(1)

$PCP_LOG_DIR/pmlogger

default location for pmlogger_daily(1) and pmlogger_check(1) scripts

See Also

mkaf(1), PCPIntro(1), pmafm(1), pmchart(1), pmdaproc(1), pmlogdump(1), pmlogger(1), pmlogger_check(1), pmlogger_daily(1), pmlogreduce(1), pmlogrewrite(1), pmlogsummary(1), LOGIMPORT(3), PMAPI(3), pmLookupDesc(3), pmLookupInDom(3), pmLookupInDomText(3), pmLookupLabels(3), pmLookupName(3), pmLookupText(3), pcp.conf(5) and pcp.env(5).

Referenced By

pcp2arrow(1), pcp2elasticsearch(1), pcp2graphite(1), pcp2influxdb(1), pcp2json(1), pcp2openmetrics(1), pcp2spark(1), pcp2xlsx(1), pcp2xml(1), pcp2zabbix(1), PCPIntro(1), pmGetArchiveLabel(3), pmlogger(1), pmlogmv(1), pmlogrewrite(1), pmrep(1).

Performance Co-Pilot