hdfscli-avro - Man Page

hdfscli-avro – an Avro extension for HdfsCLI

Synopsis

hdfscli-avro schema [-a ALIAS] [-v...] HDFS_PATH

hdfscli-avro read [-a ALIAS] [-v...] [-F FREQ | -n NUM] [-p PARTS] HDFS_PATH

hdfscli-avro write [-fa ALIAS] [-v...] [-C CODEC] [-S SCHEMA] HDFS_PATH

hdfscli-avro -L | -h

Commands

schema

Pretty-print the Avro schema.

read

Read an Avro file from HDFS and output records as JSON to standard out.

write

Read JSON records from standard in and serialize them into a single Avro file on HDFS.
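Both read and write exchange records as JSON on the standard streams. The sketch below assumes one JSON object per line, which is the JSON Lines layout suggested by the .jsonl files in the examples. It only illustrates that text format, not how the tool talks to HDFS, and the helper names read_json_records and write_json_records are hypothetical.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def read_json_records(stream: TextIO) -> Iterator[dict]:
    """Parse one JSON record per non-blank line (hypothetical helper)."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines, e.g. an extra newline at end of input
            yield json.loads(line)


def write_json_records(records: Iterable[dict], stream: TextIO) -> None:
    """Emit each record as a single JSON line (hypothetical helper)."""
    for record in records:
        stream.write(json.dumps(record))
        stream.write("\n")


if __name__ == "__main__":
    # In-memory streams stand in for standard in and standard out.
    src = io.StringIO('{"name": "a", "count": 1}\n\n{"name": "b", "count": 2}\n')
    out = io.StringIO()
    write_json_records(read_json_records(src), out)
    print(out.getvalue(), end="")
```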

Arguments

HDFS_PATH

Remote path to Avro file or directory containing Avro part-files.

Options

-C CODEC --codec=CODEC

Compression codec. Available values: null, deflate, snappy. [default: deflate]

-F FREQ --freq=FREQ

Probability of sampling a record.
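The following sketch shows what sampling with probability FREQ means, under an assumed Bernoulli-sampling model. Each record is kept independently with that probability, so about FREQ times the input count survives on average. For example, 0.1 keeps roughly one record in ten. This is not HdfsCLI's code. The function sample_records and its seed parameter are hypothetical.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def sample_records(records: Iterable[T], freq: float, seed: Optional[int] = None) -> Iterator[T]:
    """Keep each record independently with probability freq (hypothetical helper)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    rng = random.Random(seed)  # a seeded generator makes the sample reproducible
    for record in records:
        if rng.random() < freq:  # random() is in [0, 1), so freq=1 keeps everything
            yield record


if __name__ == "__main__":
    kept = list(sample_records(range(10_000), 0.1, seed=42))
    print(len(kept))  # close to 1000, varying with the seed
```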

-L --log

Show path to current log file and exit.

-S SCHEMA --schema=SCHEMA

Schema for serializing records. If not passed, it will be inferred from the first record.
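The inference rules are not spelled out here. The following sketch shows one way to map a first JSON record to an Avro schema under assumed rules:

* booleans to boolean
* integers to long
* floats to double
* strings to string
* null to null
* lists to an array of the first element's type
* objects to a record

The function infer_schema and the path-derived record names are hypothetical. The sketch also never emits unions, so it cannot express fields that are sometimes null.

```python
import json
from typing import Any


def infer_schema(value: Any, name: str = "Record") -> Any:
    """Map a JSON value to an Avro type under simple, assumed rules (hypothetical)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        if not value:
            raise ValueError(f"cannot infer item type of empty array at {name}")
        return {"type": "array", "items": infer_schema(value[0], name + "_item")}
    if isinstance(value, dict):
        # Nested records get path-derived names so every named type is unique.
        # Assumes JSON keys are already valid Avro field names.
        return {
            "type": "record",
            "name": name,
            "fields": [
                {"name": key, "type": infer_schema(val, f"{name}_{key}")}
                for key, val in value.items()
            ],
        }
    raise TypeError(f"unsupported JSON value: {value!r}")


if __name__ == "__main__":
    first = json.loads('{"id": 7, "score": 0.5, "tags": ["a"], "ok": true}')
    print(json.dumps(infer_schema(first), indent=2))
```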

-a ALIAS --alias=ALIAS

Alias of namenode to connect to.

-f --force

Overwrite any existing file.

-h --help

Show a usage message and exit.

-n NUM --num=NUM

Cap number of records to output.

-p PARTS --parts=PARTS

Part-files to read. Specify a number to randomly select that many, or a comma-separated list of numbers to read only those part-files. Use a number followed by a comma (e.g. 1,) to read only that single part-file. By default, all part-files are read.
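The PARTS syntax can be sketched as a small parser plus a selection step. This is one reading of the rules above, not HdfsCLI's implementation. The function names parse_parts and select_parts, the seed parameter, and the error on unknown part numbers are hypothetical. Part numbers are treated as opaque identifiers of the part-files found under HDFS_PATH.

```python
import random
from typing import List, Optional, Sequence, Union

PartsSpec = Union[None, int, List[int]]


def parse_parts(value: Optional[str]) -> PartsSpec:
    """Interpret a PARTS string (hypothetical helper).

    None or ""  -> None     (read all part-files)
    "3"         -> 3        (randomly select three part-files)
    "2,3"       -> [2, 3]   (read only part-files 2 and 3)
    "1,"        -> [1]      (read only part-file 1)
    """
    if not value:
        return None
    if "," not in value:
        return int(value)
    # A trailing comma leaves an empty piece, which is dropped.
    return [int(piece) for piece in value.split(",") if piece.strip()]


def select_parts(available: Sequence[int], spec: PartsSpec, seed: Optional[int] = None) -> List[int]:
    """Apply a parsed spec to the available part numbers (hypothetical helper)."""
    if spec is None:
        return list(available)
    if isinstance(spec, int):
        # random.sample raises ValueError if more parts are requested than exist.
        return sorted(random.Random(seed).sample(list(available), spec))
    missing = [p for p in spec if p not in available]
    if missing:
        raise ValueError(f"part-files not found: {missing}")
    return list(spec)


if __name__ == "__main__":
    parts = [0, 1, 2, 3]
    print(select_parts(parts, parse_parts("2,3")))  # [2, 3]
    print(select_parts(parts, parse_parts("1,")))   # [1]
```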

-v --verbose

Enable log output. Can be specified up to three times (increasing verbosity each time).
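Repeatable flags such as -v are typically counted. The following sketch uses Python's argparse count action to do this and then maps the count to a logging level. The mapping from one, two and three flags to WARNING, INFO and DEBUG is an assumption for illustration, not HdfsCLI's actual table. The function verbosity_level and the program name are hypothetical.

```python
import argparse
import logging
from typing import List, Optional

# Assumed mapping from the number of -v flags to a logging threshold (hypothetical).
LEVELS = {1: logging.WARNING, 2: logging.INFO, 3: logging.DEBUG}


def verbosity_level(argv: List[str]) -> Optional[int]:
    """Count -v/--verbose flags; return a logging level, or None when logging is off."""
    parser = argparse.ArgumentParser(prog="hdfscli-avro-sketch")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    args, _ = parser.parse_known_args(argv)
    if args.verbose > 3:
        parser.error("-v can be specified up to three times")  # exits with status 2
    return LEVELS.get(args.verbose)  # 0 flags -> None (no log output)


if __name__ == "__main__":
    level = verbosity_level(["-vv"])
    print(logging.getLevelName(level))  # INFO
```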

Examples

hdfscli-avro schema /data/impressions.avro
hdfscli-avro read -a dev snapshot.avro >snapshot.jsonl
hdfscli-avro read -F 0.1 -p 2,3 clicks.avro
hdfscli-avro write -f positives.avro <positives.jsonl -S "$(cat schema.avsc)"

See Also

hdfscli(1)

Referenced By

hdfscli(1).

October 2021