hdfscli-avro - Man Page
hdfscli-avro – an Avro extension for HdfsCLI
Synopsis
hdfscli-avro schema [-a ALIAS] [-v...] HDFS_PATH
hdfscli-avro read [-a ALIAS] [-v...] [-F FREQ | -n NUM] [-p PARTS] HDFS_PATH
hdfscli-avro write [-fa ALIAS] [-v...] [-C CODEC] [-S SCHEMA] HDFS_PATH
Options
Commands
- schema
Pretty-print the Avro schema of HDFS_PATH.
- read
Read an Avro file from HDFS and write its records as JSON to standard output.
- write
Read JSON records from standard input and serialize them into a single Avro file on HDFS.
Arguments
- HDFS_PATH
Remote path to an Avro file or to a directory containing Avro part-files.
Options
- -C CODEC --codec=CODEC
Compression codec. Available values: null, deflate, snappy. [default: deflate]
- -F FREQ --freq=FREQ
Probability of sampling each record (a number between 0 and 1).
- -L --log
Show the path to the current log file and exit.
- -S SCHEMA --schema=SCHEMA
Schema used to serialize records. If omitted, the schema is inferred from the first record.
- -a ALIAS --alias=ALIAS
Alias of namenode to connect to.
- -f --force
Overwrite any existing file.
- -h --help
Show a usage message and exit.
- -n NUM --num=NUM
Maximum number of records to output.
- -p PARTS --parts=PARTS
Part-files to read. Specify a number to randomly select that many, or a comma-separated list of numbers to read only those part-files. Use a number followed by a comma (e.g. 1,) to read only that single part-file instead of one chosen at random. The default is to read all part-files.
- -v --verbose
Enable log output. Can be specified up to three times (increasing verbosity each time).
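The read command accepts either -F or -n, not both. The Python sketch below illustrates the two behaviors under stated assumptions: -F is modeled as keeping each record independently with probability FREQ (Bernoulli sampling), and -n as stopping after NUM records. The names sample_records and cap_records are hypothetical and are not part of HdfsCLI.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_records(records: Iterable[T], freq: float, rng: random.Random) -> Iterator[T]:
    """Keep each record independently with probability `freq` (assumed -F semantics)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    # rng.random() is uniform on [0, 1), so freq=1.0 keeps every record
    # and freq=0.0 keeps none.
    return (record for record in records if rng.random() < freq)


def cap_records(records: Iterable[T], num: int) -> Iterator[T]:
    """Yield at most `num` records (assumed -n semantics)."""
    return islice(records, num)


if __name__ == "__main__":
    rng = random.Random(7)
    kept = list(sample_records(range(10_000), 0.1, rng))
    print(len(kept))  # expected value: 1000 (10% of 10,000)
    print(list(cap_records(range(10), 3)))  # [0, 1, 2]
```

Because both helpers are lazy, a capped read in a sketch like this stops consuming input as soon as NUM records have been produced.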
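The -p option distinguishes a bare number (select that many part-files at random) from a comma-separated list (read exactly those part-files), which is why 1, reads part-file 1 while 1 picks one part-file at random. The following Python sketch models that rule. It assumes part-files are identified by integer part numbers; parse_parts and select_parts are hypothetical names, and this is an illustration of the documented behavior rather than HdfsCLI's own code.

```python
import random


def parse_parts(spec: str):
    """Interpret a -p value (illustrative).

    A bare number such as "3" means "pick that many part-files at random";
    anything containing a comma ("2,3" or "1,") is a list of part numbers.
    """
    if "," in spec:
        # The trailing comma in "1," yields an empty item, which is skipped.
        return [int(item) for item in spec.split(",") if item.strip()]
    return int(spec)


def select_parts(spec: str, available: list[int], rng: random.Random) -> list[int]:
    """Return the part numbers to read from `available` (assumed numbering)."""
    parsed = parse_parts(spec)
    if isinstance(parsed, int):
        if parsed > len(available):
            raise ValueError(f"only {len(available)} part-files available")
        # Random selection without replacement, sorted for stable output.
        return sorted(rng.sample(available, parsed))
    missing = [p for p in parsed if p not in available]
    if missing:
        raise ValueError(f"no such part-files: {missing}")
    return parsed


if __name__ == "__main__":
    parts = [0, 1, 2, 3, 4]
    rng = random.Random(0)
    print(select_parts("2,3", parts, rng))  # [2, 3]
    print(select_parts("1,", parts, rng))   # [1]
    print(select_parts("2", parts, rng))    # two distinct random part numbers
```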
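When -S is not given, the write command infers the schema from the first record. The Python sketch below shows one plausible way to map a JSON-decoded record onto an Avro record schema. The type mapping, the generated record names (Record0, Record1, ...), and the function name infer_schema are assumptions made for illustration; HdfsCLI's actual inference may differ, for example in how it names nested records or handles nulls and empty arrays.

```python
import json
from itertools import count


def infer_schema(record: dict) -> dict:
    """Infer an Avro schema from one JSON-decoded record (illustrative mapping)."""
    names = count()  # Avro named types must be unique, so number them.

    def infer(value):
        if value is None:
            return "null"
        # bool must be tested before int: in Python, bool is a subclass of int.
        if isinstance(value, bool):
            return "boolean"
        if isinstance(value, int):
            return "long"
        if isinstance(value, float):
            return "double"
        if isinstance(value, str):
            return "string"
        if isinstance(value, list):
            if not value:
                raise ValueError("cannot infer the item type of an empty array")
            # Only the first element is inspected, mirroring first-record inference.
            return {"type": "array", "items": infer(value[0])}
        if isinstance(value, dict):
            return {
                "type": "record",
                "name": f"Record{next(names)}",
                "fields": [{"name": key, "type": infer(val)} for key, val in value.items()],
            }
        raise TypeError(f"unsupported JSON value: {value!r}")

    return infer(record)


if __name__ == "__main__":
    first = json.loads('{"id": 1, "score": 0.5, "tags": ["a"], "ok": true}')
    print(json.dumps(infer_schema(first), indent=2))
```

Since only one record is inspected, a field that happens to be null in that record is typed as "null" here, and later non-null values for it would not fit; supplying an explicit schema with -S avoids that ambiguity.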
Examples
hdfscli-avro schema /data/impressions.avro
hdfscli-avro read -a dev snapshot.avro >snapshot.jsonl
hdfscli-avro read -F 0.1 -p 2,3 clicks.avro
hdfscli-avro write -f positives.avro <positives.jsonl -S "$(cat schema.avsc)"
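The last example reads records from positives.jsonl (one JSON object per line) and passes the contents of schema.avsc (an Avro schema written as JSON) to -S. The Python sketch below produces both files. The schema, its record name Positive, and the fields user and score are hypothetical placeholders; adapt them to your data.

```python
import json
from pathlib import Path

# Hypothetical schema and records; replace with your own fields and values.
SCHEMA = {
    "type": "record",
    "name": "Positive",
    "fields": [
        {"name": "user", "type": "string"},
        {"name": "score", "type": "double"},
    ],
}
RECORDS = [
    {"user": "alice", "score": 0.9},
    {"user": "bob", "score": 0.75},
]


def write_inputs(directory: Path) -> tuple[Path, Path]:
    """Write schema.avsc and positives.jsonl into `directory`."""
    schema_path = directory / "schema.avsc"
    records_path = directory / "positives.jsonl"
    schema_path.write_text(json.dumps(SCHEMA, indent=2) + "\n")
    # JSON Lines: each record is serialized on its own line.
    records_path.write_text("".join(json.dumps(r) + "\n" for r in RECORDS))
    return schema_path, records_path
```

Calling write_inputs(Path(".")) before running the write command above creates both files in the current directory.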