csv - Man Page

process CSV files from the command line

Synopsis

  # On the command line:

  csv 1 2 -1 < report.csv

  # Reads the first two fields, as well as the last one, from "report.csv".
  # Data is cleaned up and emitted as CSV.

  csv --fields Revenue,Q1,Q2 < report.csv   # or "-f" for short

  # The first line of the input (from the file "report.csv") is treated
  # as the header line; the fields are emitted in the order "Revenue",
  # "Q1", and "Q2". Data is cleaned up and emitted as CSV.

  csv --input report.csv --to_tsv

  # Converts the whole report to TSV (tab-separated values).

Description

CSV (comma-separated values) files are the lowest common denominator of structured data interchange formats. For such a humble file format, it is pretty difficult to get right: embedded quote marks and linebreaks, slipshod delimiters, and no One True Validity Test make CSV data found in the wild hard to parse correctly. Text::CSV_XS provides flexible and performant access to CSV files from Perl, but is cumbersome to use in one-liners and on the command line.

csv is intended to make command-line processing of CSV files as easy as plain text is meant to be on Unix. Internally, it holds two Text::CSV objects (one for input and one for output), which have reasonable defaults but which you can reconfigure to suit your needs. You can then extract just the fields you want, change the delimiter, clean up the data, and so on.

In the simplest usage, csv filters standard input to standard output and takes a list of integers. These are 1-based column numbers to select from the input CSV stream. Negative numbers count from the end of the line, so -1 is the last column. Without any column list, csv selects all columns (this is still useful, for example to normalize the quoting style).
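The column numbering described above can be modelled in a few lines of Python. This is only an illustrative sketch of the semantics, not how csv is implemented (csv is written in Perl on top of Text::CSV); the function name select_columns and the sample data are made up for the example.

```python
import csv
import io

def select_columns(text, columns):
    """Return CSV text holding only the requested columns of each row.

    Positive numbers are 1-based; negative numbers count from the row end.
    An empty column list keeps every column.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        if columns:
            # Shift 1-based positions to Python's 0-based indices; negative
            # positions already count from the end in Python.
            row = [row[c - 1] if c > 0 else row[c] for c in columns]
        writer.writerow(row)
    return out.getvalue()

# Hypothetical input: the second record has a field with an embedded comma.
sample = 'a,b,c,d\n1,"x, y",3,4\n'
result = select_columns(sample, [1, 2, -1])
print(result, end="")
```

With the column list [1, 2, -1] this mirrors the "csv 1 2 -1" call from the synopsis: the first two fields and the last one are kept, and the quoted field is re-emitted with its quotes.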

Command line options

The following options are passed to Text::CSV. An option given with the prefix "output_" (for example, --output_sep_char) affects only the output; without the prefix, it affects both input and output.

--quote_char

--escape_char

--sep_char

--eol

--always_quote

--binary

--keep_meta_info

--allow_loose_quotes

--allow_loose_escapes

--allow_whitespace

--verbatim

NOTE: binary is set to 1 by default in csv. The other options have their Text::CSV defaults.
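The split between shared settings and "output_" settings can be pictured as a reader and a writer that are configured separately. The Python sketch below is an analogy only: the function convert and its parameters in_sep and out_sep are hypothetical names, and falling back to the shared separator when no output separator is given is an assumption made for the illustration.

```python
import csv
import io

def convert(text, in_sep=",", out_sep=None):
    """Re-emit delimited text, optionally with a different output separator."""
    # Assumption for this sketch: an unset output separator falls back to
    # the shared setting, so both sides use the same delimiter.
    if out_sep is None:
        out_sep = in_sep
    out = io.StringIO()
    writer = csv.writer(out, delimiter=out_sep, lineterminator="\n")
    for row in csv.reader(io.StringIO(text), delimiter=in_sep):
        writer.writerow(row)
    return out.getvalue()

# Hypothetical semicolon-separated input.
sample = 'a;b\n"x;y";2\n'
same = convert(sample, in_sep=";")                  # shared setting only
changed = convert(sample, in_sep=";", out_sep=",")  # output overridden
print(changed, end="")
```

In the second call the field "x;y" no longer needs quoting, because the semicolon is not special once the output separator is a comma.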

The following additional options are available:

--input,  -i
--output,  -o

Filenames for input and output. "-" means standard input or standard output. Naming a file is also a way to trigger TSV mode, because csv infers it from a .tsv extension (see --from_tsv and --to_tsv).

--columns,  -c

Column numbers may be specified using this option.

--fields,  -f

When this option is specified, the first line of the input file is treated as a header line. This option takes a comma-separated list of column names taken from that first line.

For convenience, this option also accepts a comma-separated list of column numbers. Multiple --fields options are allowed, and column names and numbers can be mixed together.
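Header-based selection can be sketched the same way. The Python below is a rough model rather than the tool's actual logic. Three things in it are assumptions: that a value matching a header name wins over its reading as a number, that the header row is itself emitted in the new order, and the helper name select_fields.

```python
import csv
import io

def select_fields(text, fields):
    """Select columns by header name or by 1-based number, in the given order."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # the first line is the header line
    indices = []
    for field in fields:
        if field in header:
            indices.append(header.index(field))
        else:
            # Not a known name, so read it as a column number
            # (raises ValueError if it is not numeric either).
            n = int(field)
            indices.append(n - 1 if n > 0 else n)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header[i] for i in indices])
    for row in reader:
        writer.writerow([row[i] for i in indices])
    return out.getvalue()

# Hypothetical report with a header line.
sample = "Region,Q1,Q2,Revenue\nNorth,10,12,22\n"
by_name = select_fields(sample, ["Revenue", "Q1", "Q2"])
mixed = select_fields(sample, ["Revenue", "1"])
print(by_name, end="")
```

The first call corresponds to "--fields Revenue,Q1,Q2" from the synopsis; the second mixes a name with a column number.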

--from_tsv,  --from-tsv
--to_tsv,  --to-tsv

Use tabs instead of commas as the delimiter. When csv has the input or output filenames available, this is inferred when they end with .tsv. To disable this dwimmery, you may say --to_tsv=0 and --from_tsv=0.
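The filename-based inference can be summarized as a small decision function. This Python sketch reflects only what is stated above: an explicit option wins, otherwise a .tsv suffix selects tabs. The names wants_tsv and to_delimited are hypothetical, and treating the suffix as case-sensitive is an assumption.

```python
import csv
import io

def wants_tsv(filename, flag=None):
    """Decide whether tabs should be used for one side (input or output).

    flag is the explicit option value (1 or 0), or None when not given.
    """
    if flag is not None:
        return bool(flag)  # an explicit --to_tsv=0 or --to_tsv=1 wins
    # No filename, or "-", means there is nothing to infer from.
    return filename not in (None, "-") and filename.endswith(".tsv")

def to_delimited(text, tsv):
    """Re-emit CSV text with tabs or commas as the delimiter."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t" if tsv else ",", lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow(row)
    return out.getvalue()

# Hypothetical data: the embedded comma needs no quoting once tabs delimit.
converted = to_delimited('a,b\n"x, y",2\n', wants_tsv("report.tsv"))
print(converted, end="")
```

Passing flag=0 models "--to_tsv=0": the .tsv suffix is then ignored and commas are kept.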

See Also

Text::CSV, Text::CSV_XS

Author

Gaal Yahas <gaal@forum2.org>

Thanks

nothingmuch, gphat, t0m, themoniker, Prakash Kailasa, tsibley, srezic, and ether.

Bugs

Please report any bugs or feature requests to bug-app-csv at rt.cpan.org, or through the web interface at <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=App-CSV>. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

You're also invited to work on a patch. The source repo is at

<git://github.com/gaal/app-csv.git>

<http://github.com/gaal/app-csv/tree/master>

Info

2024-07-18 perl v5.40.0 User Contributed Perl Documentation