csvgroup¶

Description¶

Perform a SQL-like group by operation on a CSV file on a specified column or columns. Operation results in printing columns that were aggregation key (it will be distinct in every row) and appending all columns from aggregation functions values:

usage: csvgroup [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
            [-p ESCAPECHAR] [-z MAXFIELDSIZE] [-e ENCODING] [-v] [-l]
            [--zero] [-c COLUMNS] [-a FUNCTION COLUMNS] [-n]
            [FILE]

Execute a SQL-like group by on specified column or columns

positional arguments:
  FILE                  The CSV file to operate on. If omitted, will accept
                        input on STDIN.

optional arguments:
  -c COLUMNS, --columns COLUMNS
                        The column name(s) on which to group by. Should be
                        either one name (or index) or a comma-separated list.
                        May also be left unspecified, in which case none
                        columns will be used
  -a FUNCTION COLUMNS, --aggregation FUNCTION COLUMNS
                        Aggregate column values using max function
  -n, --names           Display column names and indices from the input CSV
                        and exit.

Also see: Arguments common to all utilities.

Aggregation functions¶

max

min

sum

count - count every row

countA - count non zero rows

common - returns most common value (handy with text fields aggregation)

Examples¶

Having a data:

$ csvlook examples/test_group.csv
|-----+----+----+----+----+-----|
|  h1 | h2 | h3 | h4 | h5 | h6  |
|-----+----+----+----+----+-----|
|  a  | a  | b  | 1  | 2  | 3   |
|  c  | b  | b  | 0  | 0  | 0   |
|  d  | b  | b  | 6  | 7  | 1   |
|  b  | a  | b  | 3  | 2  | 1   |
|-----+----+----+----+----+-----|

Warning

Before grouping by any number of columns (see -c option) you should always sort data using the same columns identifiers ( e.g. use csvsort -c 1,2,3 | csvgroup -c 1,2,3). csvgroup pre assumes that data are already sorted using aggregation key. This is very similar to Linux sort and uniq idiom.

Sorting is however not necessary for grouping without specifying -c parameter (without aggregation key).

Lets aggregate by column h2 using min, max and count of columns h4, h5, h6:

$ csvsort -c h2 examples/test_group.csv | csvgroup -c h2 -a min h4 -a max h5 -a count h6 | csvlook
|-----+---------+---------+------------|
|  h2 | min(h4) | max(h5) | count(h6)  |
|-----+---------+---------+------------|
|  a  | 1       | 2       | 2          |
|  b  | 0       | 7       | 2          |
|-----+---------+---------+------------|

We can also define many columns for one aggregate:

$ csvsort -c h2 examples/test_group.csv | csvgroup -c h2 -a max 4-6 | csvlook
|-----+---------+---------+----------|
|  h2 | max(h4) | max(h5) | max(h6)  |
|-----+---------+---------+----------|
|  a  | 3       | 2       | 3        |
|  b  | 6       | 7       | 1        |
|-----+---------+---------+----------|

Aggregating by two columns:

$ csvsort -c h2,h3 examples/test_group.csv | csvgroup -c h2,h3 -a max 4-6 | csvlook
|-----+----+---------+---------+----------|
|  h2 | h3 | max(h4) | max(h5) | max(h6)  |
|-----+----+---------+---------+----------|
|  a  | b  | 3       | 2       | 3        |
|  b  | b  | 6       | 7       | 1        |
|-----+----+---------+---------+----------|

And by all rows:

$ csvgroup -a max 4-6 examples/test_group.csv | csvlook
|----------+---------+----------|
|  max(h4) | max(h5) | max(h6)  |
|----------+---------+----------|
|  6       | 7       | 3        |
|----------+---------+----------|

Note

Notice that aggregation by all rows does not require any kind of sorting, because every row is treated as unique.

Get most common value of every column:

$ csvgroup -a common 1-6 examples/test_group.csv | csvlook
|-----+----+----+----+----+-----|
|  h1 | h2 | h3 | h4 | h5 | h6  |
|-----+----+----+----+----+-----|
|  a  | a  | b  | 1  | 2  | 1   |
|-----+----+----+----+----+-----|

csvgroup¶

Description¶

Aggregation functions¶

Examples¶

Project Versions

Table Of Contents

Previous topic

Next topic

This Page

Navigation

csvgroup¶

Description¶

Aggregation functions¶

Examples¶

Project Versions

RTD Search

Table Of Contents

Previous topic

Next topic

This Page

Quick search

Navigation