========
csvgroup
========

Description
===========

Perform a SQL-like group by operation on a CSV file on a specified column or columns. Operation results in printing
columns that were aggregation key (it will be distinct in every row) and appending all columns from aggregation functions values::


    usage: csvgroup [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
                [-p ESCAPECHAR] [-z MAXFIELDSIZE] [-e ENCODING] [-v] [-l]
                [--zero] [-c COLUMNS] [-a FUNCTION COLUMNS] [-n]
                [FILE]

    Execute a SQL-like group by on specified column or columns

    positional arguments:
      FILE                  The CSV file to operate on. If omitted, will accept
                            input on STDIN.

    optional arguments:
      -c COLUMNS, --columns COLUMNS
                            The column name(s) on which to group by. Should be
                            either one name (or index) or a comma-separated list.
                            May also be left unspecified, in which case none
                            columns will be used
      -a FUNCTION COLUMNS, --aggregation FUNCTION COLUMNS
                            Aggregate column values using max function
      -n, --names           Display column names and indices from the input CSV
                            and exit.

Also see: :doc:`common_arguments`.

Aggregation functions
=====================

 * max
 * min
 * sum
 * count - count every row
 * countA - count non zero rows
 * common - returns most common value (handy with text fields aggregation)


Examples
========

Having a data::

    $ csvlook examples/test_group.csv
    |-----+----+----+----+----+-----|
    |  h1 | h2 | h3 | h4 | h5 | h6  |
    |-----+----+----+----+----+-----|
    |  a  | a  | b  | 1  | 2  | 3   |
    |  c  | b  | b  | 0  | 0  | 0   |
    |  d  | b  | b  | 6  | 7  | 1   |
    |  b  | a  | b  | 3  | 2  | 1   |
    |-----+----+----+----+----+-----|

.. warning::

    Before grouping by any number of columns (see ``-c`` option) you should always sort data using the same columns identifiers ( e.g. use ``csvsort -c 1,2,3 | csvgroup -c 1,2,3``). csvgroup pre assumes that data are already sorted using aggregation key. This is very similar to Linux ``sort`` and ``uniq`` idiom.

    Sorting is however not necessary for grouping without specifying `-c` parameter (without aggregation key).


Lets aggregate by column h2 using  min, max and count of columns h4, h5, h6::

    $ csvsort -c h2 examples/test_group.csv | csvgroup -c h2 -a min h4 -a max h5 -a count h6 | csvlook
    |-----+---------+---------+------------|
    |  h2 | min(h4) | max(h5) | count(h6)  |
    |-----+---------+---------+------------|
    |  a  | 1       | 2       | 2          |
    |  b  | 0       | 7       | 2          |
    |-----+---------+---------+------------|

We can also define many columns for one aggregate::

    $ csvsort -c h2 examples/test_group.csv | csvgroup -c h2 -a max 4-6 | csvlook
    |-----+---------+---------+----------|
    |  h2 | max(h4) | max(h5) | max(h6)  |
    |-----+---------+---------+----------|
    |  a  | 3       | 2       | 3        |
    |  b  | 6       | 7       | 1        |
    |-----+---------+---------+----------|


Aggregating by two columns::

    $ csvsort -c h2,h3 examples/test_group.csv | csvgroup -c h2,h3 -a max 4-6 | csvlook
    |-----+----+---------+---------+----------|
    |  h2 | h3 | max(h4) | max(h5) | max(h6)  |
    |-----+----+---------+---------+----------|
    |  a  | b  | 3       | 2       | 3        |
    |  b  | b  | 6       | 7       | 1        |
    |-----+----+---------+---------+----------|

And by all rows::

    $ csvgroup -a max 4-6 examples/test_group.csv | csvlook
    |----------+---------+----------|
    |  max(h4) | max(h5) | max(h6)  |
    |----------+---------+----------|
    |  6       | 7       | 3        |
    |----------+---------+----------|

.. note::

    Notice that aggregation by all rows does not require any kind of sorting, because every row is treated as unique.

Get most common value of every column::

    $ csvgroup -a common 1-6 examples/test_group.csv | csvlook
    |-----+----+----+----+----+-----|
    |  h1 | h2 | h3 | h4 | h5 | h6  |
    |-----+----+----+----+----+-----|
    |  a  | a  | b  | 1  | 2  | 1   |
    |-----+----+----+----+----+-----|