nft-csv

nf-test plugin to provide support for CSV files based on tablesaw.

Requirements

nf-test version 0.7.0 or higher

Setup

To use this plugin you need to activate the nft-csv plugin in your nf-test.config file:

config {
  plugins {
    load "nft-csv@<version>"
  }
}

Usage

nft-csv extends path with a csv property that parses CSV files. The csv property can be configured with the options listed below.

Example:

def csvFile = path("file.csv").csv

def tabFile = path("file.tab").csv(sep: "\t")

def CsvFile = path("file.csv").csv(sep: "\t", quote: '"')

def gzCsvFile = path("file.csv.gz").csv(decompress: true)

def csvFile = csv("https://raw.githubusercontent.com/jtablesaw/tablesaw/master/data/bush.csv")

Available options:

header: When true, the first line is used as the column names. Otherwise, the columns are named "C0", "C1", ... (default: true).
sep: The character used to separate values (default: ,)
quote: The character used to quote values (default: "").
decompress: When true, decompress the content using the GZIP format before processing it (default: false). Files with the .gz extension are NOT YET decompressed automatically.
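
To make the effect of header and sep concrete, here is a minimal plain-Groovy sketch that does not use nft-csv or tablesaw. The helper parseDelimited is hypothetical and only illustrates the documented behaviour (naive splitting, no quote handling, no type detection); it is not how the plugin is implemented.

```groovy
import java.util.regex.Pattern

// Hypothetical helper (NOT part of nft-csv): splits delimited text into
// column names and rows, mimicking the documented header/sep options.
def parseDelimited(String text, Map options = [:]) {
    def sep = options.get('sep', ',')          // default separator: ","
    def header = options.get('header', true)   // default: first line holds the names
    def lines = text.readLines().findAll { it } // drop empty lines
    def cells = lines.collect { it.split(Pattern.quote(sep), -1) as List }
    // Without a header, columns are named "C0", "C1", ...
    def names = header ? cells[0] : (0..<cells[0].size()).collect { "C${it}".toString() }
    def data = header ? cells.drop(1) : cells
    [columnNames: names, rows: data]
}

def withHeader = parseDelimited("col_a\tcol_b\n1\t2\n", [sep: "\t"])
assert withHeader.columnNames == ["col_a", "col_b"]
assert withHeader.rows == [["1", "2"]]

def noHeader = parseDelimited("1,2\n3,4\n", [header: false])
assert noHeader.columnNames == ["C0", "C1"]
assert noHeader.rows.size() == 2
```

Unlike this sketch, which keeps every value as a string, the plugin's own examples compare against numbers (for example `rows[1] == ["col_a": 4, ...]`).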

The result is an object of class TableWrapper that contains the following properties and methods:

`rowCount`

Returns the number of rows in the table.

Examples:

assert path("file.csv").csv.rowCount == 4

//or
with( path("file.tab").csv(sep: "\t")) {
    assert rowCount == 4
}

`columnCount`

Returns the number of columns in the table.

Examples:

assert path("file.csv").csv.columnCount == 3

//or
with( path("file.csv").csv) {
    assert columnCount == 3
}

`columnNames`

Returns the names of the columns in the table.

Examples:

def csvFile = path("file.csv").csv
assert csvFile.columnNames == ["col_a", "col_b", "col_c"]
assert "col_b" in csvFile.columnNames
assert "lukas" !in csvFile.columnNames

//or
with( path("file.csv").csv) {
    assert columnNames == ["col_a", "col_b", "col_c"]
    assert "col_b" in columnNames
    assert "lukas" !in columnNames
}

`rows`

Returns the rows of the table as a list of maps. Each map represents a row with column names as keys.

Examples:

with (path("file.csv").csv) {
    assert rows[1] == ["col_a": 4, "col_b": 5, "col_c": 6]
    assert rows[1] == ["col_c": 6, "col_a": 4, "col_b": 5]
}
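
Both assertions above can hold at the same time because Groovy compares maps by their entries, not by the order of their keys. A minimal plain-Groovy sketch with made-up values, independent of the plugin:

```groovy
// Two maps with the same entries, written in a different key order
def rowA = ["col_a": 4, "col_b": 5, "col_c": 6]
def rowB = ["col_c": 6, "col_a": 4, "col_b": 5]

// Map equality ignores key order
assert rowA == rowB

// The key order itself is still different (lists are order-sensitive)
assert (rowA.keySet() as List) != (rowB.keySet() as List)
```

This means a row assertion does not depend on the column order of the file, only on the column names and values.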

`columns`

Returns the columns of the table as a map of lists. Each entry in the map represents a column with the column name as the key.

Examples:

with (path("file.csv").csv) {
    assert columns["col_b"] == [2, 5, 8, 11]
    assert columns["col_b"] != [5, 2, 8, 11]
}
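
The relation between the row view (list of maps) and the column view (map of lists) can be sketched in plain Groovy. The data below is made up for illustration, and the conversion shown is an assumption about the shape of the data, not the plugin's code:

```groovy
// Row view: one map per row, keyed by column name
def rows = [
    [col_a: 1, col_b: 2, col_c: 3],
    [col_a: 4, col_b: 5, col_c: 6],
]

// Column view: one list per column, keyed by column name
def columns = rows[0].keySet().collectEntries { name ->
    [(name): rows.collect { it[name] }]
}

assert columns["col_b"] == [2, 5]
// Lists are order-sensitive, so a different row order is a different column
assert columns["col_b"] != [5, 2]
```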

`sort()`

Sorts the columns and rows of the table based on their natural order. This makes it possible to compare tables with a different order of columns and rows with expected table data.

Examples:

def sortedTable1 = path("file1.csv").csv.sort()

`sortRows()`

Sorts the rows of the table based on all columns in their natural order.

Examples:

def sortedTable1 = path("file1.csv").csv.sortRows()

`sortRows(String columnName, boolean ascending)`

Sorts the rows of the table based on the values in the specified column.

columnName: The column to sort by.
ascending: true for ascending order, false for descending. (optional)

Examples:

def sortedTable1 = path("file1.csv").csv.sortRows("col_b").sortRows("col_a", false)
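
As a rough illustration of what sorting rows by a single column in ascending or descending order means, here is a plain-Groovy sketch on a list of maps. It uses only Groovy's own collection methods and made-up data; how the plugin sorts internally (including how ties are handled) is not shown here.

```groovy
def rows = [
    [col_a: 4, col_b: 5],
    [col_a: 1, col_b: 2],
    [col_a: 7, col_b: 8],
]

// sort(false) returns a sorted copy and leaves the original list untouched
def ascending = rows.sort(false) { it.col_b }
def descending = rows.sort(false) { a, b -> b.col_a <=> a.col_a }

assert ascending*.col_b == [2, 5, 8]
assert descending*.col_a == [7, 4, 1]
assert rows*.col_a == [4, 1, 7]
```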

`sortColumns()`

Sorts the columns of the table alphabetically by their names in ascending order.

Examples:

def csvFile = path("file.csv").csv
assert csvFile.columnNames == ["col_c", "col_a", "col_b"]

def csvFile2 = path("file.csv").csv.sortColumns()
assert csvFile2.columnNames == ["col_a", "col_b", "col_c"]

`sortColumns(boolean ascending)`

Sorts the columns of the table alphabetically by their names.

ascending: true for ascending order, false for descending.

Examples:

def csvFile = path("file.csv").csv
assert csvFile.columnNames == ["col_c", "col_a", "col_b"]

def csvFile2 = path("file.csv").csv.sortColumns(false)
assert csvFile2.columnNames == ["col_c", "col_b", "col_a"]

with(path("file.csv").csv.sortColumns(false)) {
    assert columnNames == ["col_c", "col_b", "col_a"]
}

`view()`

Prints the structure of the table and its number of rows and columns.

path("file.csv").csv.view()

`table`

Returns the tablesaw table instance and allows you to use all the methods to shape, merge and filter your data.

Examples:

//print table structure
print path(filename).csv.table.structure()

Most tablesaw operations create a new table object. You can pass such a table to the csv function to wrap it in a TableWrapper object and use the methods described above.

Examples:

def filename = process.out.csv.get(0)
with(path(filename).csv){
    assert columnNames == ["col_a", "col_b", "col_c"]
    assert rowCount == 4
    assert columnCount == 3
}
def table = csv(path(filename).csv.table.select("col_c", "col_a"))

with(table){
    assert columnNames == ["col_c", "col_a"]
    assert rowCount == 4
    assert columnCount == 2
}

Assertions

This plugin provides two assertions to simplify the comparison of tables containing numbers, using a specified precision.

`assertTableEquals(array1, array2, double precision)`

Compares two tables. Numeric columns are compared using a precision, which defaults to 0.00001 when it is omitted.

then {
    def expected = csv("tests/data/input/chr20-unphased/scores.expected.txt")
    def actual = csv("${outputDir}/scores.txt")
    assertTableEquals actual, expected
}

A user defined precision can be provided:

then {
    assertTableEquals actual, expected, 0.000000001
}

The order of rows and columns must be the same in both tables. However, you can combine the assertion with sortColumns(), sortRows() or sort() to normalize the tables first:

then {
    def expected = csv("tests/data/input/chr20-unphased/scores.expected.txt").sort()
    def actual = csv("${outputDir}/scores.txt").sort()
    assertTableEquals actual, expected, 0.00001
}

`assertArrayEquals(array1, array2, double precision)`

Compares two arrays of numbers using a precision instead of exact equality.

Examples:

then {
    def expected = csv("tests/data/input/chr20-unphased/scores.expected.txt")

    def actual = csv("${outputDir}/scores.txt")
    with(actual) {
        assert columnNames == ["sample", "PGS000027"]
        assert columns["sample"] == expected.columns["sample"]
        assertArrayEquals columns["PGS000027"], expected.columns["PGS000027"]
        // or with user defined precision
        assertArrayEquals columns["PGS000027"], expected.columns["PGS000027"], 0.0000001
    }
}
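
The idea behind comparing numbers with a precision can be sketched in plain Groovy. The helper equalsWithin is hypothetical and assumes an absolute tolerance (values match when they differ by at most the precision); whether the plugin uses exactly this rule is an assumption.

```groovy
// Hypothetical helper (NOT the plugin's implementation): two lists match when
// they have the same size and each pair of values differs by at most `precision`.
boolean equalsWithin(List<? extends Number> a, List<? extends Number> b, double precision = 0.00001) {
    a.size() == b.size() &&
        [a, b].transpose().every { x, y -> Math.abs((x as double) - (y as double)) <= precision }
}

// Floating-point arithmetic makes exact equality fragile ...
assert 0.1d + 0.2d != 0.3d
// ... while a tolerance-based comparison accepts the tiny rounding difference
assert equalsWithin([0.1d + 0.2d], [0.3d])

// Differences larger than the precision are still detected
assert !equalsWithin([1.0d], [1.001d])
assert equalsWithin([1.0d], [1.001d], 0.01d)
```

This is why exact comparison with == is used for the "sample" column in the example above, while the numeric score column goes through assertArrayEquals.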

Contact

Lukas Forer (@lukfor)
