nf-test plugin to provide support for CSV files based on tablesaw.
- nf-test version 0.7.0 or higher
To use this plugin you need to activate the nft-csv plugin in your nf-test.config file:
config {
plugins {
load "nft-csv@<version>"
}
}
nft-csv extends path with a csv property that parses CSV files. The csv property can be configured with different options.
Example:
def csvFile = path("file.csv").csv
def tabFile = path("file.tab").csv(sep: "\t")
def quotedCsvFile = path("file.csv").csv(sep: "\t", quote: '"')
def gzCsvFile = path("file.csv.gz").csv(decompress: true)
def remoteCsvFile = csv("https://raw.githubusercontent.com/jtablesaw/tablesaw/master/data/bush.csv")
Available options:
- header: When true, the first line is used as the column names. Otherwise, the columns are named "C0", "C1", ... (default: true).
- sep: The character used to separate values (default: ,).
- quote: The character used to quote values (default: "").
- decompress: When true, decompress the content using the GZIP format before processing it (default: false). Files with the .gz extension are NOT YET decompressed automatically.
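The effect of header and sep can be pictured with a small plain-Groovy sketch. parseSimple is a hypothetical helper written only for this illustration: it is not part of nft-csv, does not use tablesaw, ignores quoting and GZIP, and keeps every value as a string.

```groovy
import java.util.regex.Pattern

// Hypothetical helper, NOT part of nft-csv: mimics only the `header` and `sep` options.
def parseSimple(String text, Map opts = [:]) {
    boolean header = opts.containsKey('header') ? opts.header : true
    String sep = opts.sep ?: ','
    // Split every non-empty line into cells on the literal separator.
    def cells = text.readLines().findAll { it }.collect { it.split(Pattern.quote(sep), -1) as List }
    // Without a header line, columns are named "C0", "C1", ...
    def names = header ? cells.head() : (0..<cells.head().size()).collect { 'C' + it }
    def data = header ? cells.tail() : cells
    [columnNames: names, rows: data.collect { row -> [names, row].transpose().collectEntries() }]
}

def withHeader = parseSimple("col_a,col_b\n1,2\n")
assert withHeader.columnNames == ['col_a', 'col_b']
assert withHeader.rows == [[col_a: '1', col_b: '2']]

def noHeader = parseSimple("1\t2\n4\t5\n", [header: false, sep: "\t"])
assert noHeader.columnNames == ['C0', 'C1']
assert noHeader.rows[1] == [C0: '4', C1: '5']
```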
The result is an object of class TableWrapper that provides the following properties and methods:
rowCount: Returns the number of rows in the table.
Examples:
assert path("file.csv").csv.rowCount == 4
//or
with( path("file.tab").csv(sep: "\t")) {
assert rowCount == 4
}
columnCount: Returns the number of columns in the table.
Examples:
assert path("file.csv").csv.columnCount == 3
//or
with( path("file.csv").csv) {
assert columnCount == 3
}
columnNames: Returns the names of the columns in the table.
Examples:
def csvFile = path("file.csv").csv
assert csvFile.columnNames == ["col_a", "col_b", "col_c"]
assert "col_b" in csvFile.columnNames
assert "lukas" !in csvFile.columnNames
//or
with( path("file.csv").csv) {
assert columnNames == ["col_a", "col_b", "col_c"]
assert "col_b" in columnNames
assert "lukas" !in columnNames
}
rows: Returns the rows of the table as a list of maps. Each map represents a row with column names as keys.
Examples:
with (path("file.csv").csv) {
assert rows[1] == ["col_a": 4, "col_b": 5, "col_c": 6]
assert rows[1] == ["col_c": 6, "col_a": 4, "col_b": 5]
}
columns: Returns the columns of the table as a map of lists. Each entry in the map represents a column with the column name as the key.
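The row view (list of maps) and the column view (map of lists) describe the same data. The following plain-Groovy sketch, using invented sample values and no plugin calls, shows how one shape relates to the other and why a row can be compared against a map literal whose keys are in any order.

```groovy
// Sample data invented for illustration; it only mirrors the shape used in this README.
def rows = [
    [col_a: 1, col_b: 2, col_c: 3],
    [col_a: 4, col_b: 5, col_c: 6],
]

// Build the column view (map of lists) from the row view (list of maps).
def columns = rows.first().keySet().collectEntries { name ->
    [(name): rows.collect { it[name] }]
}
assert columns == [col_a: [1, 4], col_b: [2, 5], col_c: [3, 6]]

// Map equality in Groovy ignores key order ...
assert rows[1] == [col_c: 6, col_a: 4, col_b: 5]
// ... whereas list equality does not ignore element order.
assert columns["col_b"] != [5, 2]
```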
Examples:
with (path("file.csv").csv) {
assert columns["col_b"] == [2, 5, 8, 11]
assert columns["col_b"] != [5, 2, 8, 11]
}
sort(): Sorts the columns and rows of the table based on their natural order. This makes it possible to compare tables whose columns and rows are in a different order with the expected table data.
Examples:
def sortedTable1 = path("file1.csv").csv.sort()
sortRows(): Sorts the rows of the table based on all columns in their natural order.
Examples:
def sortedTable1 = path("file1.csv").csv.sortRows()
sortRows(columnName, ascending): Sorts the rows of the table based on the values in the specified column.
- columnName: The column to sort by.
- ascending: true for ascending order, false for descending (optional).
Examples:
def sortedTable1 = path("file1.csv").csv.sortRows("col_b").sortRows("col_a", false)
sortColumns(): Sorts the columns of the table alphabetically by their names in ascending order.
Examples:
def csvFile = path("file.csv").csv
assert csvFile.columnNames == ["col_c", "col_a", "col_b"]
def csvFile2 = path("file.csv").csv.sortColumns()
assert csvFile2.columnNames == ["col_a", "col_b", "col_c"]
sortColumns(ascending): Sorts the columns of the table alphabetically by their names.
- ascending: true for ascending order, false for descending.
Examples:
def csvFile = path("file.csv").csv
assert csvFile.columnNames == ["col_c", "col_a", "col_b"]
def csvFile2 = path("file.csv").csv.sortColumns(false)
assert csvFile2.columnNames == ["col_c", "col_b", "col_a"]
with(path("file.csv").csv.sortColumns(false)) {
assert columnNames == ["col_c", "col_b", "col_a"]
}
view(): Prints the structure and the number of rows and columns.
Examples:
path("file.csv").csv.view()
table: Returns the tablesaw table instance and allows you to use all of its methods to shape, merge and filter your data.
Examples:
//print table structure
print path(filename).csv.table.structure()
Most tablesaw operations create a new table object. You can pass such a table to the csv function to create a TableWrapper object and use its methods.
Examples:
def filename = process.out.csv.get(0)
with(path(filename).csv){
assert columnNames == ["col_a", "col_b", "col_c"]
assert rowCount == 4
assert columnCount == 3
}
def table = csv(path(filename).csv.table.select("col_c", "col_a"))
with(table){
assert columnNames == ["col_c", "col_a"]
assert rowCount == 4
assert columnCount == 2
}
This plugin provides two assertions to simplify the comparison of tables containing numbers, using a specified precision.
assertTableEquals: Compares two tables; columns with numbers are compared using a default precision of 0.00001.
then {
def expected = csv("tests/data/input/chr20-unphased/scores.expected.txt")
def actual = csv("${outputDir}/scores.txt")
assertTableEquals actual, expected
}
A user-defined precision can be provided:
then {
assertTableEquals actual, expected, 0.000000001
}
The order of rows and columns has to be the same in both tables. However, you can combine it with sortColumns(), sortRows() or sort() to normalize the files:
then {
def expected = csv("tests/data/input/chr20-unphased/scores.expected.txt").sort()
def actual = csv("${outputDir}/scores.txt").sort()
assertTableEquals actual, expected, 0.00001
}
assertArrayEquals: Compares arrays containing numbers using a given precision.
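To make "equal within a precision" concrete, here is a minimal plain-Groovy sketch. It assumes the precision is an absolute tolerance on each pair of values; that is an assumption for illustration, since the plugin's exact comparison rule is not stated here. nearlyEqual and arraysNearlyEqual are hypothetical helpers and are not part of nft-csv.

```groovy
// Hypothetical helpers, NOT part of nft-csv.
// ASSUMPTION: precision is an absolute tolerance, i.e. |a - b| <= precision.
boolean nearlyEqual(Number a, Number b, double precision = 0.00001d) {
    Math.abs(a.doubleValue() - b.doubleValue()) <= precision
}

boolean arraysNearlyEqual(List<Number> actual, List<Number> expected, double precision = 0.00001d) {
    // Arrays must have the same length and agree element by element.
    actual.size() == expected.size() &&
        [actual, expected].transpose().every { pair -> nearlyEqual(pair[0], pair[1], precision) }
}

// Floating-point noise well below the tolerance is accepted.
assert arraysNearlyEqual([0.1d + 0.2d, 1.0d], [0.3d, 1.0d])
// A difference of about 0.0001 exceeds the tolerance of 0.00001 used here ...
assert !arraysNearlyEqual([0.3001d, 1.0d], [0.3d, 1.0d])
// ... but passes with a looser, user-defined tolerance.
assert arraysNearlyEqual([0.3001d, 1.0d], [0.3d, 1.0d], 0.001d)
```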
Examples:
then {
def expected = csv("tests/data/input/chr20-unphased/scores.expected.txt")
def actual = csv("${outputDir}/scores.txt")
with(actual) {
assert columnNames == ["sample", "PGS000027"]
assert columns["sample"] == expected.columns["sample"]
assertArrayEquals columns["PGS000027"], expected.columns["PGS000027"]
// or with user defined precision
assertArrayEquals columns["PGS000027"], expected.columns["PGS000027"], 0.0000001
}
}
Lukas Forer (@lukfor)