Stacks datasets together, matching columns by name. For example:
tut stack [files ...]
df1.tsv
| brand | model | mpg |
|---|---|---|
| tesla | 3 | 0 |
| toyota | previa | 15 |
df2.tsv
| car | mpg | color |
|---|---|---|
| ford | 20 | red |
| chevy | 15 | blue |
tut stack df1.tsv df2.tsv
Stacked df1.tsv df2.tsv
| brand | model | mpg | car | color |
|---|---|---|---|---|
| tesla | 3 | 0 | | |
| toyota | previa | 15 | | |
| | | 20 | ford | red |
| | | 15 | chevy | blue |
You can also attach the basename of each row's source file to keep track of where the data came from.
| brand | model | mpg | car | color | basename |
|---|---|---|---|---|---|
| tesla | 3 | 0 | | | df1.tsv |
| toyota | previa | 15 | | | df1.tsv |
| | | 20 | ford | red | df2.tsv |
| | | 15 | chevy | blue | df2.tsv |
I frequently use this tool to combine datasets of separate samples for comparison.
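The stacking behavior shown above can be sketched in a few lines of Python. This is only an illustration of the semantics (union of column names in first-seen order, blanks where a file lacks a column), not how tut itself is implemented. The function name `stack_tsv` and the `add_basename` parameter are hypothetical.

```python
import csv
import io
import os


def stack_tsv(named_texts, add_basename=False):
    """Stack TSV tables by column name (hypothetical helper, not part of tut).

    named_texts: list of (path, tsv_text) pairs.
    Returns (header, rows), with every row aligned to header.
    """
    header = []   # union of column names, in first-seen order
    records = []  # one dict per input row
    for path, text in named_texts:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for name in reader.fieldnames or []:
            if name not in header:
                header.append(name)
        for row in reader:
            if add_basename:
                row["basename"] = os.path.basename(path)
            records.append(row)
    if add_basename:
        header.append("basename")
    # Columns that a file does not have are left blank.
    rows = [[rec.get(col, "") for col in header] for rec in records]
    return header, rows


# The two example tables from above, as TSV text.
df1 = "brand\tmodel\tmpg\ntesla\t3\t0\ntoyota\tprevia\t15\n"
df2 = "car\tmpg\tcolor\nford\t20\tred\nchevy\t15\tblue\n"

header, rows = stack_tsv([("df1.tsv", df1), ("df2.tsv", df2)], add_basename=True)
print("\t".join(header))
for row in rows:
    print("\t".join(row))
```

Keeping each row as a dictionary until the end means the final column order only has to be decided once, after every file's header has been seen.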
Extracts a range of lines (1-based) from every file.
slice range [files ...]
Where range is the range of lines to keep.
- :5 - From the beginning to the 5th line
- 5: - From the 5th line to the end
- 3:5 - From the 3rd to the 5th line
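To make the range notation concrete, here is a minimal Python sketch that converts a 1-based, inclusive range into a Python slice. It assumes the range applies to plain lines and says nothing about how tut treats header rows. The name `parse_range` is hypothetical.

```python
def parse_range(spec):
    """Turn a 1-based inclusive range (':5', '5:', '3:5') into a slice."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"expected START:END, got {spec!r}")
    # A missing start means "from the first line".
    start = int(start_text) - 1 if start_text else 0  # 1-based -> 0-based
    if start < 0:
        raise ValueError("line numbers start at 1")
    # A missing end means "to the last line"; the end is inclusive,
    # so the 1-based end equals the 0-based exclusive stop.
    end = int(end_text) if end_text else None
    return slice(start, end)


lines = [f"line {n}" for n in range(1, 8)]  # line 1 .. line 7
for spec in (":5", "5:", "3:5"):
    print(spec, lines[parse_range(spec)])
```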
Select columns from a file by column position (1-based) or name.
# These are equivalent:
tut select 1,2 tests/data/df1.tsv
tut select mpg,cyl tests/data/df1.tsv
| mpg | cyl |
|---|---|
| 21 | 6 |
| 21 | 6 |
| 22.8 | 4 |
| ... | ... |
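Why the two commands above are equivalent can be shown with a short Python sketch that resolves each selector either as a 1-based position or as a column name. This is an illustration under that assumption, not tut's actual code. `select_columns` and the third column `other` (with placeholder values) are made up for the example.

```python
def select_columns(header, rows, spec):
    """Pick columns by 1-based position or by name, e.g. '1,2' or 'mpg,cyl'."""
    indices = []
    for token in spec.split(","):
        token = token.strip()
        if token.isdigit():
            indices.append(int(token) - 1)       # 1-based position
        else:
            indices.append(header.index(token))  # column name
    new_header = [header[i] for i in indices]
    new_rows = [[row[i] for i in indices] for row in rows]
    return new_header, new_rows


# 'other' is a placeholder column added only so there is something to drop.
header = ["mpg", "cyl", "other"]
rows = [["21", "6", "x"], ["21", "6", "y"], ["22.8", "4", "z"]]

by_position = select_columns(header, rows, "1,2")
by_name = select_columns(header, rows, "mpg,cyl")
print(by_position == by_name)
```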
Add row numbers with --row-numbers.
nim c --cpu:i386 --os:linux --compileOnly tut.nim