@mattjeffery
Currently the HadoopStoreBuilder has only partial support for file path globbing.
If you use a glob in the --input argument, e.g. --input data/store_a/part-r-*, the store will build, but the input data size will be counted as 0, so the number of chunks (num.chunks) will be set to 1. If the store being generated is large, the build then fails with a VoldemortException:

voldemort.VoldemortException: Chunk overflow exception: chunk 0 has exceeded 2147483647 bytes.

This patch adds file path globbing support to the HadoopStoreBuilder.sizeOfPath method so that the input size can be correctly determined and the correct number of chunks calculated.
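The idea behind the fix can be sketched in plain JDK code: expand the glob first, then sum the matched files' sizes, instead of asking for the size of the literal (non-existent) glob path. This is an illustrative sketch only; the actual patch operates on Hadoop's FileSystem API (e.g. FileSystem.globStatus), and the GlobSize class and sizeOfGlob method below are hypothetical names, not part of Voldemort.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class GlobSize {
    // Sum the sizes of all regular files in `dir` matching `glob`.
    // Mirrors, as a plain-JDK sketch, what sizeOfPath needs to do:
    // expand the glob into concrete files, then total their lengths,
    // so the chunk count is computed from a non-zero input size.
    static long sizeOfGlob(Path dir, String glob) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path p : matches) {
                if (Files.isRegularFile(p)) {
                    total += Files.size(p);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Demo against a temporary directory laid out like Hadoop output.
        Path dir = Files.createTempDirectory("store_a");
        Files.write(dir.resolve("part-r-00000"), new byte[10]);
        Files.write(dir.resolve("part-r-00001"), new byte[20]);
        Files.write(dir.resolve("_SUCCESS"), new byte[5]); // not matched by the glob
        System.out.println(sizeOfGlob(dir, "part-r-*")); // prints 30
    }
}
```

With a correct (non-zero) total size, the builder can then divide by the configured chunk size to get a num.chunks greater than 1, avoiding the 2 GB-per-chunk overflow above.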
