parallel_map appears to consume the entire input before it starts processing, which uses a lot of memory. This is a problem in pipelines where the input is a very large iterator.
Expected behavior: something similar to Pool.imap in multiprocessing.
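
For illustration, here is a minimal sketch of the kind of lazy behavior being asked for. It is not how parallel_map is implemented. The names lazy_parallel_map, max_workers, and window are hypothetical, and it uses a thread pool from concurrent.futures instead of multiprocessing to keep the example short. The idea is to keep only a bounded number of items in flight and pull the next input item only after a result has been collected.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def lazy_parallel_map(func, iterable, max_workers=4, window=8):
    """Hypothetical sketch: yield func(x) in input order while keeping
    at most `window` items submitted but not yet yielded."""
    it = iter(iterable)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Prime the pool with the first `window` items only.
        pending = deque(executor.submit(func, x) for x in islice(it, window))
        while pending:
            # Wait for the oldest task so results keep the input order.
            result = pending.popleft().result()
            # Pull exactly one more input item to refill the window.
            for x in islice(it, 1):
                pending.append(executor.submit(func, x))
            yield result
```

With window=4, fetching the first result reads only five items from the input instead of all of it, so memory use is bounded by the window size, not by the input length.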