Iterator-consuming parallel_map

It looks like `parallel_map` currently consumes the entire input before starting any work, so memory use grows with the size of the input. This is a problem in pipelines where the input is a very large iterator.

Expected behavior: something similar to `multiprocessing.Pool.imap`, which returns results as an iterator.
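
For illustration, here is a rough sketch of the kind of behavior I have in mind. It is not based on how `parallel_map` is actually implemented; `lazy_parallel_map`, `max_workers`, and `window` are hypothetical names, and it uses a thread pool from `concurrent.futures` only to keep the example short. The idea is to pull at most `window` items from the input at a time and to yield results in order as they finish.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def lazy_parallel_map(func, iterable, max_workers=4, window=None):
    """Yield func(x) for each x in iterable, in input order.

    Hypothetical helper: at most `window` inputs are pulled from the
    iterator ahead of the consumer, so memory stays bounded.
    """
    window = window or 2 * max_workers
    it = iter(iterable)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Prime the window: only the first `window` inputs are read here.
        pending = deque(pool.submit(func, x) for x in islice(it, window))
        while pending:
            # Wait for the oldest task so results keep the input order.
            result = pending.popleft().result()
            # Pull one more input (if any) to keep the window full.
            for x in islice(it, 1):
                pending.append(pool.submit(func, x))
            yield result
```

With something like this, `for y in lazy_parallel_map(f, huge_iterator): ...` would start producing results after reading only the first few inputs. For CPU-bound work, `ProcessPoolExecutor` could presumably be swapped in, provided `func` and the items are picklable.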