- 
                Notifications
    You must be signed in to change notification settings 
- Fork 2
Cleanup / optimization as well as parallelizing the work #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…f skipped instead of unwrapping
…inserting to HashMap
…y_words through a channel
| Wow, thanks a ton. There’s a whole lot of stuff in here that I really have never used and some I’ve never seen either. I’ll work my way through your commits tomorrow for a closer look. Again, thanks! | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a look through, and it seems like a very reasonable way of splitting the work.
I have added some comments where the code could be more concise.
| .map(|count| *count += occurrences) | ||
| .unwrap_or_else(|| { | ||
| self.curses.insert(curse.to_owned(), occurrences); | ||
| }) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be replaced with:
*self.curses.entry(curse.into()).or_insert(0) += occurrences
This does require a cast of the curse argument, since you need to pass a K instead of Q: Borrow<K>, but I think it makes it clearer what is going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above comment ^, this is a workaround so that we don't create a copy of the string unless necessary.
| directory | ||
| } else { | ||
| env::current_dir()? | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also be replaced with the below:
let repo_path = match Cli::from_args().directory {
        Some(d) => d,
        None => env::current_dir()?
};
This is more concise, but the downside is that it makes it less obvious what the return of from_args() is and why it would have a directory member.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's preference here. I would say that it feels weird to use Cli::from_args().directory as usually there are other parameters / flags, but there aren't any currently so it would be fine as well :P.
| } | ||
|  | ||
| self.authors.get_mut(author_name).expect("exists") | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be shortened to the below:
self.authors.entry(author_name.into()).or_insert(Author::new(author_name))
This makes it a more idiomatic method for a "get or create" pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered the approach, but I really wanted to not allocated a new String unless it was strictly necessary (it's only needed when inserting into the HashMap). This is a long standing problem, I would love to know if there's an easier way to do this with stdlib or if there's a crate that solves this (see: https://users.rust-lang.org/t/efficient-string-hashmaps-for-a-frequency-count/7752/8).
Additionally, you would probably want to use or_insert_with instead of or_insert as the Author::new(..) is evaluated eagerly for each call with or_insert (and Author::new copies a string).
Something like: self.authors.entry(author_name.into()).or_insert_with(|| Author::new(author_name)).
| .get_mut(word) | ||
| .map(|count| *count += 1) | ||
| .unwrap_or_else(|| { | ||
| naughty_words.insert(word.to_owned(), 1); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change from L111 could also be applied here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous comment ^, avoiding string copies unless necessary.
I would like to find a crate that extends the HashMap<String, T> API to clean up the duplication though..
| Thanks for the review @pantsman0 😄 | 
| Re @sondr3: No problem! Feel free to leave any comments on the code if you have any questions 👍. | 
Add authors to the repo struct, I was silly and thought it was reasonable to put them in a HashSet but then I couldn't retrieve them because I had nothing to match them against. Using a HashMap where the key is what I should've seen... but I didn't. Add function that lazily adds authors to the repo, a better structure of the main loop and a much better way of adding and retrieving authors.
| I've been working my way through your changes commit for commit and they are really cool! There's a lot of neat little things in here that I would've never thought of, I especially like the  | 
Hi, cool project!
As per the /r/rust post, I'm submitting a PR for some potentially better ways to do things. Looking through the commits might be better than the changes as whole.
I think there's some room for cleaning up some duplication (esp. around the HashMap API) and possibly some unnecessary allocations, but the flow looks a lot better to me overall and the work is now parallelized through jobs and returned to the main thread through a channel.