Skip to content

Conversation

@DarrenTsung
Copy link

Hi, cool project!

As per the /r/rust post, I'm submitting a PR for some potentially better ways to do things. Looking through the commits might be better than the changes as whole.

I think there's some room for cleaning up some duplication (esp. around the HashMap API) and possibly some unnecessary allocations, but the flow looks a lot better to me overall and the work is now parallelized through jobs and returned to the main thread through a channel.

@sondr3
Copy link
Owner

sondr3 commented Oct 15, 2018

Wow, thanks a ton. There’s a whole lot of stuff in here that I really have never used and some I’ve never seen either. I’ll work my way through your commits tomorrow for a closer look. Again, thanks!

Copy link

@pantsman0 pantsman0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look through, and it seems like a very reasonable way of splitting the work.

I have added some comments where the code could be more concise.

.map(|count| *count += occurrences)
.unwrap_or_else(|| {
self.curses.insert(curse.to_owned(), occurrences);
})
Copy link

@pantsman0 pantsman0 Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be replaced with:
*self.curses.entry(curse.into()).or_insert(0) += occurrences

This does require a cast of the curse argument, since you need to pass a K instead of Q: Borrow<K>, but I think it makes it clearer what is going on.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment ^, this is a workaround so that we don't create a copy of the string unless necessary.

directory
} else {
env::current_dir()?
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be replaced with the below:

let repo_path = match Cli::from_args().directory {
        Some(d) => d,
        None => env::current_dir()?
};

This is more concise, but the downside is that it makes it less obvious what the return of from_args() is and why it would have a directory member.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's preference here. I would say that it feels weird to use Cli::from_args().directory as usually there are other parameters / flags, but there aren't any currently so it would be fine as well :P.

}

self.authors.get_mut(author_name).expect("exists")
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be shortened to the below:
self.authors.entry(author_name.into()).or_insert(Author::new(author_name))

This makes it a more idiomatic method for a "get or create" pattern.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered the approach, but I really wanted to not allocated a new String unless it was strictly necessary (it's only needed when inserting into the HashMap). This is a long standing problem, I would love to know if there's an easier way to do this with stdlib or if there's a crate that solves this (see: https://users.rust-lang.org/t/efficient-string-hashmaps-for-a-frequency-count/7752/8).

Additionally, you would probably want to use or_insert_with instead of or_insert as the Author::new(..) is evaluated eagerly for each call with or_insert (and Author::new copies a string).

Something like: self.authors.entry(author_name.into()).or_insert_with(|| Author::new(author_name)).

.get_mut(word)
.map(|count| *count += 1)
.unwrap_or_else(|| {
naughty_words.insert(word.to_owned(), 1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change from L111 could also be applied here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment ^, avoiding string copies unless necessary.

I would like to find a crate that extends the HashMap<String, T> API to clean up the duplication though..

@sondr3
Copy link
Owner

sondr3 commented Oct 16, 2018

Thanks for the review @pantsman0 😄

@DarrenTsung
Copy link
Author

Re @sondr3: No problem! Feel free to leave any comments on the code if you have any questions 👍.

sondr3 added a commit that referenced this pull request Oct 16, 2018
sondr3 added a commit that referenced this pull request Oct 18, 2018
Add authors to the repo struct, I was silly and thought it was
reasonable to put them in a HashSet but then I couldn't retrieve them
because I had nothing to match them against. Using a HashMap where the
key is what I should've seen... but I didn't.

Add function that lazily adds authors to the repo, a better structure
of the main loop and a much better way of adding and retrieving authors.
sondr3 added a commit that referenced this pull request Oct 21, 2018
sondr3 added a commit that referenced this pull request Oct 21, 2018
@sondr3
Copy link
Owner

sondr3 commented Oct 21, 2018

I've been working my way through your changes commit for commit and they are really cool! There's a lot of neat little things in here that I would've never thought of, I especially like the if let [...] for the commit author and message, it's really neat. I'd love to incorporate the parallelization changes you made, because it's really fast, but I have a hard time wrapping my head around it due to inexperience with rayon and crossbeam, so I'll have to came for it at a later time when I understand it better. Thanks a lot though! 😄

@DarrenTsung DarrenTsung deleted the cleanup branch June 13, 2019 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants