-
Notifications
You must be signed in to change notification settings - Fork 5k
Deduplicate strings stored in the configuration cache #29950
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@bot-gradle test ACC without PTS |
I've triggered the following builds with parameters: |
52c9ece
to
2f372c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome results! I've left a few questions before approving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ This is also used for Dependency Engine cache data. It looks like this isn't cross-version, but can you confirm, just in case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it seems to be only used for temporary caches here but I'm confirming with bt-dependency-management just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed.
*/ | ||
Integer newIndex = strings.size() + 2; | ||
strings.put(key, newIndex); | ||
writeStringIndex(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Can we have a constants for 0 and 1 instead of these magic numbers? E.g. NULL_STRING_INDEX = 0
and NEW_STRING = 1
.
string = strings[idx]; | ||
int index = readStringIndex(); | ||
switch (index) { | ||
case 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 One potential trick: ATM we're wasting two first elements of the array. We can avoid this and replace switch with a single if, though I'm not sure if it affects performance much:
- Let's use 0 for new string and 1 for null
- Have
null
in the initial array:INITIAL_CAPACITY_MARKER = new String [] { null }
. - Now we can change the switch into
if (index == 0) return readNewString();
return strings[index - 1];
But I guess this goes to microbenchmarking territory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea! I went with a variation that avoids the subtraction on every read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a second thought, using the two constants on both encoder and decoder makes the connection clearer and it doesn't seem to be a clear performance advantage either way.
* stream. | ||
*/ | ||
public class StringDeduplicatingKryoBackedDecoder extends AbstractDecoder implements Decoder, Closeable { | ||
public static final int INITIAL_CAPACITY = 32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔ Looks like it can be private too.
* @see com.esotericsoftware.kryo.io.Output#writeString(java.lang.String) | ||
*/ | ||
@SuppressWarnings('GrDeprecatedAPIUsage') | ||
private static byte[] encodedBytesOf(String string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 TBH, I think this test now knows too much of the encoding details and has become fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. It can still be useful and less fragile by having the asserts ignoring the last character but I'd deal with it later once we introduce compression.
} | ||
def service = gradle.sharedServices.registerIfAbsent('stringChecker', StringDedupCheckerService) {} | ||
allprojects { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Is this test IP-incompatible? It can be a settings script that uses the new lifecycle API.
❓ With parallel project saving, will string deduplication work across project? Or is the goal to have it across projects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this test IP-incompatible? It can be a settings script that uses the new lifecycle API.
Right, good point.
With parallel project saving, will string deduplication work across project? Or is the goal to have it across projects?
Yes, that would be the idea. We discussed having all the strings handled by a separate String Pool worker, let's see if we can make that fly.
if (string === null) { | ||
string = s | ||
} else { | ||
assert string === s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 nice
@bot-gradle test RFN without PTS |
I've triggered the following builds with parameters: |
62d1ba8
to
2000402
Compare
Thanks for the review, @mlopatkin, please take another look. |
I only did a size check of the config cache on the 6200+ project I am working on. Results are as follows: |
For ~70% smaller configuration cache entries, faster loading times and potentially reduced memory usage on cache hits.
- write string indexes as small ints for reduced file sizes (~10% smaller) - avoid `null` check on every `readNullableString()` operation
So `getWritePosition` can be polymorphically exposed from `KryoBackedEncoder` and `StringDeduplicatingKryoBackedEncoder`. ./gradlew :architecture-test:test -ParchunitRefreeze
And use it to restore `ConfigurationCacheEncryptionIntegrationTest` to its former behavior.
c37d777
to
7688645
Compare
WARN: Based on labels, this pull request addresses notable issue but no changes to release note found. |
The merge queue build has failed. Click here to see all failures. |
I did a configuration phase benchmark in our build to test the performance against 8.6:
Then a CC load benchmark (same scenario but with CC size for this 3700 project build was 718MB in 8.6, and in 8.10-rc-1 it is 212MB! |
Thanks for sharing the data, @joshfriend! That's great to see 👍 |
With this change, the configuration cache entry for
assembleDebug
onperf-android-large-2
is 74% smaller (356M against 1.3G) and it runs ~30% faster on a cache hit:Gradle performance tests also show improvement.
Reviewing cheatsheet
Before merging the PR, comments starting with