Memory optimizations #31941
Conversation
Force-pushed from 9a46be9 to 0cf728c
Do we have a way to lock down the current state of memory usage in such a scenario in a performance test or something similar? I am afraid a wide array of local improvements might not hold up after later refactorings, which could accidentally negate or remove these smaller improvements. If we had a way to lock down the current state, we could then capture the improved state as well.
Our performance tests do have their max daemon heap sizes configured, so it could be possible to fine-tune that number for each perf test so that the test would OOM if memory usage rises. I do see a couple of issues with this approach, though.
A nice goal would be to say that success here means being able to run the 6k project benchmark on CI. We already have gradle/perf-android-extra-large, but the last time I tried to set up CI benchmarks on it, it hung on CI.
I know the Android team has a benchmark suite that captures heap dumps at certain times and diffs them to verify changes in memory usage. That said, I'm not sure we should hold up many GB of memory usage improvements over further testing improvements.
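For readers who want a concrete picture of what "locking down" memory usage could look like, here is a rough, self-contained sketch; it is not existing Gradle test infrastructure, and the scenario hook and heap budget below are invented for illustration only.

```java
// Rough illustration only -- not existing Gradle test infrastructure.
// The scenario hook and the budget value are made up for this example.
public final class RetainedHeapBudgetCheck {

    private static final long BUDGET_BYTES = 23L * 1024 * 1024 * 1024; // assumed budget

    public static void main(String[] args) {
        runScenario(); // placeholder for the real workload (e.g. a benchmark build)

        System.gc(); // best effort; a real test would sample repeatedly
        Runtime rt = Runtime.getRuntime();
        long usedBytes = rt.totalMemory() - rt.freeMemory();

        if (usedBytes > BUDGET_BYTES) {
            throw new AssertionError(
                "Retained heap " + usedBytes + " bytes exceeds budget of " + BUDGET_BYTES + " bytes");
        }
        System.out.println("Used heap within budget: " + usedBytes + " bytes");
    }

    private static void runScenario() {
        // In a real performance test this would execute the benchmark scenario.
    }
}
```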
...untime/base-services/src/main/java/org/gradle/internal/graph/CachingDirectedGraphWalker.java (outdated; resolved)
List<WbResource> result = new ArrayList<>(1);
result.add(new WbResource("/", webAppDirName));
return result;
💭 Collections.singletonList()?
See below
new Facet(Facet.FacetType.installed, "jst.utility", "1.0"),
new Facet(Facet.FacetType.installed, "jst.java", toJavaFacetVersion(project.getExtensions().getByType(JavaPluginExtension.class).getSourceCompatibility()))
);
List<Facet> result = new ArrayList<>(3);
💭 Is something modifying the return value from an anonymous type that only promises some `List`, not anything definitely modifiable? Can we use an immutable list here and avoid the modification?
`EclipseWtpFacet` is part of an extension on the Project. It is a user-modifiable model, and should be mutable.
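For illustration (not code from this PR), here is a small standalone example of the trade-off discussed in this thread: `Collections.singletonList()` returns a fixed-size list, so it is only safe when callers never modify the returned value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Standalone illustration of the trade-off: singletonList() is fixed-size,
// so it only fits places where the returned list is never modified.
public class SingletonListTradeOff {
    public static void main(String[] args) {
        List<String> mutable = new ArrayList<>(1);
        mutable.add("/");
        mutable.add("extra"); // fine: callers may grow a user-modifiable model

        List<String> fixed = Collections.singletonList("/");
        try {
            fixed.add("extra"); // throws: singletonList is unmodifiable
        } catch (UnsupportedOperationException e) {
            System.out.println("singletonList cannot be modified");
        }
    }
}
```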
"module " + moduleName + " {" | ||
); | ||
String firstLine = "module " + moduleName + " {"; | ||
List<String> lines = new ArrayList<>(); |
💅 Could pre-compute the size needed here, too.
I didn't feel this was necessary. This isn't really performance-sensitive at all, and I didn't want to add the maintenance burden.
If it were simpler (didn't have a dynamic size based on filtering), I'd say this would be more worth it.
File commonTools = new File(vsPath, PATH_COMMONTOOLS);
File commonIde = new File(vsPath, PATH_COMMONIDE);
- List<File> paths = Lists.newArrayList(commonTools, commonIde);
+ List<File> paths = new ArrayList<>();
Suggested change:
- List<File> paths = new ArrayList<>();
+ List<File> paths = new ArrayList<>(3);
public void complete() {
    dependents = ImmutableSet.copyOf(dependents);
}
🤔 I think this needs a short explanation. Why should it be called? What is "completed"? Why are we putting the responsibility for "finalizing" the collection as an immutable type onto clients of this class, especially if they can only get an unmodifiable set view of this collection anyway?
AllSchemesAuthentication authentication = getAuthForProxy(httpsProxy);
useCredentials(credentialsProvider, Collections.singleton(authentication));
💭 Can all of this be moved into a new `useCredsForProxy` method, removing the need for `getAuthForProxy` as a separate method?
Done
import java.util.HashSet;
import java.util.Set;

public class ConsumerState {
💅 New top-level class needs javadoc, maybe `final`?
Made final.
Unfortunately I'm not sure I know all that much about this type and its usage. I just moved the code somewhere else.
private void validateMutations(MutationInfo mutations) {
    if (!mutations.getDestroyablePaths().isEmpty()) {
        if (mutations.hasOutputs()) {
            throw new IllegalStateException("Task " + node + " has both outputs and destroyables defined. A task can define either outputs or destroyables, but not both.");
        }
        if (mutations.hasFileInputs()) {
            throw new IllegalStateException("Task " + node + " has both inputs and destroyables defined. A task can define either inputs or destroyables, but not both.");
        }
        if (mutations.hasLocalState()) {
            throw new IllegalStateException("Task " + node + " has both local state and destroyables defined. A task can define either local state or destroyables, but not both.");
        }
    }
}
🤔 Would this be better as a `MutationInfo` method: `mutations.validate(node)`?
That's definitely an option, but I like the idea of having `MutationInfo` as mostly a data class.
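For context, a simplified sketch of the alternative being suggested in this thread, with invented field names and a plain `Object` standing in for the node type; the point is only that validation would live on the `MutationInfo`-like type rather than in the scheduling code.

```java
// Simplified sketch of the suggested alternative; field names are hypothetical.
final class MutationInfoSketch {
    private final boolean hasOutputs;
    private final boolean hasFileInputs;
    private final boolean hasLocalState;
    private final boolean hasDestroyables;

    MutationInfoSketch(boolean hasOutputs, boolean hasFileInputs, boolean hasLocalState, boolean hasDestroyables) {
        this.hasOutputs = hasOutputs;
        this.hasFileInputs = hasFileInputs;
        this.hasLocalState = hasLocalState;
        this.hasDestroyables = hasDestroyables;
    }

    // The scheduler would call mutations.validate(node) instead of owning this logic.
    void validate(Object node) {
        if (!hasDestroyables) {
            return;
        }
        if (hasOutputs) {
            throw new IllegalStateException("Task " + node + " has both outputs and destroyables defined.");
        }
        if (hasFileInputs) {
            throw new IllegalStateException("Task " + node + " has both inputs and destroyables defined.");
        }
        if (hasLocalState) {
            throw new IllegalStateException("Task " + node + " has both local state and destroyables defined.");
        }
    }
}
```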
This allocates an oversized list, potentially leading to unnecessary memory usage.
We know the size of these arrays by default. Size them to avoid wasted memory.
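A minimal before/after illustration of what these two sizing commits describe (the element values are made up); the change is simply sizing the `ArrayList` to the known element count instead of the padded capacity chosen by Guava's `Lists.newArrayList(E...)`.

```java
import com.google.common.collect.Lists;

import java.util.ArrayList;
import java.util.List;

// Before/after illustration only; the string values are made up.
public class PresizedListExample {
    public static void main(String[] args) {
        // Before: the Guava factory pads the backing array beyond the 2 elements given.
        List<String> padded = Lists.newArrayList("commonTools", "commonIde");

        // After: the element count is known up front, so size the list exactly.
        List<String> exact = new ArrayList<>(2);
        exact.add("commonTools");
        exact.add("commonIde");

        System.out.println(padded.size() + " vs " + exact.size()); // 2 vs 2
    }
}
```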
This cache gets retained in the heap for the entirety of the graph walk. For very large graphs, this cache can retain tens of GB of memory. The values of this cache are immutable, yet we store a mutable set within it -- a LinkedHashSet -- which is very memory inefficient. In the Android 6k project benchmark, when running assemble, this cache was found to contain over 124 million LinkedHashMap entries, totaling almost 8GB of retained heap _for the map nodes themselves_, not including the contents of the map. This commit stores an immutable Set instead, ensuring we use a much more space-efficient data structure for this cache.
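As a schematic of this change (hypothetical type parameters, not the actual `CachingDirectedGraphWalker` code): values can still be computed into an ordered mutable set during the walk, but the copy retained in the cache is an `ImmutableSet`.

```java
import com.google.common.collect.ImmutableSet;

import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Schematic only -- not the real CachingDirectedGraphWalker code.
// The copy that stays cached for the rest of the walk is an ImmutableSet,
// avoiding a LinkedHashMap entry object per element.
final class CachedValuesSketch<N, T> {
    private final Map<N, Set<T>> cache = new HashMap<>();

    void store(N node, LinkedHashSet<T> computedValues) {
        cache.put(node, ImmutableSet.copyOf(computedValues));
    }

    Set<T> get(N node) {
        return cache.get(node);
    }
}
```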
Each graph resolved during build dependency resolution is retained in memory while walking the task graph. While this is only a partial graph, without external dependencies, it can still take up a significant amount of memory. We update ResolvedComponentResult to use immutable data structures, which are significantly more memory efficient than mutable data structures. This reduces the amount of memory retained during task dependency graph walking, and later in the build if any build logic accesses a ResolutionResult.
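A hypothetical sketch of this pattern (names invented, not the real `ResolvedComponentResult` API): dependency edges are accumulated through a Guava builder during resolution, and the retained object ends up holding only the compact immutable list.

```java
import com.google.common.collect.ImmutableList;

import java.util.List;

// Hypothetical sketch; class, method, and field names are invented.
// Edges are accumulated through a builder during resolution, and only the
// compact immutable list is retained afterwards.
final class ResolvedComponentSketch {
    private final ImmutableList.Builder<String> dependencyEdges = ImmutableList.builder();
    private List<String> frozenEdges;

    void addDependencyEdge(String edge) {
        dependencyEdges.add(edge);
    }

    void resolutionFinished() {
        frozenEdges = dependencyEdges.build();
    }

    List<String> getDependencyEdges() {
        return frozenEdges;
    }
}
```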
MutationInfo, with the exception of the nodesYetToConsumeOutput state, was generated wholly by the ResolveMutationsNode. After the ResolveMutationsNode executes, the MutationInfo is not mutated further. For very large graphs, this MutationInfo can retain a significant amount of memory -- in part due to the memory-inefficient mutable data structures used to store its state. This commit ensures we calculate the entirety of MutationInfo at once, so that we can store its state in a memory-efficient immutable data structure. This greatly reduces the retained heap from this state.
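A simplified, hypothetical sketch of that idea: every field is computed before construction and stored in immutable collections, so the object never needs to be mutated or compacted afterwards.

```java
import com.google.common.collect.ImmutableSet;

import java.util.Set;

// Simplified, hypothetical sketch: all state is computed up front and stored
// immutably, so nothing is mutated after construction.
final class MutationStateSketch {
    private final ImmutableSet<String> outputPaths;
    private final ImmutableSet<String> destroyablePaths;
    private final boolean hasLocalState;

    MutationStateSketch(Set<String> outputPaths, Set<String> destroyablePaths, boolean hasLocalState) {
        this.outputPaths = ImmutableSet.copyOf(outputPaths);
        this.destroyablePaths = ImmutableSet.copyOf(destroyablePaths);
        this.hasLocalState = hasLocalState;
    }

    Set<String> getOutputPaths() {
        return outputPaths;
    }

    Set<String> getDestroyablePaths() {
        return destroyablePaths;
    }

    boolean hasLocalState() {
        return hasLocalState;
    }
}
```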
Force-pushed from 685f8aa to a170385
Force-pushed from 03406e3 to 0a84ddd
The following builds have passed:
@bot-gradle test this
The following builds have passed:
Gradle is currently not able to execute `assemble` on the Android 6k project benchmark without running out of memory with a 55GB heap. This PR includes changes to help reduce the amount of retained memory during task graph traversal.

While Gradle still cannot execute `assemble` with a 55GB heap after these changes, it does reduce the memory required to execute `assembleDebug` by ~3GB. Before these changes, a 26GB heap was required to complete task graph traversal; after these changes, only a 23GB heap is required.

The majority of the reduced memory usage comes from using immutable collections instead of mutable collections. Immutable collections are much more memory efficient than mutable collections, as collections like HashMap/HashSet and their linked variants allocate a `Node` on the heap for every entry in the collection, while Guava immutable collections store their data in a contiguous array.
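To make the size difference concrete, here is a small check that is not part of the PR; it assumes Guava and the JOL object-layout tool (`org.openjdk.jol:jol-core`) are available, and compares the retained footprint of a `LinkedHashSet` against an `ImmutableSet` holding the same elements.

```java
import com.google.common.collect.ImmutableSet;
import org.openjdk.jol.info.GraphLayout;

import java.util.LinkedHashSet;
import java.util.Set;

// Not part of the PR: a quick sanity check of the claim above. Both sets hold
// the same Integer objects, so the difference in totals is collection overhead.
public class CollectionFootprintCheck {
    public static void main(String[] args) {
        Set<Integer> linked = new LinkedHashSet<>();
        for (int i = 0; i < 100_000; i++) {
            linked.add(i);
        }
        Set<Integer> immutable = ImmutableSet.copyOf(linked);

        System.out.println("LinkedHashSet footprint: " + GraphLayout.parseInstance(linked).totalSize() + " bytes");
        System.out.println("ImmutableSet footprint:  " + GraphLayout.parseInstance(immutable).totalSize() + " bytes");
    }
}
```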
This PR is separated into 5 commits:

- `Lists.newArrayList(E...)`. This likely does not provide much benefit, but this factory method allocates an oversized array for the new array list -- often wasting memory.
- `ResolvedComponentResult`. Migrate to immutable data structures in this type, as it is retained in memory for the entirety of graph traversal. We could look into potentially not keeping this in memory in future changes.
- Make `MutationInfo` immutable. `MutationInfo` on a task graph `Node` stores data related to the files that a node creates, consumes, and destroys. This was functionally immutable in the past. The code has been migrated to use immutable data structures to store this data.
Future things to look at:

- `Configuration`. In the 6k project benchmark, there are over 1.3 million `Configuration` instances. They are quite heavy-weight and the average "empty" configuration is ~5kb.
- `DependencyPredecessorsOnlyNodeSet`. It stores a `TreeSet` of nodes. For this large graph, the overhead of the `TreeSet` itself (only the nodes of the set, not the retained data) is ~5-7GB.

A snapshot of the memory situation this PR worked with:
