-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-19737: ABFS: Add metrics to identify improvements with read and write aggressiveness #8056
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
| configuration1.set(FS_AZURE_ENABLE_PREFETCH_REQUEST_PRIORITY, "true"); | ||
|
|
||
| Configuration configuration2 = new Configuration(getRawConfiguration()); | ||
| //use the default value for the config: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove the comment as well. The default traffic priority value is true now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
| if (lastTime == ZERO) { | ||
| lastCpuTime = cpuTime; | ||
| lastTime = now; | ||
| return 0.0; // first call has no previous data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can use ZERO_D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed this piece of code
| @Test | ||
| public void verifyWriteRequestOfBufferSizeAndClose() throws Exception { | ||
|
|
||
| AbfsCounters abfsCounters = Mockito.spy(new AbfsCountersImpl(new URI("abcd"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can have a constant for URI "abcd"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if creating mock client handlers, client, abfsrestop, spying abfscounters is repeated- then we could have it as a helper method maybe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if creating mock client handlers, client, abfsrestop, spying abfscounters is repeated- then we could have it as a helper method maybe -> was existing code will take it up later
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
anujmodi2021
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial Review. Will complete in some time
| @@ -0,0 +1,95 @@ | |||
| /** | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need 2 searate classes for these?
All the metrics seems to be common in both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also these are not only threads but memory related metrics as well.
We can rename to something like: ABFSResourceUtilizationMetrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can create a single class but in future, there may be some metrics not relevant to both of them, hence for separation of concerns felt it would be better to have 2 separate classes
| abfsUriQueryBuilder, cachedSasToken); | ||
|
|
||
| // Retrieve the read thread pool metrics from the ABFS counters. | ||
| AbfsReadThreadPoolMetrics metrics = getAbfsCounters() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Common code in Blob and Dfs client Can be a common method in base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
|
|
||
| /** | ||
| * Private constructor to prevent instantiation as this needs to be singleton. | ||
| * Initializes a new instance of {@code ReadBufferManagerV2} for the given ABFS client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the clients in JVM ae supposed to use same ReadBufferManager. This comment seems wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
| */ | ||
| private ReadBufferManagerV2() { | ||
| private ReadBufferManagerV2(AbfsClient abfsClient) { | ||
| this.abfsClient = abfsClient; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need client here?
IMO we should not pass client to RBM.
Any client related work should happen in AbfsInputStream how its happening today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passed the abfscounters directly instead of client
| <exclude>**/azurebfs/ITestSmallWriteOptimization.java</exclude> | ||
| <exclude>**/azurebfs/ITestAbfsStreamStatistics*.java</exclude> | ||
| <exclude>**/azurebfs/services/ITestReadBufferManager.java</exclude> | ||
| <exclude>**/azurebfs/WriteThreadPoolSizeManager.java</exclude> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
| * is initialized and not already monitoring. | ||
| */ | ||
| private void initializeMonitoringIfNeeded() { | ||
| if (poolSizeManager != null && !poolSizeManager.isMonitoringStarted()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to repeat this check inside synchronized block as well?
What if 2 threads check together and both try to startCPUMonitoring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
| @@ -0,0 +1,165 @@ | |||
| /** | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
A lot ff redundant code can be combined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
| AppendRequestParameters reqParams, | ||
| TracingContext tracingContext) throws IOException { | ||
| TracingContext tracingContextAppend = new TracingContext(tracingContext); | ||
| // Fetches write thread pool metrics from the ABFS client and adds them to the tracing context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Common code Can be moved to base IngressHandler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
| @@ -0,0 +1,95 @@ | |||
| /** | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these metrics might be a overhead for some customer who does not wish to use new design for read and write and they might not wish to get these metrics.
If not already done, can we put this change behind a config so that these metrics comes only when enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already taken care of in AbfsClient class
| * Sets the metric results string used for tracing or logging. | ||
| * @param metricResults the formatted metric data to store. | ||
| */ | ||
| public void setMetricResults(final String metricResults) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are only for Thread Pool metrics right?
Let's rename accordingly.
| // If metricHeader is present, append it to the client request ID header for tracing | ||
| if (!metricHeader.equals(EMPTY_STRING)) { | ||
| httpOperation.setRequestProperty(HttpHeaderConfigurations.X_MS_FECLIENT_METRICS, metricHeader); | ||
| httpOperation.setRequestProperty(HttpHeaderConfigurations.X_MS_CLIENT_REQUEST_ID, header + COLON + metricHeader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would cause disparity in schema. With v2 we have made every field mandatory. If metrics are not there we should simply keep empty string but colon should always be there.
| abfsUriQueryBuilder, cachedSasToken); | ||
|
|
||
| // Retrieve the read thread pool metrics from the ABFS counters. | ||
| AbfsReadThreadPoolMetrics metrics = getAbfsCounters() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does getAbfsReadThreadPoolMetrics() returns aggregated metrics?
What defines the period over which these metrics are aggregated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, these metrics are pushed only when thread pool size changes and whenever toString() method is called whatever we have aggregated till then is pushed and then they are reinitialized. There is no concept of a period for these metrics.
|
|
||
| AbfsReadFooterMetrics getAbfsReadFooterMetrics(); | ||
|
|
||
| void initializeReadMetrics(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: rename to accomodate the fact that these are resource management related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
| * @return the highest JVM CPU utilization percentage recorded | ||
| */ | ||
| @VisibleForTesting | ||
| public double getMaxCpuUtilization() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maxJvmCPUUtilization
| V1("v1", 13); | ||
| V1("v1", 13), | ||
| /** | ||
| * Version 1 of the tracing header, which includes a version prefix and has 13 permanent fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Edit javadoc to reflect v2
| * Schema: version:clientCorrelationId:clientRequestId:fileSystemId | ||
| * :primaryRequestId:streamId:opType:retryHeader:ingressHandler | ||
| * :position:operatedBlobCount:operationSpecificHeader:httpOperationHeader | ||
| * :networkLibrary:operationMetrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 15?
| * Schema: version:clientCorrelationId:clientRequestId:fileSystemId | ||
| * :primaryRequestId:streamId:opType:retryHeader:ingressHandler | ||
| * :position:operatedBlobCount:operationSpecificHeader:httpOperationHeader | ||
| * :networkLibrary:operationMetrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: instaed of operationMetrics, use resourceUtilizationMetrics
| */ | ||
| private long getAvailableHeapMemory() { | ||
| @VisibleForTesting | ||
| public long getAvailableHeapMemory() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Common methods between read and write.
Can be moved to some util class
Introduces new performance metrics in the ABFS driver to monitor and evaluate the effectiveness of read and write aggressiveness tuning. These metrics help in understanding how thread pool behavior, CPU utilization, and heap availability impact overall I/O throughput and latency. By capturing detailed statistics such as active thread count, pool size, and system resource utilization, this enhancement enables data-driven analysis of optimizations made to improve ABFS read and write performance under varying workloads.