
Conversation

@mlazos (Contributor) commented Feb 14, 2023

Summary:

Adds NNC-like logging that is configured through an env var `TORCH_LOGS`.
Examples:
`TORCH_LOGS="dynamo,guards" python script.py` - prints dynamo logs at level INFO, along with the guards of all functions that are compiled

`TORCH_LOGS="+dynamo,guards,graph" python script.py` - prints dynamo logs at level DEBUG, along with the guards and graphs (in tabular format) of all graphs that are compiled

[More examples with full output](https://gist.github.com/mlazos/b17f474457308ce15e88c91721ac1cce)

Implementation:
The implementation parses the log settings from the environment, finds any components (aot, dynamo, inductor) or other loggable artifacts (guards, graph, etc.), and generates a log_state object. This object contains all of the enabled artifacts and a qualified-log-name -> level mapping. `_init_logs` then adds handlers to the highest-level logs (the registered logs) and sets any artifact logger to level DEBUG if its artifact is enabled.
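As a rough illustration of the parsing step (hypothetical helper and registry names; the real code lives in `torch._logging`):

```python
import logging
import os

# hypothetical registries for illustration; the real ones are populated by
# torch._logging._registrations
LOG_REGISTRY = {
    "dynamo": "torch._dynamo",
    "aot": "torch._functorch.aot_autograd",
    "inductor": "torch._inductor",
}
ARTIFACT_REGISTRY = {"guards", "graph", "bytecode"}

def parse_torch_logs(settings=None):
    """Turn e.g. '+dynamo,guards' into (qualified log name -> level, enabled artifacts)."""
    settings = settings if settings is not None else os.environ.get("TORCH_LOGS", "")
    log_levels, artifacts = {}, set()
    for item in settings.split(","):
        item = item.strip()
        if not item:
            continue
        # a leading "+" bumps a component from the default INFO to DEBUG
        level = logging.DEBUG if item.startswith("+") else logging.INFO
        name = item.lstrip("+")
        if name in LOG_REGISTRY:
            log_levels[LOG_REGISTRY[name]] = level
        elif name in ARTIFACT_REGISTRY:
            artifacts.add(name)
        else:
            raise ValueError(f"unrecognized TORCH_LOGS entry: {name!r}")
    return log_levels, artifacts
```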

Note: `set_logs` is an alternative way to manipulate the log_state, but if the environment contains `TORCH_LOGS`, the environment settings take priority.
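For example, a minimal usage sketch of `set_logs`, assuming it accepts component levels and artifact booleans as keywords:

```python
import logging
import torch._logging

# roughly the Python-side equivalent of TORCH_LOGS="+dynamo,guards,graph";
# note this is ignored if the TORCH_LOGS env var is set, since environment
# settings take priority
torch._logging.set_logs(dynamo=logging.DEBUG, guards=True, graph=True)
```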

Adding a new log:
To add a new log, a dev should add their log name to `torch._logging._registrations` (there are examples there already).

Adding a new artifact:
To add a new artifact, a dev should add their artifact name to `torch._logging._registrations` as well.
Additionally, wherever the artifact is logged, `torch._logging.getArtifactLogger(__name__, <artifact_name>)` should be used instead of the standard logging implementation.
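For instance, a call site logging the "guards" artifact might look like this sketch (hypothetical file and function names):

```python
# e.g. somewhere in torch/_dynamo/guards.py
from torch._logging import getArtifactLogger

guards_log = getArtifactLogger(__name__, "guards")

def print_guards(guard_str):
    # emitted only when the "guards" artifact is enabled (e.g. TORCH_LOGS="guards")
    guards_log.debug("GUARDS:\n%s", guard_str)
```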

[design doc](https://docs.google.com/document/d/1ZRfTWKa8eaPq1AxaiHrq4ASTPouzzlPiuquSBEJYwS8/edit#)

cc @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire

@pytorch-bot bot commented Feb 14, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94858

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failure

As of commit 605088b:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@mlazos mlazos changed the title Consistent, configurable, extensible logging NNC-like configurable logging Feb 14, 2023
@mlazos mlazos requested a review from albanD February 14, 2023 22:01
@mlazos mlazos requested a review from cpuhrsch February 14, 2023 22:15
@mlazos mlazos changed the title NNC-like configurable logging component-level configurable logging Feb 15, 2023
@williamwen42 (Member) commented:

Are there any configs that can be removed because of this change? e.g. log_level or output_code?

@ezyang (Contributor) commented Feb 15, 2023

cc @stas00

@stas00 (Contributor) commented Feb 15, 2023

Thank you, @ezyang for the ping. I have already followed up here:
#94788 (comment)

but let me copy it here:


@mlazos, if it's the way proposed by your PR, please make sure there is an actual API to use besides env vars.

Env vars are fantastic for developers of the component, but when a 3rd-party application/framework needs to control it, env vars become very difficult to use, and you have to write an API anyway to support those env var overrides - so exposing that API to users would make their code more robust, IMHO.

Additionally, it again looks like your solution is pytorch-developer oriented (which is super useful). Users need a simple blanket, cover-all flag, so that they don't need to list out all the possible components.

@stas00 (Contributor) commented Feb 15, 2023

Additionally, this PR invents some sort of new logging-level definition semantics, which again looks very neat for devs, but this is not what I think is needed by non-pytorch developers. The proposal here also doesn't allow for the full range of log levels.

The log levels are:

```python
log_levels = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "critical": logging.CRITICAL,
}
```

and each of these should ideally be settable at will. So perhaps syntactic sugar can be added on top of the boring long full definitions, but it should not replace them.

I did show how we implement these across various projects at HuggingFace in logging.py - I'm not insisting it be done the same way here, just showing what appears to work really well.

If you want to add sub-systems to it, perhaps there should be an additional argument that speaks to a specific sub-system, as in:

```python
torch.utils.logging.set_verbosity(all=logging.INFO, dynamo=logging.DEBUG, graph=logging.ERROR)
```

so most users will just use all, and developers can then override specific sub-systems as I have shown above. This API is also future-proof if new sub-systems are added or renamed - just use **kwargs in the util definition.
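A minimal sketch of what that **kwargs-based utility could look like (hypothetical names; not the API this PR ships):

```python
import logging

# hypothetical mapping of sub-system names to their logger names
_SUBSYSTEMS = {
    "all": "torch",
    "dynamo": "torch._dynamo",
    "graph": "torch._dynamo.graph",
}

def set_verbosity(**levels):
    """set_verbosity(all=logging.INFO, dynamo=logging.DEBUG, ...)"""
    # apply "all" first so more specific sub-systems can override it
    for name in sorted(levels, key=lambda n: n != "all"):
        logging.getLogger(_SUBSYSTEMS[name]).setLevel(levels[name])
```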

Please let me know whether this is at all helpful and whether I'm going in the right direction.

Please note that I'm on both sides of the fence - I would like to have a very simple API for users, while allowing for developers to achieve their needs as well.

@stas00 (Contributor) commented Feb 15, 2023

Also, from the description in the OP I don't see it proposing to cover everything - e.g. it doesn't look like torch.distributed is there, and it can be pretty noisy on a multi-GPU setup.

When in doubt, please always think of someone using 256 GPUs who is going to see the same info line 256 times.

@mlazos (Contributor, Author) commented Feb 15, 2023

@stas00 Thanks for the suggestions, I think these all seem super useful and are doable.

You're correct that this is not for everything; initially it is for the PyTorch 2.0 components - TorchDynamo, AOTAutograd, and TorchInductor. After seeing the RFC and issue I thought maybe we could expand this, since currently there isn't a centralized system. Agreed on the torch.distributed piece - I hadn't thought about that at all; it would need to be handled before expanding into that domain.

Here's what I gathered from your comments:

  1. We need a user-facing API other than env vars - agreed, I can add this; I really like your suggested API.
  2. Full range of log levels - this is taken into account in the component syntax (an additional > in front of a component indicates more verbosity); for the user-facing API, the level can be provided through the kwargs as you showed.
  3. One flag for all on - again agreed. Right now I considered TORCH_COMPILE_DEBUG=1 for this, but I think that would just be confusing tbh. A component "all" would definitely work for this and allows enablement through a user-facing API (see the sketch below).
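For illustration, the blanket switch from point 3 might look like this (hypothetical at the time of this comment; the exact spelling wasn't settled):

```python
import logging
import torch._logging

# a single cover-all switch: every registered component at DEBUG,
# roughly the user-facing equivalent of a hypothetical TORCH_LOGS="+all"
torch._logging.set_logs(all=logging.DEBUG)
```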

@mlazos (Contributor, Author) commented Feb 15, 2023

Are there any configs that can be removed because of this change? e.g. log_level or output_code?

Yeah, good catch - I will remove these.

@mlazos mlazos changed the title component-level configurable logging component-level configurable logging for dynamo,inductor,aot Feb 15, 2023
@mlazos mlazos changed the title component-level configurable logging for dynamo,inductor,aot component-level configurable logging for dynamo, inductor, aot Feb 15, 2023
@ezyang (Contributor) left a comment:

There are some bugs, but the overall structure is good. Land as soon as you can!

@mlazos (Contributor, Author) commented Mar 17, 2023

There are some bugs, but the overall structure is good. Land as soon as you can!

Thanks! Really appreciated your design feedback!

@mlazos (Contributor, Author) commented Mar 17, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 17, 2023
@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator):

Merge failed

Reason: 1 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_8-cuda11_7-build / build

Details for Dev Infra team (raised by workflow job)



Review thread on this hunk:

```python
@functools.lru_cache(None)
def warning_once(self, *args, **kwargs):
```
@stas00 (Contributor) commented Mar 17, 2023

So how was this resolved in the end?

Our discussion was wiped out here and I can't find it in the sea of resolved discussions on the discussion tab.

The API declares this as a method, but it can't be used as a method, no?

@stas00 (Contributor) commented Mar 17, 2023

This can't work:

```python
def test(self): pass
class A: pass
a = A()
a.test()
```

```
Traceback (most recent call last):
  File "/tmp/test1.py", line 7, in <module>
    a.test()
AttributeError: 'A' object has no attribute 'test'
```

So the doc is invalid, as it's not identical to logger.warning and it's not a method.

@stas00 (Contributor) commented Mar 17, 2023

  1. It has to be called as a function, `warning_once(logger, ...)` (see the sketch below),
  2. and thus has to be imported.
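A minimal sketch of that function-call form (illustrative; the committed helper may differ):

```python
import functools
import logging

@functools.lru_cache(None)
def warning_once(logger, *args, **kwargs):
    # lru_cache memoizes on (logger, args), so each distinct warning is
    # emitted at most once per process
    logger.warning(*args, **kwargs)

log = logging.getLogger(__name__)
warning_once(log, "flaky fallback taken")  # emitted
warning_once(log, "flaky fallback taken")  # cached, silenced
```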

@stas00 (Contributor) commented Mar 17, 2023

I made a proposal below of one possible way to rectify this.

@mlazos (Contributor, Author) commented Mar 17, 2023

I ended up going with your approach and leaving this as is. I made the call that the UX was better when patching the logger class.

The only con was that patching the std lib is bad form, but I felt this was okay.
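For context, "patching the logger class" would look roughly like this (a sketch of the approach being weighed here, not the shipped code):

```python
import functools
import logging

@functools.lru_cache(None)
def _warning_once(self, *args, **kwargs):
    self.warning(*args, **kwargs)

# monkey-patch the std-lib Logger so every logger gains .warning_once();
# the con discussed above: mutating the std lib is bad form, and the cache
# holds a reference to every (logger, message) pair for the process lifetime
logging.Logger.warning_once = _warning_once

logging.getLogger(__name__).warning_once("printed once")
```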

@mlazos (Contributor, Author) commented Mar 18, 2023

I'll commit your code below; I think it's fine for what's needed. This is kind of a standalone thing, so let me know if you want to follow up on it in a separate PR - happy to consider other solutions.

@mlazos (Contributor, Author):

Agreed it's an issue with different libraries patching stdlib

@mlazos (Contributor, Author):

The pytorch repo is so big that I worry I won't get traction opting everyone into a custom version of get_logger for this one method. I followed the same pattern as everything else: have a clear way of opting into extra functionality (i.e. getArtifactLogger if artifacts are desired, and now a specific warning_once helper when that's desired), while the basic case of vanilla logging stays unmodified.

@stas00 (Contributor):

That works - thank you for handling so many nuances, @mlazos! Awesome work!

@mlazos (Contributor, Author):

Thanks!! I really appreciate the time and effort you put into reviewing the document and PR.

@mlazos (Contributor, Author) commented Mar 18, 2023

@pytorchbot merge

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator):

Merge failed

Reason: 1 jobs have failed, first few of them are: inductor / cuda11.8-py3.10-gcc7-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu)

Details for Dev Infra team (raised by workflow job)

@mlazos (Contributor, Author) commented Mar 18, 2023

@pytorchbot merge -f "logs don't affect accuracy"

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 23, 2023 (commit message mirrors the PR summary above).
Pull Request resolved: pytorch/pytorch#94858
Approved by: https://github.com/ezyang
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 27, 2023 (same commit message as above).
pytorchmergebot pushed a commit that referenced this pull request Sep 18, 2023. The configs

```
torch._dynamo.config.log_level = logging.INFO
torch._dynamo.config.output_code = True
```

were replaced with the module-level log control from #94858.
Pull Request resolved: #109409
Approved by: https://github.com/msaroufim
@github-actions github-actions bot deleted the mlazos/logging branch September 17, 2024 01:53