-
-
Notifications
You must be signed in to change notification settings - Fork 33.1k
Fixed #31354: Properly reconstruct host header in presence of X-Forwarded-* header. #12844
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
django/http/request.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: We could raise SupiciousOperation here if the port is empty (ie the hostname ended with :
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since there's no validation here and get_port()
is relying on this code I would think you'd need to consider other malformed cases, such as example.com:abc
or 1.1.1.1:443]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I am leaning toward calling int
on the port part as only validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isdigit
is better (see https://github.com/django/django/pull/12679/files#diff-0eb6c5000a61126731553169fddb306eR174 ) int
is slow and also will take -443
to be valid ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I forgot about that one, let me push a fix for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mhm, this is tricky. I am not exactly sure on which level to insert isdigit
:( I am leaning towards doing it in get_host
& get_port
since get_raw_uri
should not validate it. I also just realized that the host
returned from _parsed_host_header
is never used, only the port
& reconstructed
one -- so I am wondering if I can get away by just returning reconstructed
scratches head.
3a5c85e
to
59bdaf8
Compare
django/http/request.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought you wanted to keep _get_raw_host
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope not really. All I wanted was to have one function while keeping the behaviour mostly as is and without code duplication.
@adamchainz Good catch(es). Force-pushed an updated version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to have a test for the behavior change in _get_raw_uri()
but otherwise LGTM.
tests/csrf_tests/tests.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this changes the semantics of the test, though the test name makes me think it's not actually testing the intended behavior... cookies are not distinguished by port number (https://tools.ietf.org/html/rfc6265#section-8.5) so I would expect this test to have a request with a different port number than the Referer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, adding :4443
might change semantics -- I'll have to double check (though the actual code in the csrf handling might be faulty in that regard). What I am rather sure though is that the removal of SERVER_PORT
two lines below is correct either way since the Host
header would override it in any case. So if the tests required that before, they had been wrong :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my changes to this test are correct. The function calling it looks like this:
https://github.com/django/django/blob/9fbac5064df4eb5902c48ec3760c282da6d844f4/tests/csrf_tests/tests.py#L659-L665
Ie the cookie domain is set to .example.com
and we check if the request is allowed, even though the site is hosted on port 4443
-- effectively checking that the port doesn't matter.
django/http/request.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is now a reconstructed host header, it will be different from the provided host header e.g. in the case the host header has a default port included: Host: example.com:443
this will now be example.com
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, my reconstruction keeps the port if what was specified explictly:
See https://github.com/django/django/blob/9fbac5064df4eb5902c48ec3760c282da6d844f4/django/http/request.py#L112 for retrieving the host/port and https://github.com/django/django/blob/9fbac5064df4eb5902c48ec3760c282da6d844f4/django/http/request.py#L125 for reconstructing it. Or do I miss something?
django/http/request.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to match the behavior of _get_raw_host()
I think this reconstruction should only occur in the SERVER_NAME
scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, _get_raw_host
returned the Host
-header as is: https://github.com/django/django/blob/master/django/http/request.py#L110
So if _get_raw_host
passed in a port it would have been returned again. This is one of the reasons why the patch looks rather tricky.
@dgcgh Thank you so much for your review on this (imo) mind-boggling patch. I hope I commented the issues enough, please don't hesitate to provide an example test or drill me further with questions. |
@adamchainz Can I get a last review? After that I'll push it to the fellows for a last round of reviews. |
I've pushed another commit which moves parsing and validation into one function. As a side-effect |
host_validation_re = _lazy_re_compile(r"^([a-z0-9.-]+|\[[a-f0-9]*:[a-f0-9\.:]+\])(:\d+)?$") | ||
host_validation_re = _lazy_re_compile(r"^([a-z0-9.-]+|\[[a-f0-9]*:[a-f0-9\.:]+\])(?::([0-9]+))?$") | ||
|
||
ParsedHostHeader = namedtuple('ParsedHostHeader', ['domain', 'port', 'combined']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
namedtuple
s are weird (indexing syntax) and slow to create (it uses eval
internally). Can you make a vanilla class, or use dataclasses (not sure if django 3.2 will use python 3.7+ though)?
This also seems reasonable to me. |
@apollo13 Is it ready for another round of reviews? If yes I can start working on it. |
Closing due to inactivity. |
Follow up to #12550