Skip to content

Conversation

apollo13
Copy link
Member

@apollo13 apollo13 commented May 2, 2020

Follow up to #12550

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: We could raise SupiciousOperation here if the port is empty (ie the hostname ended with :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since there's no validation here and get_port() is relying on this code I would think you'd need to consider other malformed cases, such as example.com:abc or 1.1.1.1:443]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I am leaning toward calling int on the port part as only validation.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isdigit is better (see https://github.com/django/django/pull/12679/files#diff-0eb6c5000a61126731553169fddb306eR174 ) int is slow and also will take -443 to be valid ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I forgot about that one, let me push a fix for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhm, this is tricky. I am not exactly sure on which level to insert isdigit :( I am leaning towards doing it in get_host & get_port since get_raw_uri should not validate it. I also just realized that the host returned from _parsed_host_header is never used, only the port & reconstructed one -- so I am wondering if I can get away by just returning reconstructed scratches head.

@apollo13 apollo13 force-pushed the ticket31354 branch 2 times, most recently from 3a5c85e to 59bdaf8 Compare May 2, 2020 21:02
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you wanted to keep _get_raw_host

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope not really. All I wanted was to have one function while keeping the behaviour mostly as is and without code duplication.

@apollo13
Copy link
Member Author

apollo13 commented May 7, 2020

@adamchainz Good catch(es). Force-pushed an updated version

Copy link

@dgcgh dgcgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to have a test for the behavior change in _get_raw_uri() but otherwise LGTM.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this changes the semantics of the test, though the test name makes me think it's not actually testing the intended behavior... cookies are not distinguished by port number (https://tools.ietf.org/html/rfc6265#section-8.5) so I would expect this test to have a request with a different port number than the Referer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, adding :4443 might change semantics -- I'll have to double check (though the actual code in the csrf handling might be faulty in that regard). What I am rather sure though is that the removal of SERVER_PORT two lines below is correct either way since the Host header would override it in any case. So if the tests required that before, they had been wrong :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my changes to this test are correct. The function calling it looks like this:
https://github.com/django/django/blob/9fbac5064df4eb5902c48ec3760c282da6d844f4/tests/csrf_tests/tests.py#L659-L665

Ie the cookie domain is set to .example.com and we check if the request is allowed, even though the site is hosted on port 4443 -- effectively checking that the port doesn't matter.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is now a reconstructed host header, it will be different from the provided host header e.g. in the case the host header has a default port included: Host: example.com:443 this will now be example.com

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, my reconstruction keeps the port if what was specified explictly:
See https://github.com/django/django/blob/9fbac5064df4eb5902c48ec3760c282da6d844f4/django/http/request.py#L112 for retrieving the host/port and https://github.com/django/django/blob/9fbac5064df4eb5902c48ec3760c282da6d844f4/django/http/request.py#L125 for reconstructing it. Or do I miss something?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to match the behavior of _get_raw_host() I think this reconstruction should only occur in the SERVER_NAME scenario.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, _get_raw_host returned the Host-header as is: https://github.com/django/django/blob/master/django/http/request.py#L110

So if _get_raw_host passed in a port it would have been returned again. This is one of the reasons why the patch looks rather tricky.

@apollo13
Copy link
Member Author

apollo13 commented May 7, 2020

@dgcgh Thank you so much for your review on this (imo) mind-boggling patch. I hope I commented the issues enough, please don't hesitate to provide an example test or drill me further with questions.

@apollo13 apollo13 requested a review from adamchainz June 13, 2020 09:00
@apollo13
Copy link
Member Author

@adamchainz Can I get a last review? After that I'll push it to the fellows for a last round of reviews.

@apollo13
Copy link
Member Author

I've pushed another commit which moves parsing and validation into one function. As a side-effect get_port triggers host-header validation now to. This is imo okay sense the port can be part of the host-header or the x-forwarded-for header. I am currently considering splitting the check even more to be able to point out whether the domain name is wrong or whether the port is bogus.

host_validation_re = _lazy_re_compile(r"^([a-z0-9.-]+|\[[a-f0-9]*:[a-f0-9\.:]+\])(:\d+)?$")
host_validation_re = _lazy_re_compile(r"^([a-z0-9.-]+|\[[a-f0-9]*:[a-f0-9\.:]+\])(?::([0-9]+))?$")

ParsedHostHeader = namedtuple('ParsedHostHeader', ['domain', 'port', 'combined'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

namedtuples are weird (indexing syntax) and slow to create (it uses eval internally). Can you make a vanilla class, or use dataclasses (not sure if django 3.2 will use python 3.7+ though)?

@adamchainz
Copy link
Member

As a side-effect get_port triggers host-header validation now to.

This also seems reasonable to me.

@felixxm
Copy link
Member

felixxm commented Nov 13, 2020

@apollo13 Is it ready for another round of reviews? If yes I can start working on it.

Base automatically changed from master to main March 9, 2021 06:21
@felixxm
Copy link
Member

felixxm commented Jun 24, 2021

Closing due to inactivity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants