Skip to content

Conversation

GappleBee
Copy link
Contributor

@GappleBee GappleBee commented Nov 21, 2024

Trac ticket number

ticket-31354

Branch description

Continued #12844
I've rebased it and made the following tweaks:

  • Removed unnecessary regex compilations
  • Replaced .format() statements with f-strings
  • Updated ExceptionReporter._get_raw_insecure_uri
  • Fixed test cases

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.

Copy link
Contributor

@laymonage laymonage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! My review is part of the Djangonaut Space Reviewers program.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has been moved to requests_tests/tests.py in 016bead. Could you rebase the PR again and ensure the changes to this file are carried over to the new file instead? (Then make sure this file is deleted.)

r"^([a-z0-9.-]+|\[[a-f0-9]*:[a-f0-9.:]+\])(?::([0-9]+))?$"
)
MAX_ALLOWED_FILES = 1000
host_validation_re = r"^([a-z0-9.-]+|\[[a-f0-9]*:[a-f0-9\.:]+\])(?::([0-9]+))?$"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the _lazy_re_compile removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mainly because this regex pattern is used only once so there's no need to lazily compile it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might only be used once per request, at least in the standard request-response flow, but I've seen split_domain_port be called elsewhere. The laziness keeps the regex compiled between requests, which can massively improve performance.

Arguably, I could see value in compiling it proactively, rather than lazily.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that makes sense. I think compiling it proactively is better as host validation is a key part of each request so will be used a lot


return parsed_host

def get_host(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we update the documentation of get_host/get_port to include the new behaviour? Maybe also the docs for USE_X_FORWARDED_HOST/USE_X_FORWARDED_PORT

  • .. method:: HttpRequest.get_host()
    Returns the originating host of the request using information from the
    ``HTTP_X_FORWARDED_HOST`` (if :setting:`USE_X_FORWARDED_HOST` is enabled)
    and ``HTTP_HOST`` headers, in that order. If they don't provide a value,
    the method uses a combination of ``SERVER_NAME`` and ``SERVER_PORT`` as
    detailed in :pep:`3333`.
    Example: ``"127.0.0.1:8000"``
    Raises ``django.core.exceptions.DisallowedHost`` if the host is not in
    :setting:`ALLOWED_HOSTS` or the domain name is invalid according to
    :rfc:`1034`/:rfc:`1035 <1035>`.
    .. note:: The :meth:`~HttpRequest.get_host()` method fails when the host is
    behind multiple proxies. One solution is to use middleware to rewrite
    the proxy headers, as in the following example::
    class MultipleProxyMiddleware:
    FORWARDED_FOR_FIELDS = [
    "HTTP_X_FORWARDED_FOR",
    "HTTP_X_FORWARDED_HOST",
    "HTTP_X_FORWARDED_SERVER",
    ]
    def __init__(self, get_response):
    self.get_response = get_response
    def __call__(self, request):
    """
    Rewrites the proxy headers so that only the most
    recent proxy is used.
    """
    for field in self.FORWARDED_FOR_FIELDS:
    if field in request.META:
    if "," in request.META[field]:
    parts = request.META[field].split(",")
    request.META[field] = parts[-1].strip()
    return self.get_response(request)
    This middleware should be positioned before any other middleware that
    relies on the value of :meth:`~HttpRequest.get_host()` -- for instance,
    :class:`~django.middleware.common.CommonMiddleware` or
    :class:`~django.middleware.csrf.CsrfViewMiddleware`.
    .. method:: HttpRequest.get_port()
    Returns the originating port of the request using information from the
    ``HTTP_X_FORWARDED_PORT`` (if :setting:`USE_X_FORWARDED_PORT` is enabled)
    and ``SERVER_PORT`` ``META`` variables, in that order.
  • django/docs/ref/settings.txt

    Lines 3026 to 3052 in 0fe2188

    .. setting:: USE_X_FORWARDED_HOST
    ``USE_X_FORWARDED_HOST``
    ------------------------
    Default: ``False``
    A boolean that specifies whether to use the ``X-Forwarded-Host`` header in
    preference to the ``Host`` header. This should only be enabled if a proxy
    which sets this header is in use.
    This setting takes priority over :setting:`USE_X_FORWARDED_PORT`. Per
    :rfc:`7239#section-5.3`, the ``X-Forwarded-Host`` header can include the port
    number, in which case you shouldn't use :setting:`USE_X_FORWARDED_PORT`.
    .. setting:: USE_X_FORWARDED_PORT
    ``USE_X_FORWARDED_PORT``
    ------------------------
    Default: ``False``
    A boolean that specifies whether to use the ``X-Forwarded-Port`` header in
    preference to the ``SERVER_PORT`` ``META`` variable. This should only be
    enabled if a proxy which sets this header is in use.
    :setting:`USE_X_FORWARDED_HOST` takes priority over this setting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the get_port method seems to have changed in behaviour but otherwise the docs are still accurate

Comment on lines -338 to +339
host=self.request._get_raw_host(),
path=self.request.get_full_path(),
)
return self.request.get_raw_uri()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this change was from the previous PR, but before we merge this, this should probably be in a separate commit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any particular reason for this? All commits are going to be squashed into one before it's merged

Copy link
Contributor

@ontowhee ontowhee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, and thank you for picking up this patch! We reviewed this code as part of Space Reviewers. I have some comments.

MultiPartParserError,
TooManyFilesSent,
)
from django.http.multipartparser import MultiPartParser, MultiPartParserError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider changing the namedtuple to another data structure, as Adam mentioned in his comment, #12844 (comment)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree with him as we don't need the indexing functionality, for the sake of backwards compatibility, I think we should go for a vanilla class

Comment on lines +802 to +843
if host_header[-1] == "]":
# It's an IPv6 address without a port.
return host_header, ""
bits = host_header.rsplit(":", 1)
return tuple(bits) if len(bits) == 2 else (bits[0], "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider validating the port that is parsed in _parse_host_header(), as mentioned in a previous comment #12844 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems _parse_host_header intentionally doesn't validate the port so I put the port validation at the next lowest level (_get_parsed_host_header). Not sure whether to allow port 0 as it's a reserved port but I disallowed it for now

Copy link
Contributor

@raffaellasuardini raffaellasuardini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this.

"HTTP_X_FORWARDED_PORT": "8080",
}
# Should use the X-Forwarded-Port header
with self.assertRaises(ImproperlyConfigured):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with self.assertRaises(ImproperlyConfigured):
with self.assertRaisesMessage(ImproperlyConfigured):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What message do you think should come with the exception here?

host_validation_re = _lazy_re_compile(
r"^([a-z0-9.-]+|\[[a-f0-9]*:[a-f0-9.:]+\])(?::([0-9]+))?$"
)
MAX_ALLOWED_FILES = 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this constant introduced? Is this related to the ticket?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, not sure why I introduced it as looking closer at parse_file_upload, it already handles checking for the number of files


if use_x_fw_port and "HTTP_X_FORWARDED_PORT" in self.META:
if port_in_x_fw_host:
raise ImproperlyConfigured(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: This will result in an exception (and HTTP 500) if someone passes a malformed X-Forwarded-Host header. Perhaps this should just respond with a HTTP 400 instead - similar to the ALLOWED_HOSTS check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, it's the client's fault if they pass a malformed X-Forwarded-Host header so it should be a bad request error instead of a server configuration error

@GappleBee
Copy link
Contributor Author

Thank you @laymonage @RealOrangeOne @ontowhee @raffaellasuardini for looking into this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants