Skip to content

Conversation

@ilevkivskyi
Copy link
Member

This adds static support for structural subtyping. Previous discussion is here python/typing#11

Fixes #222

Copy link
Contributor

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments based on a quick read-through.

pep-0544.txt Outdated
provide sophisticated runtime instance and class checks against protocol
classes. This would be difficult and error-prone and will contradict the logic
of PEP 484. As well, following PEP 484 and PEP 526 we state that protocols are
**completely optional** and there is no intent to make them required.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little unclear what 'completely optional' means here and whether the following sentences are related to this. This could perhaps mean some of these things:

  • They are optional like everything defined in PEP 484: They don't affect Python runtime semantics and they are merely a feature in typing. Programmers are free to not use them.
  • The standard library doesn't use protocols outside typing.
  • Programmers are free to not use them even if they use type annotations.
  • Python implementations don't need to support them.

This could rephrased like this to avoid the ambiguity:

... are completely optional:

  • No runtime semantics will be imposed for variables or parameters annotated with a protocol class.
  • Any checks will be performed only by third-party type checkers and other tools.
  • ... maybe more

There is no intent to make protocols non-optional in the future.

a compatible type signature.


Protocol members
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to imply that sequences no longer support iteration, since Sequence.__iter__ is not abstract and thus is not a protocol member. Is this intentional?

My view is that protocols should not have non-protocol attributes or methods. For example, none of the ABCs in collections.abc seem to have what I'd consider non-protocol members. Protocols should describe an interface, not an implementation. Default implementations are okay, but they should probably only use other members defined in the protocol.

If we want to support non-protocol members, I think we should restrict them to names with a double underscore prefix to make this super explicit. However, even this brings in some arguably unneeded complexity: self will now have a special type, since accessing non-protocol members is only safe through self. Example:

class C(Protocol):
    ...
    def f(self, other: 'C') -> None:
        self.__g()  # ok
        other.__g()  # not ok

    def __g(self) -> None:
        <something>

Copy link
Member Author

@ilevkivskyi ilevkivskyi Mar 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to imply that sequences no longer support iteration, since Sequence.__iter__ is not abstract and thus is not a protocol member. Is this intentional?

Sequence implements __iter__ therefore it will be accepted wherever Iterable is expected. But asking for __iter__ on Sequence indeed will fail.

This indeed looks problematic. Here is a solution that I could propose: we do not have such thing as non-protocol members. However, there will be two kinds of members, abstract and non-abstract. All of them should be implemented in order to consider a class as structural (implicit) subtype.

The reason to have non-abstract methods is that one can get them "for free" using explicit subclassing provided abstract methods are implemented in subclass. So that for larger protocols like Sequence or Mapping one doesn't need to do

def method(self):
    return super().method()

for every method.

This coincides with how ABCs work at runtime, so that this will be intuitive for someone who already worked with ABCs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protocols should describe an interface, not an implementation.

👍

Default implementations are okay, but they should probably only use other members defined in the protocol.

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a solution that I could propose: we do not have such thing as non-protocol members.

That's now specified below in the PEP right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's now specified below in the PEP right?

Yes.

pep-0544.txt Outdated
Generic protocol types follow the same rules of variance as non-protocol
types. Protocols can be used in all special type constructors provided
by the ``typing`` module and follow the corresponding subtyping rules.
For example, protocol ``PInt`` is a subtype of ``Union[PInt, int]``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unclear. What is PInt?

pep-0544.txt Outdated

Generic protocol types follow the same rules of variance as non-protocol
types. Protocols can be used in all special type constructors provided
by the ``typing`` module and follow the corresponding subtyping rules.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be rephrased as something like "Protocol types can be used in all contexts where any other types can used, such as in (examples). Generic protocols follow the rules for generic abstract classes, except for using structural compatibility instead of compatibility defined by inheritance relationships."

pep-0544.txt Outdated
finish(GoodJob()) # OK

In addition, we propose to add another special type construct
``All`` that represents intersection types. Although for normal types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider leaving out All from this proposal, at least initially. This is something we can add later if there is a need, but currently I don't see evidence for this being important. Effectively the same behavior can be achieved through multiple inheritance. Besides, it would be a bit arbitrary if All only worked with protocols, and making All work with arbitrary types would be a much more complex feature, and not as directly related to this PEP.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could include this for three reasons:

  • I could easily imagine a function parameter that requires more than one protocol, especially if the preferred style is to have several simple protocols, rather than few complex. By the way, some classes in collections.abc are simple intersections, for example Collection == All[Sized, Iterable, Container].
  • It could be easily implemented for protocols. I also propose to use All instead of Intersection to emphasize it only works for protocols (mnemonics: variable implements all these protocols). On the contrary, I think that intersections of nominal classes would be very rarely used.
  • If we decide to include this later, then people might simply be not aware of this feature (this is a minor reason, but still we need to take this into account).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Jukka -- I think it's confusing for All to work only with protocols, extending All to work with all types would add considerable complexity, and I'm not convinced there's a pressing need.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was pushing for Intersection for the original PEP because I saw the need for being able to intersect ABCs. Given that protocols themselves solve this, and we can use multiple inheritance as Jukka is suggesting, I agree that it's wiser to leave this out.

I would still like to have it, but that can be a separate PEP. I agree it's complex to implement for concrete types, and better yet, for combinations of both.

pep-0544.txt Outdated
(``Iterable`` is statically equivalent to ``Iterable[Any]`).
The metaclass of ``typing.Protocol`` will automatically add
a ``__subclasshook__()`` that provides the same semantics
for class and instance checks as for ``collections.abc`` classes::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the automatic isinstance check support, even though that's how the mypy Protocol class used to work. I don't think that they can be made to work reliably for arbitrary ABCs. Consider this example:

class P(Protocol):
    @abstractmethod
    def common_method_name(self, x: int) -> int: ... 

class X:
    <a bunch of methods>
    def common_method_name(self) -> None: ...

def do_stuff(o: Union[P, X]) -> int:
    if isinstance(o, P):
        return o.common_method_name(1)  # oops what if it's an X instance instead?
    else:
        return (do something with an X)

Maybe a type checker can warn about this example, but more generally the isinstance check could be unreliable, except for things where there is a common signature convention such as Iterable.

Another potentially problematic case:

class P(Protocol):
    x: int

class C:
    def initialize(self) -> None:
        self.x = 0

c1 = C()
c2 = C()
c2.initialize()
isinstance(c1, P)  # False
isinstance(c2, P)  # True

def f(x: Union[P, int]) -> None:
    if isinstance(x, P):
        # type of x is P here
        ...
    else:
        # type of x is int here?
        print(x + 1)

f(C())   # oops

Again, maybe we can live with this, but this could be pretty surprising. I'd argue that requiring an explicit class decorator would be better, since we can then attach warnings about problems like this in the documentation. The programmer would be able to evaluate whether the benefits outweigh the potential for confusion for each protocol and explicitly opt in -- but the default behavior would be safer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that requiring an explicit class decorator would be better, since we can then attach warnings about problems like this in the documentation.

This is how I initially wrote this, since I intuitively felt that this could not be 100% reliable. Guido wanted to make it default, by now we have good examples to make this opt-in, so that I will revert this to class decorator.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand how a class decorator would solve the problem. Wouldn't that mean that some classes that implement the protocol (i.e. the decorated ones) will work properly with isinstance at runtime but others will not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

classes that implement the protocol (i.e. the decorated ones)

The decorator is applied to protocol, not to implementation. This was in a previous draft. At runtime the decorator will add same __subclasshook__() to a protocol, that collections.abc classes have now. As Jukka pointed out, there are pros and cons of such approach, and a user could decide.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would the __subclasshook__ work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, if a class has all necessary attributes, then it is a subclass. This is why I mention below that "type checkers will narrow types after such checks by the type erased Proto (i.e. with all variables having type Any and all methods having type Callable[..., Any])" because we cannot reliably check the types of attributes at runtime.

This is how it is done in collections.abc see https://github.com/python/cpython/blob/master/Lib/_collections_abc.py#L72 and e.g. https://github.com/python/cpython/blob/master/Lib/_collections_abc.py#L93

pep-0544.txt Outdated
# Error! Can't use 'isinstance()' with subscripted protocols


Using Protocols in Python 3.5 and Earlier
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Python 2.7?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be used the same way for 2.7 - 3.5, modulo the fact that function type comments should be used in 2.7.

pep-0544.txt Outdated
reasons:

* Protocols don't have some properties of regular classes. In particular,
``isinstance()`` is not always well-defined for protocols, whereas it is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better argument would be something like this: isinstance, as defined for classes that don't explicitly extend Protocol, is based on the nominal hierarchy. In order to make everything a protocol by default and have isinstance work would require changing the semantics of isinstance, which won't happen.

mainly because their interfaces are too large, complex or
implementation-oriented (for example, they may include de facto
private attributes and methods without a ``__`` prefix).
* Most actually useful protocols in existing Python code seem to be implicit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional reason: Many built-in functions only accept concrete instances of int (and subclass instances), and similarly for other built-in classes. Making int a structural type wouldn't be safe without major changes to the Python runtime, which won't happen.

References
==========

.. [typing]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it customary to include references to discussions that have happened about the PEP, such as in the typing issue tracker?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could do more deep linking to specific comments or subthreads? (No big deal if you think it's too much effort, but it might save some readers some time finding the right part of the long discussion in e.g. python/typing#11 -- including myself :-).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will do this.

@ilevkivskyi
Copy link
Member Author

@JukkaL Thank you for helpful comments! I agree with almost all things. I will make corresponding changes in the text soon. The only thing where we do not agree is All, I still think we should have it.

Also we probably need to find a way how to indicate whether an abstract method has a default implementation or not in stubs.

@ddfisher
Copy link

As a general comment, I don't think it's a good idea to standardize this before having an implementation. An implementation would allow us to understand what's difficult to implement, what sorts of unexpected interactions it has with other type-system features, what parts are confusing, and what parts are practically useful. I think it will be very difficult to get an understanding of those things otherwise.

pep-0544.txt Outdated
-----------------------------------

To explicitly declare that a certain class implements the given protocols, they
can be used as regular base classes. In this case:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried this could be pretty confusing: you could have a class that explicitly implements some protocols and implicitly implements others. I don't know of any other language that allows this.

pep-0544.txt Outdated
finish(GoodJob()) # OK

In addition, we propose to add another special type construct
``All`` that represents intersection types. Although for normal types

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Jukka -- I think it's confusing for All to work only with protocols, extending All to work with all types would add considerable complexity, and I'm not convinced there's a pressing need.


Protocols are essentially anonymous. To emphasize this point, static type
checkers might refuse protocol classes inside ``NewType()`` to avoid an
illusion that a distinct type is provided::

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this restriction makes sense, actually -- trying to construct a NewType from a protocol isn't cogent. You probably want an alias instead. Having NewType create a non-protocol subtype is surprising behavior and is not consistent with how NewType works in general, in my opinion. We shouldn't support this without a clear need.

pep-0544.txt Outdated
(``Iterable`` is statically equivalent to ``Iterable[Any]`).
The metaclass of ``typing.Protocol`` will automatically add
a ``__subclasshook__()`` that provides the same semantics
for class and instance checks as for ``collections.abc`` classes::

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand how a class decorator would solve the problem. Wouldn't that mean that some classes that implement the protocol (i.e. the decorated ones) will work properly with isinstance at runtime but others will not?

be provided at runtime.


Changes in the typing module

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you propose to handle the backwards-incompatibility problem for all currently released versions of Python?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that there should be no backward incompatibility. The classes in typing module (as well as in collections.abc) already behave like protocols at runtime:

class Whatever:
    def __iter__(self): return []
assert isinstance(Whatever(), Iterable)

as I mention below "Practically, few changes will be needed in typing since some of these classes already behave the necessary way at runtime", most of these will be changed only in typeshed.

Also, after I implement the changes proposed by Jukka, all code that passes mypy now, will still pass (unless I don't see some subtleties).

@ilevkivskyi
Copy link
Member Author

@ddfisher Of course I am not going to wait until this PEP is accepted :-)
I think implementation and discussion of specification should go in parallel. This is a big feature, but I hope I will make a first PR quite soon.

@brettcannon
Copy link
Member

@ilevkivskyi when you're ready for the first version to go in just let us know and we will commit it. Then you can have discussions go to your personal fork and just send PRs for updates as needed.