Skip to content

Conversation

evanj
Copy link

@evanj evanj commented Feb 15, 2018

This will make certifi work if packed in a zip, such as when running
in an egg, or packaged using Pex or Subpar.

tests/test_zip: Test referencing certifi packaged in an egg.
Fixes #66

This will make certifi work if packed in a zip, such as when running
in an egg, or packaged using Pex or Subpar.

tests/test_zip: Test referencing certifi packaged in an egg.
@evanj
Copy link
Author

evanj commented Feb 15, 2018

The unit test works when run with python setup.py test

_TEST_SCRIPT='; '.join('''import certifi
import os.path
assert os.path.exists(certifi.where()), "certs do not exist"
print "PASS"'''.split('\n'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The python binary could just as easily be linked to python3 in a virtualenv. Please use print("PASS")

@codekitchen
Copy link

I applied this and confirmed that it fixes the AWS Glue use case that I mentioned, as well. Thanks!

I'm not sure it fixes the original issue that #66 was filed for, since it's still using __file__ which seems to be undefined in that "cx freeze" environment? But I have zero experience with freezing a python app so I dunno.

@alex
Copy link
Member

alex commented Feb 15, 2018

I'm still not wild about this, but in any event, there's a race condition between when _certs_tempfile is assigned and when the path points to something readable. Two concurrent callers to where() can result in one of them getting a partial or corrupted trust store.

@evanj
Copy link
Author

evanj commented Feb 15, 2018

I did say on bug #66 that I would accept "no thanks we won't support zip packaged libraries" (since although Python does support this, it isn't that widely used). Great catch on the race condition! Oops!

I've submitted a change that should be python3 friendly, and fixes that issue.

Copy link
Member

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what’s the plan for addressing the race?

@evanj
Copy link
Author

evanj commented Feb 17, 2018

I believe I did: The solution is not to assign to the global variable until after the certificate content has been written out to disk. Due to the Python GIL, this should be thread safe I believe?

@alex
Copy link
Member

alex commented Feb 17, 2018

There's a second race condition :-)

  1. Two callers enter where(), it's not yet initialized so both go to create a temporary file.
  2. One of them assigns to _certs_tempfile and returns name
  3. The second also succeeds, assigns to _certs_tempfile
  4. At this point the first TemporaryFile object is garbage collected, and I believe the file on disk is deleted
  5. The first where() caller tries to access a non-existent path

@evanj
Copy link
Author

evanj commented Feb 17, 2018

Amazing, you are completely correct. This means we should probably do this at import time since otherwise we need a lock to protect the initialization. Updating the PR now. Thanks.

Since Python has a single import lock, this ensures a single thread
will perform this initialization once per Python process, and calling
where() is still efficient.
@codekitchen
Copy link

Is there more development that needs to be done on this before it is merged in? I'd be happy to help if so, I'd love to see this merged.

@callicles
Copy link

Same here, would love to see this make it to the main line.

def where():
f = os.path.dirname(__file__)
_cert_tempfile = None
_cert_path = os.path.join(os.path.dirname(__file__), 'cacert.pem')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line will fail in the environments that don't have __file__ defined, such as cx_Freeze or PyOxidizer.

The fix is IMHO quite simple - to catch NameError from this line. The rest should work fine - pkgutil.get_data works cross platform.

It would be great to get this into a release. At the moment, it's quite hard to package a single executable depending on e. g. requests library.

# assume we are in a zip: attempt to extract to a temp file
cert_data = pkgutil.get_data(__package__, 'cacert.pem')
if cert_data is not None:
_cert_tempfile = tempfile.NamedTemporaryFile(suffix='.pem')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NamedTemporaryFile delete defaults to True, and this algorithm relies on an open file handle for the duration of the runtime invocation, and breaks if the certifi module is reloaded, as other modules will have the old where() result.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be a fix for this? Making _cert_path a subclass of str and storing a reference to _cert_tempfile there?

Maybe this edge case doesn't need to be solved together with this issue. If the module works for you now, it will continue to work. The only case that breaks is if you're in an evironment where the packages are installed outside of standard filesystem AND you reload certifi.

What is the use case anyway? If the app runs from a zip file or from a compiled EXE, I don't see a need for reloading certifi - the module will likely not change during runtime.

@alex
Copy link
Member

alex commented Jul 15, 2022

I think at this point this has been subsumed by the PRs to use the importlib.resources API. Is this still desired?

@evanj
Copy link
Author

evanj commented Jul 15, 2022

I opened this years ago, but have since stopped working on stuff that uses PEX/eggs, so my answer is: I don't know. I assume that enough things related to Python packaging have changed since I started this, that this is likely not the right approach any more anyway? There has been no activity in two years, so I'm going to take the extreme approach of closing this.

If there are people with more recent experience who know that this is still a problem, please comment and we can attempt to figure out the right new solution.

@evanj evanj closed this Jul 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

"__file__" variable cannot be used when running with frozen program

8 participants