113 lines
4.8 KiB
Plaintext
113 lines
4.8 KiB
Plaintext
Details on Import and Polyloader
|
|
================================
|
|
|
|
Welcome to the Python Import ELI5!
|
|
**********************************
|
|
|
|
What is `import`?
|
|
-----------------
|
|
|
|
``import`` is a Python statement that finds a *module* that is accessible
|
|
by the currently running process, loads it, and makes its contents
|
|
available to the scope in which the statement was made. A statement
|
|
like
|
|
|
|
``import django.db.models``
|
|
|
|
is looking for a module named ``django.db.models``. If the statement is
|
|
successful, ``django.db.models`` will be a variable name in the scope,
|
|
it will be a Python *object* (of ``class Module``, but that's not
|
|
important), and it will have functions, class constructors, variables
|
|
and constants. These Python objects will be accessible through the dot
|
|
operator.
|
|
|
|
An alternative way of writing import statements is
|
|
|
|
``from django.utils import encoding``
|
|
|
|
And then the variable will just be ``encoding``. The ``encoding``
|
|
module has a function for handling unicode-to-web translation.
|
|
Accessing it through the dot operator, it looks like this:
|
|
|
|
``ready_str = encoding.smart_str(unready_str)``
|
|
|
|
We call the parts of the import statement that describe the module the
|
|
*import string*.
|
|
|
|
``sys.path_hooks``: How does Python know where to look?
|
|
-------------------------------------------------------
|
|
|
|
That's what's funny. Python has two independent ways of making sense of
|
|
of the import string. The old system is based on the assumption that
|
|
everything is a filesystem, with folders and filenames. This is called
|
|
the ``sys.path_hooks`` system.
|
|
|
|
In the old system, the parts of the import string would be split up, and
|
|
then a collection of directories would be scanned to see if the first
|
|
name in the import string could be matched with a subdirectory. If it
|
|
could, that directory would be scanned until the last name on the import
|
|
string. If that name was a *filename*, it would be loaded as a module.
|
|
If that name was a *directory* and that directory had a file named
|
|
``__init__.py``, then that file would be loaded as the module.
|
|
|
|
The ``sys.path_hooks`` array has a list of different methods for trying to
|
|
scan a filesystem for the parts of the import string. A ``path_hook`` is
|
|
a function that takes a path to a directory; if it can handle the
|
|
contents of the directory, it returns a **Finder**, an object whose job
|
|
is to figure out how to load the module; if it can't, it returns an
|
|
ImportError exception. The object that loads the module is called,
|
|
naturally, a **Loader**.
|
|
|
|
* To read more about **Finders**, see :ref:`eli5-finders`
|
|
* To read more about **Loaders**, see :ref:`eli5-loaders`
|
|
* To read more about **Path Hooks**, see :ref:`eli5-pathhooks`
|
|
|
|
``sys.path``: What directories are searched?
|
|
--------------------------------------------
|
|
|
|
The list of directories is stored in an array, ``sys.path``. This path is
|
|
initialized by Python when it starts up, but programs can modify it at
|
|
run-time to point to extra directories if they want.
|
|
|
|
|
|
``sys.meta_path``: What is the new system?
|
|
------------------------------------------
|
|
|
|
The new system is called ``sys.meta_path``, and it's an array of
|
|
**Finders**, objects that have one method, ``find_module(fullname)``.
|
|
It's an anything-goes API that gives developers the freedom to import
|
|
modules from anywhere: databases, archives, remote web resources, even
|
|
code written on-the-fly internally. The new system can apply any
|
|
meaning at all to the import string.
|
|
|
|
In Python, the import string is offered to each object in
|
|
``sys.meta_path`` before being offered to each ``sys.path_hook``. The
|
|
filesystem is typically the last finder tried.
|
|
|
|
To read more about **Meta Paths**, see :ref:`eli5-metapaths`
|
|
|
|
Is it different between Python 2 and Python 3?
|
|
---------------------------------------------
|
|
|
|
Python 3 moves almost everything about this process into python's
|
|
library, leaving only a bare minimum of functionality inside the Python
|
|
executable to load this library and run it. When the Python developers
|
|
did that, they added a lot of functionality to make it easier to write
|
|
new import modules. The old way still works, but there are now *Module
|
|
Specifications*, which are metadata about a module, and the old
|
|
``path_hooks`` system is now just a ``meta_path`` handler added to the
|
|
new system as the last resort.
|
|
|
|
To read more about **Module Specifications**, see :ref:`eli5-specs`
|
|
|
|
Does the old system still matter?
|
|
---------------------------------
|
|
|
|
Yes, for one reason: *iteration*. Iteration is the ability to take a
|
|
path where you believe Python modules can be found, and list through
|
|
them. This facility is useful for large frameworks where a user wants
|
|
to add new commands, or new objects, or new operations; Django uses this
|
|
facility a lot! The ``pkgutil`` library depends upon Finders being able
|
|
to iterate their contents, and with the filesystem iterator, that means
|
|
being able to tell there's more than one kind of syntax in a directory.
|