Details on Import and Polyloader
================================
Welcome to the Python Import ELI5!
**********************************
What is ``import``?
-------------------
``import`` is a Python statement that finds a *module* that is accessible
by the currently running process, loads it, and makes its contents
available to the scope in which the statement was made. A statement
like
``import django.db.models``
is looking for a module named ``django.db.models``. If the statement is
successful, the name ``django`` will be bound in the scope, and
``django.db.models`` will be reachable through it. It will be a Python
*object* (of type ``module``, but that's not important), and it will
have functions, class constructors, variables and constants. These
Python objects will be accessible through the dot operator.
An alternative way of writing import statements is
``from django.utils import encoding``
And then the variable will just be ``encoding``. The ``encoding``
module has a function for handling unicode-to-web translation.
Accessing it through the dot operator, it looks like this:
``ready_str = encoding.smart_str(unready_str)``
We call the parts of the import statement that describe the module the
*import string*.
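A minimal sketch of both import forms. It uses the standard library's
``xml.dom.minidom`` in place of Django so that it runs without extra
packages; that substitution is an assumption made for illustration only.

```python
import types

# Plain form: binds the top-level name ``xml``; the submodule is then
# reached through the dot operator.
import xml.dom.minidom

# ``from`` form: binds only the last name, ``minidom``.
from xml.dom import minidom

# Both spellings lead to the very same module object.
is_module = isinstance(xml.dom.minidom, types.ModuleType)
same_module = xml.dom.minidom is minidom

# Use something from the module through the dot operator.
doc = minidom.parseString("<greeting>hello</greeting>")
root_tag = doc.documentElement.tagName
```

Here the import strings are ``xml.dom.minidom`` and ``xml.dom``.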
``sys.path_hooks``: How does Python know where to look?
-------------------------------------------------------
That's what's funny. Python has two independent ways of making sense of
the import string. The old system is based on the assumption that
everything is a filesystem, with folders and filenames. This is called
the ``sys.path_hooks`` system.
In the old system, the parts of the import string would be split up, and
then a collection of directories would be scanned to see if the first
name in the import string could be matched with a subdirectory. If it
could, that directory would be scanned until the last name on the import
string. If that name was a *filename*, it would be loaded as a module.
If that name was a *directory* and that directory had a file named
``__init__.py``, then that file would be loaded as the module.
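The scan described above can be sketched roughly as follows. This is a
simplified illustration, not Python's actual implementation: the
function name ``locate`` is hypothetical, only ``.py`` files are
considered, and compiled or extension modules are ignored.

```python
import os


def locate(import_string, search_dirs):
    """Return the file a filesystem-style scan would load, or None."""
    parts = import_string.split(".")
    for base in search_dirs:
        current = base
        # Every name except the last must match a subdirectory.
        for name in parts[:-1]:
            current = os.path.join(current, name)
            if not os.path.isdir(current):
                break
        else:
            last = os.path.join(current, parts[-1])
            # The last name is a filename: load it as the module.
            if os.path.isfile(last + ".py"):
                return last + ".py"
            # The last name is a directory holding ``__init__.py``.
            init = os.path.join(last, "__init__.py")
            if os.path.isfile(init):
                return init
    return None
```

Calling ``locate("pkg.mod", ["/some/dir"])`` would look for
``/some/dir/pkg/mod.py`` first and ``/some/dir/pkg/mod/__init__.py``
second.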
The ``sys.path_hooks`` list holds the different methods for trying to
scan a filesystem for the parts of the import string. A ``path_hook`` is
a function that takes a path to a directory; if it can handle the
contents of the directory, it returns a **Finder**, an object whose job
is to figure out how to load the module; if it can't, it raises an
``ImportError`` exception. The object that loads the module is called,
naturally, a **Loader**.
* To read more about **Finders**, see :ref:`eli5-finders`
* To read more about **Loaders**, see :ref:`eli5-loaders`
* To read more about **Path Hooks**, see :ref:`eli5-pathhooks`
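As a rough sketch of how path hooks are consulted, the helper below
(``finder_for`` is a hypothetical name, not part of Python) offers a
directory to each hook in turn and keeps the first **Finder** it gets
back. Python itself also caches the result, which this sketch skips.

```python
import sys


def finder_for(directory):
    """Ask each path hook in turn for a Finder that handles ``directory``."""
    for hook in sys.path_hooks:
        try:
            return hook(directory)
        except ImportError:
            # This hook cannot handle the directory; try the next one.
            continue
    return None
```

For an ordinary directory of Python files, the Finder that comes back is
normally the filesystem one; for a path no hook understands, the helper
returns ``None``.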
``sys.path``: What directories are searched?
--------------------------------------------
The directories to search are stored in a list, ``sys.path``. This list
is initialized by Python when it starts up, but programs can modify it
at run-time to point to extra directories if they want.
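A small sketch of a program extending ``sys.path`` at run-time. The
directory is a temporary one and the module name ``eli5_extra`` is made
up for the example.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding one module (hypothetical name).
extra_dir = tempfile.mkdtemp()
with open(os.path.join(extra_dir, "eli5_extra.py"), "w") as handle:
    handle.write("ANSWER = 42\n")

# Point Python at the extra directory at run-time.
sys.path.append(extra_dir)
importlib.invalidate_caches()  # let cached finders notice the new file

# The module is now found like any other.
import eli5_extra
```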
``sys.meta_path``: What is the new system?
------------------------------------------
The new system is called ``sys.meta_path``, and it's a list of
**Finders**, objects that have one method, ``find_module(fullname, path=None)``.
It's an anything-goes API that gives developers the freedom to import
modules from anywhere: databases, archives, remote web resources, even
code written on-the-fly internally. The new system can apply any
meaning at all to the import string.
In Python, the import string is offered to each object in
``sys.meta_path`` before being offered to each hook in
``sys.path_hooks``. The filesystem is typically the last finder tried.
To read more about **Meta Paths**, see :ref:`eli5-metapaths`
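To illustrate "code written on-the-fly", here is a minimal sketch of a
meta path Finder that invents a module for any import string starting
with ``onthefly_``. It assumes Python 3.4 or later, where the Finder
method is ``find_spec`` rather than ``find_module``; the class name and
the prefix are hypothetical.

```python
import importlib.abc
import importlib.util
import sys


class OnTheFlyFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and Loader in one: builds modules with no file behind them."""

    PREFIX = "onthefly_"  # hypothetical naming convention

    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(self.PREFIX):
            return None  # not ours; let the next finder try
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use Python's default module object

    def exec_module(self, module):
        # "Load" the module by filling in its namespace directly.
        module.greeting = "hello from " + module.__name__


# Put the finder first so it sees every import string before the others.
sys.meta_path.insert(0, OnTheFlyFinder())

import onthefly_demo
```

After the import, ``onthefly_demo.greeting`` holds a string even though
no file named ``onthefly_demo.py`` exists anywhere.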
Is it different between Python 2 and Python 3?
----------------------------------------------
Python 3 moves almost everything about this process into Python's
library (``importlib``), leaving only a bare minimum of functionality
inside the Python executable to load this library and run it. When the
Python developers did that, they added a lot of functionality to make it
easier to write new importers. The old way still works, but there are
now *Module Specifications*, which are metadata about a module, and the
old ``path_hooks`` system is now just a ``meta_path`` handler added to
the new system as the last resort.
To read more about **Module Specifications**, see :ref:`eli5-specs`
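A quick way to look at a Module Specification on Python 3.4 or later is
``importlib.util.find_spec``. The sketch below inspects the standard
library's ``json`` package; the ``summary`` dictionary is only there for
illustration.

```python
import importlib.util

# Ask the import system for the metadata without importing the module.
spec = importlib.util.find_spec("json")

summary = {
    "name": spec.name,                    # the import string
    "origin": spec.origin,                # where the code comes from
    "loader": type(spec.loader).__name__, # which Loader would load it
    "is_package": spec.submodule_search_locations is not None,
}
```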
Does the old system still matter?
---------------------------------
Yes, for one reason: *iteration*. Iteration is the ability to take a
path where you believe Python modules can be found, and list the modules
in it. This facility is useful for large frameworks where a user wants
to add new commands, or new objects, or new operations; Django uses this
facility a lot! The ``pkgutil`` library depends upon Finders being able
to iterate their contents, and with the filesystem iterator, that means
being able to tell there's more than one kind of syntax in a directory.
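A small sketch of iteration with ``pkgutil.iter_modules``. The
directory and file names are made up; the point is that only files the
filesystem Finder recognizes as modules are listed.

```python
import os
import pkgutil
import tempfile

# A throwaway "plugin" directory: two modules and one unrelated file.
plugin_dir = tempfile.mkdtemp()
for filename in ("start.py", "stop.py", "notes.txt"):
    open(os.path.join(plugin_dir, filename), "w").close()

# Each entry is (finder, name, ispkg); keep just the module names.
found = sorted(
    name for _finder, name, _ispkg in pkgutil.iter_modules([plugin_dir])
)
# found == ["start", "stop"]
```

``notes.txt`` is skipped because the stock filesystem Finder only knows
Python's own suffixes.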