
In the last post, I introduced the concepts of the **module object**,
**module**, and **package**, concrete objects that exist within the
Python runtime, as well as some basic ideas about packaging, finding,
and loading.
In this post, I'll go over the process of *finding*, what it means to
*find* something, and what happens next.
## A Clarifying Point
I've been very careful to distinguish *finding*, *loading*, and
*listing* in this series of posts. There's a reason for that: in
Python 2, the terms "Finder" and "Importer" were used interchangeably,
leading to (at least on my part) massive confusion. In actual fact,
finders, hooks, loaders, and listers are all individual objects, each
with a single method of its own and a specific signature. Because the
method name is different for each stage, it is theoretically possible to
define a single class that finds, loads, and lists a given category of
*module object*, and only in that case, I believe, should we talk about
an "Importer."
In Python 2.6 and 2.7, the definitive Finder class is called
`pkgutil.ImpImporter`, and the Loader is called `pkgutil.ImpLoader`;
this was a source of much of my confusion. In Python 3, the term
"Importer" is deprecated and "Finder" is used throughout `importlib`. I
will be using "Finder" from now on.
## Finding
When an `import <fullname>` statement is executed, a procedure is
triggered. That procedure:

* attempts to *find* a corresponding Python *module*
* attempts to *load* that module into *bytecode*
* associates the bytecode with the name via `sys.modules[fullname]`
* exposes the bytecode to the calling scope
* optionally, writes the bytecode to the filesystem for future use

*Finding* is the act of identifying a resource that corresponds to the
import string and that can be compiled into a meaningful Python module.
The import string is typically called the *fullname*.
*Finding* typically involves scanning a collection of *resources*
against a collection of *finders*. *Finding* ends when *finder `A`*,
given *fullname `B`*, reports that a corresponding module can be found
in *resource `C`*, and that the resource can be loaded with *loader
`D`*.
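The procedure above can be sketched with the standard library's own pieces. This is a rough illustration, not the interpreter's actual code path: it assumes Python 3.5 or later, handles only top-level names, and `manual_import` is a hypothetical helper name. `importlib.util.find_spec` does the *finding* and returns a spec that names the loader; the rest is *loading* and registration.

```python
import importlib.util
import sys


def manual_import(fullname):
    """Rough sketch of the steps behind `import fullname` (top-level names only)."""
    # An already-imported module is simply reused.
    if fullname in sys.modules:
        return sys.modules[fullname]

    # Finding: ask the registered finders who can supply this module.
    spec = importlib.util.find_spec(fullname)
    if spec is None:
        raise ImportError("No module named " + repr(fullname))

    # Loading: build an empty module object, register it, then run its code.
    module = importlib.util.module_from_spec(spec)
    sys.modules[fullname] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # A failed load must not leave a half-initialized module behind.
        del sys.modules[fullname]
        raise
    return module


# Exposing the result to the calling scope is just a name binding.
colorsys = manual_import("colorsys")
```

The spec returned by the finding step carries both the resource location (`spec.origin`) and the loader (`spec.loader`), which is the "finder A, fullname B, resource C, loader D" report described above.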
### MetaFinders
*Finders* come first, and *MetaFinders* come before all other kinds of
finders.
_Most finding is done in the context of `sys.path`_; that is, Python's
primary means of organizing Python modules is to have them somewhere on
the local filesystem. This makes sense. Sometimes, however, you want
to get in front of that scan and impose your own logic: you want the
root of an import string to mean something else. Maybe instead of
`directory.file`, you want it to mean `table.row.cell`, or you want it
to mean `website.path.object`, to take
[one terrifying example](http://blog.dowski.com/2008/07/31/customizing-the-python-import-system/).
That's what you do with a MetaFinder: a MetaFinder may choose to ignore
the entire `sys.path` mechanism and do something that has nothing to do
with the filesystem, or it may have its own take on what to do with
`sys.path`.
A Finder is any object with the following method:
```
[Loader|None] find_module([self|cls], fullname:string, path:[string|None])
```
The `find_module` method returns `None` if it cannot find a loader for
the given `fullname` and `path`.
A MetaFinder is placed into the list `sys.meta_path` by whatever code
needs the MetaFinder, and it persists for the duration of the runtime,
unless it is later removed or replaced. Because `sys.meta_path` is a
list, the search is ordered: first match wins. MetaFinders may be
instantiated in any way the developer desires before being added to
`sys.meta_path`.
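Here is a minimal sketch of a MetaFinder that ignores the filesystem entirely and serves modules from a dictionary. It assumes Python 3.4 or later, where the same idea is spelled `find_spec` (returning a spec that wraps the loader) instead of the `find_module` signature shown above. `SOURCES`, `DictLoader`, `DictMetaFinder`, and the module name `hello_from_dict` are all hypothetical names invented for the illustration.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory "resource": module name -> source text.
SOURCES = {"hello_from_dict": "GREETING = 'hi'\n"}


class DictLoader(importlib.abc.Loader):
    """Loader that executes source text held in memory."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        # Run the stored source inside the new module's namespace.
        exec(self.source, module.__dict__)


class DictMetaFinder(importlib.abc.MetaPathFinder):
    """MetaFinder that answers only for names present in SOURCES."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in SOURCES:
            return None  # decline; the next finder on sys.meta_path is asked
        loader = DictLoader(SOURCES[fullname])
        return importlib.util.spec_from_loader(fullname, loader)


# Installing at the front puts this finder ahead of the sys.path scan.
sys.meta_path.insert(0, DictMetaFinder())

import hello_from_dict
```

Returning `None` for every name the finder does not recognize is what lets ordinary imports fall through to the rest of `sys.meta_path`.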
### PathHooks and PathFinders
*PathHooks* are how `sys.path` is scanned to determine which Finder
should be associated with a given directory path.
A PathHook is a function (or callable):
```
[Finder|None] <anonymous function>(path:string)
```
A *PathHook* takes a given directory path. If the PathHook can identify
a corresponding FileFinder for the modules in that directory path, it
returns a constructed instance of that FileFinder; otherwise it declines
the path (written as `None` in the signature above; CPython's own hooks
decline by raising `ImportError`).
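A minimal sketch of a PathHook, assuming a recent Python 3. The hook claims exactly one directory and hands back a stock `FileFinder` for it. `CLAIMED_DIR`, `claimed_dir_hook`, and `claimed_mod` are hypothetical names, and the hook declines by raising `ImportError`, which is how CPython's machinery expects a hook to say "not mine."

```python
import importlib.machinery
import os
import sys
import tempfile

# Hypothetical directory that this hook is willing to claim.
CLAIMED_DIR = tempfile.mkdtemp()


def claimed_dir_hook(path):
    """Return a finder for CLAIMED_DIR; decline every other path."""
    if os.path.realpath(path) != os.path.realpath(CLAIMED_DIR):
        raise ImportError("claimed_dir_hook does not handle " + repr(path))
    loader_details = (importlib.machinery.SourceFileLoader, [".py"])
    return importlib.machinery.FileFinder(path, loader_details)


# Put a module in the claimed directory so there is something to find.
with open(os.path.join(CLAIMED_DIR, "claimed_mod.py"), "w") as handle:
    handle.write("ANSWER = 42\n")

sys.path_hooks.insert(0, claimed_dir_hook)  # first match wins, so go first
sys.path.append(CLAIMED_DIR)

import claimed_mod
```

Once the hook has claimed the directory, the finder it returned is cached in `sys.path_importer_cache`, so the hook is not consulted again for that path.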
If no `sys.meta_path` finder returns a loader, the full cross product of
`sys.path` and `sys.path_hooks` is scanned until a PathHook says it can
handle the path _and_ the corresponding finder says it can handle the
fullname. If no hook claims a path, Python's default FileFinder class is
instantiated with that path.
This means that for each path in `sys.path`, the list of
`sys.path_hooks` is scanned, and the first function to return a finder
is handed responsibility for that path. If no function returns one, the
default FileFinder is used. The default FileFinder returns only the
default SourceFileLoader which (if you read to the end of
[part one](http://elfsternberg.com)) blocks our path toward
heterogeneous packages.
PathHooks are placed into the list `sys.path_hooks`; like
`sys.meta_path`, the list is ordered and first one wins.
### The Takeaway
There's some confusion over the difference between the two objects, so
let's clarify one last time.
<em class="pointer"></em> Use a **meta_finder** (A Finder in
`sys.meta_path`) when you want to redefine the meaning of the import
string so it can search alternative paths that may have no reference to
a filesystem path found in `sys.path`; an import string could be
redefined as a location in an archive, an RDF triple of
document/tag/content, or table/row_id/cell, or be interpreted as a URL
to a remote resource.
<em class="pointer"></em> Use a **path_hook** (A function in
`sys.path_hooks` that returns a FileFinder) when you want to
re-interpret the meaning of an import string that refers to a module
object on or accessible by `sys.path`; PathHooks are important when you
want to add directories to `sys.path` that contain something _other than_
`.py`, `.pyc`/`.pyo`, and `.so` modules conforming to the Python ABI.
<em class="pointer"></em> A *MetaFinder* is typically constructed when
it is added to `sys.meta_path`: the developer instantiates it and then
installs it. A *FileFinder*, by contrast, is instantiated by the
*PathHook* function when that function lays claim to a path.
## Next
Note that PathHooks are for paths containing something _other than_ the
traditional (and hard-coded) source file extensions. The purpose of a
heterogeneous source file finder and loader is to enable finding in
directories within `sys.path` that contain other source file syntaxes
_alongside_ those traditional sources. I need to *eclipse* (that is,
get in front of) the default FileFinder with one that understands more
suffixes than those listed in either `imp.get_suffixes()` (Python 2) or
`importlib.machinery.SOURCE_SUFFIXES` (Python 3). I need one that will
return the Python default loader if it encounters the Python default
suffixes, but will invoke *our own* source file loader when encountering
one of our suffixes.
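As a preview of the shape this takes, here is a hedged sketch using only stock `importlib` pieces on a recent Python 3. `PlaceholderLoader` and the `.pyalt` suffix are hypothetical stand-ins: the loader changes nothing yet, and the hook is built but not installed into `sys.path_hooks`. The sketch only shows one finder choosing between two loaders by suffix.

```python
import importlib.machinery as machinery
import os
import tempfile


class PlaceholderLoader(machinery.SourceFileLoader):
    """Hypothetical stand-in for 'our own' loader; it adds no behavior yet."""


# Each pair is (loader class, suffixes that loader is responsible for).
loader_details = [
    (PlaceholderLoader, [".pyalt"]),  # hypothetical suffix
    (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),  # Python defaults
]
hetero_hook = machinery.FileFinder.path_hook(*loader_details)

# A scratch directory holding one file of each kind.
demo_dir = tempfile.mkdtemp()
for filename in ("plain.py", "exotic.pyalt"):
    with open(os.path.join(demo_dir, filename), "w") as handle:
        handle.write("VALUE = 1\n")

finder = hetero_hook(demo_dir)
plain_spec = finder.find_spec("plain")
exotic_spec = finder.find_spec("exotic")
```

Here `plain_spec.loader` is an ordinary `SourceFileLoader`, while `exotic_spec.loader` is a `PlaceholderLoader`, so both kinds of source file can sit in the same directory.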
We'll talk about loading next.