155 lines
6.8 KiB
Markdown
155 lines
6.8 KiB
Markdown
|
In the last post, I introduced the concepts of the **module object**,
|
|||
|
**module**, and **package**, concrete objects that exist within the
|
|||
|
Python runtime, as well as some basic ideas about packaging, finding,
|
|||
|
and loading.
|
|||
|
|
|||
|
In this post, I'll go over the process of *finding*, what it means to
|
|||
|
*find* something, and what happens next.
|
|||
|
|
|||
|
## A Clarifying point
|
|||
|
|
|||
|
I've been very careful to talk about *finding* vs. *loading*
|
|||
|
vs. *listing* in this series of posts. There's a reason for that: in
|
|||
|
Python 2, the terms "Finder" and "Importer" were used interchangeably,
|
|||
|
leading to (at least on my part) massive confusion. In actual fact,
|
|||
|
finders, hooks, loaders, and listers are all individual objects, each
|
|||
|
with a single, unique method with a specific signature. The method name
|
|||
|
is different for each stage, so it is theoretically possible to define a
|
|||
|
single class that does all three for a given category of *module
|
|||
|
object*, and only in that case, I believe, should we talk about an
|
|||
|
"Importer."
|
|||
|
|
|||
|
In Python 2.6 and 2.7, the definitive Finder class is called
|
|||
|
`pkgutil.ImpImporter`, and the Loader is called `pkgutil.ImpLoader`;
|
|||
|
this was a source of much of my confusion. In Python 3, the term
|
|||
|
"Importer" is deprecated and "Finder" is used throughout `importlib`. I
|
|||
|
will be using "Finder" from now on.
|
|||
|
|
|||
|
## Finding
|
|||
|
|
|||
|
When the 'import <fullname>' command is called, a procedure is
|
|||
|
triggered. That procedure then:
|
|||
|
|
|||
|
* attempts to *find* a corresponding python *module*
|
|||
|
* attempts to *load* that corresponding module into *bytecode*
|
|||
|
* Associates the bytecode with the name via sys.modules[fullname]
|
|||
|
* Exposes the bytecode to the calling scope.
|
|||
|
* Optionally: writes the bytecode to the filesystem for future use
|
|||
|
|
|||
|
*Finding* is the act of identifying a resource that corresponds to the
|
|||
|
import string that can be compiled into a meaningful Python module. The
|
|||
|
import string is typically called the *fullname*.
|
|||
|
|
|||
|
*Finding* typically involves scanning a collection of *resources*
|
|||
|
against a collection of *finders*. *Finding* ends when *finder `A`*,
|
|||
|
given *fullname `B`*, reports that a corresponding module can be found
|
|||
|
in *resource `C`*, and that the resource can be loaded with *loader
|
|||
|
`D`*."
|
|||
|
|
|||
|
### MetaFinders
|
|||
|
|
|||
|
*Finders* come first, and *MetaFinders* come before all other kinds of
|
|||
|
finders.
|
|||
|
|
|||
|
_Most finding is done in the context of `sys.path`_; that is, Python's
|
|||
|
primary means of organizing Python modules is to have them somewhere on
|
|||
|
the local filesystem. This makes sense. Sometimes, however, you want
|
|||
|
to get in front of that scan and impose your own logic: you want the
|
|||
|
root of an import string to mean something else. Maybe instead of
|
|||
|
`directory.file`, you want it to mean `table.row.cell`, or you want it
|
|||
|
to mean `website.path,object`, to take
|
|||
|
[one terrifying example](http://blog.dowski.com/2008/07/31/customizing-the-python-import-system/).
|
|||
|
|
|||
|
That's what you do with a MetaFinder: A MetaFinder may choose to ignore
|
|||
|
the entire sys.path mechanism and do something that has nothing to do
|
|||
|
with the filesystem, or it may have its own take on what to do with
|
|||
|
`sys.path`.
|
|||
|
|
|||
|
A Finder is any object with the following method:
|
|||
|
```
|
|||
|
[Loader|None] find_module([self|cls], fullname:string, path:[string|None])
|
|||
|
```
|
|||
|
|
|||
|
The find_module method returns None if it cannot find a loader resource
|
|||
|
for fullname & path.
|
|||
|
|
|||
|
A MetaFinder is placed into the list `sys.meta_path` by whatever code
|
|||
|
needs the MetaFinder, and it persists for the duration of the runtime,
|
|||
|
unless it is later removed or replaced. Being a list, the search is
|
|||
|
ordered; first match wins. MetaFinders may be instantiated in any way
|
|||
|
the developer desires before being added into `sys.meta_path`.
|
|||
|
|
|||
|
### PathHooks and PathFinders
|
|||
|
|
|||
|
*PathHooks* are how `sys.path` is scanned to determine the which Finder
|
|||
|
should be associated with a given directory path.
|
|||
|
|
|||
|
A PathHook is a function (or callable):
|
|||
|
```
|
|||
|
[Finder|None] <anonymous function>(path:string)
|
|||
|
```
|
|||
|
|
|||
|
A *PathHook* takes a given directory path and, if the PathHook can
|
|||
|
identify a corresponding FileFinder for the modules in that directory
|
|||
|
path and return a constructed instance of that FileFinder, otherwise it
|
|||
|
returns None.
|
|||
|
|
|||
|
If no `sys.meta_path` finder returns a loader, the full array of
|
|||
|
`sys.paths ⨯ sys.path_hooks` is compared until a PathHook says it can
|
|||
|
handle the path _and_ the corresponding finder says it can handle the
|
|||
|
fullname. If no match happens, Python's default FileFinder class is
|
|||
|
instantiated with the path.
|
|||
|
|
|||
|
This means that for each path in `sys.paths`, the list of
|
|||
|
`sys.path_hooks` is scanned; the first function to return an importer is
|
|||
|
handed responsibility for that path; if no function returns, the default
|
|||
|
FileFinder is returned; the default FileFinder returns only the default
|
|||
|
SourceFileLoader which (if you read to the end of
|
|||
|
[part one](http://elfsternberg.com)) blocks our path toward
|
|||
|
heterogeneous packages.
|
|||
|
|
|||
|
PathHooks are placed into the list `sys.path_hooks`; like
|
|||
|
`sys.meta_path`, the list is ordered and first one wins.
|
|||
|
|
|||
|
### The Takeaway
|
|||
|
|
|||
|
There's some confusion over the difference between the two objects, so
|
|||
|
let's clarify one last time.
|
|||
|
|
|||
|
<em class="pointer">☞</em> Use a **meta_finder** (A Finder in
|
|||
|
`sys.meta_path`) when you want to redefine the meaning of the import
|
|||
|
string so it can search alternative paths that may have no reference to
|
|||
|
a filesystem path found in `sys.path`; an import string could be
|
|||
|
redefined as a location in an archive, an RDF triple of
|
|||
|
document/tag/content, or table/row_id/cell, or be interpreted as a URL
|
|||
|
to a remote resource.
|
|||
|
|
|||
|
<em class="pointer">☞</em> Use a **path_hook** (A function in
|
|||
|
`sys.path_hooks` that returns a FileFinder) when you want to
|
|||
|
re-interpret the meaning of an import string that refers to a module
|
|||
|
object on or accessible by `sys.path`; PathHooks are important when you
|
|||
|
want to add directories to sys.path that contain something _other than_
|
|||
|
`.py`, `.pyc/.pyo`, and `.so` modules conforming to the Python ABI.
|
|||
|
|
|||
|
<em class="pointer">☝</em> A *MetaFinder* is typically constructed when
|
|||
|
it is added to `sys.meta_path`; a *PathHook* instantiates a *FileFinder*
|
|||
|
when the PathHook function lays claim to the path. The developer
|
|||
|
instantiates a MetaFinder before adding it to `sys.meta_path`; it's the
|
|||
|
PathHook function that instantiates a FileFinder.
|
|||
|
|
|||
|
## Next
|
|||
|
|
|||
|
Note that PathHooks are for paths containing something _other than_ the
|
|||
|
traditional (and hard-coded) source file extensions. The purpose of a
|
|||
|
heterogeneous source file finder and loader is to enable finding in
|
|||
|
directories within `sys.path` that contain other source files syntaxes
|
|||
|
_alongside_ those traditional sources. I need to *eclipse* (that is,
|
|||
|
get in front of) the default FileFinder with one that understands more
|
|||
|
suffixes than those listed in either `imp.get_suffixes()` (Python 2) or
|
|||
|
`importlib._bootstrap.SOURCE_SUFFIXES` (Python 3). I need one that will
|
|||
|
return the Python default loader if it encounters the Python default
|
|||
|
suffixes, but will invoke *our own* source file loader when encountering
|
|||
|
one of our suffixes.
|
|||
|
|
|||
|
We'll talk about loading next.
|