In the last post, I introduced the concepts of the module object, module, and package, concrete objects that exist within the Python runtime, as well as some basic ideas about packaging, finding, and loading.
In this post, I'll go over the process of finding, what it means to find something, and what happens next.
A Clarifying Point
I've been very careful to talk about finding vs. loading vs. listing in this series of posts. There's a reason for that: in Python 2, the terms "Finder" and "Importer" were used interchangeably, leading to (at least on my part) massive confusion. In actual fact, finders, hooks, loaders, and listers are all individual objects, each with a single, unique method with a specific signature. The method name is different for each stage, so it is theoretically possible to define a single class that does all three for a given category of module object, and only in that case, I believe, should we talk about an "Importer."
In Python 2.6 and 2.7, the definitive Finder class is called pkgutil.ImpImporter, and the Loader is called pkgutil.ImpLoader; this was a source of much of my confusion. In Python 3, the term "Importer" is deprecated and "Finder" is used throughout importlib. I will be using "Finder" from now on.
Finding
When an import statement is executed, a procedure is triggered. That procedure then:
- Attempts to find a corresponding Python module
- Attempts to load that corresponding module into bytecode
- Associates the bytecode with the name via sys.modules[fullname]
- Exposes the bytecode to the calling scope
- Optionally: writes the bytecode to the filesystem for future use
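The steps above can be sketched with the public importlib API of current Python 3. This is a rough illustration, not the real import machinery: the function name manual_import is made up for the example, it only handles top-level modules, and the optional bytecode write is left to whatever loader the spec carries.

```python
import importlib.util
import sys


def manual_import(fullname):
    """Roughly mimic `import fullname` for a top-level module (sketch only)."""
    # Reuse the cached module object if the name was already imported.
    if fullname in sys.modules:
        return sys.modules[fullname]

    # Find: a spec records where the module lives and which loader can
    # load it; None means no finder recognised the name.
    spec = importlib.util.find_spec(fullname)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {fullname!r}")

    # Load and associate: create the module object, register it under
    # its name, then let the loader execute the module's code inside it.
    module = importlib.util.module_from_spec(spec)
    sys.modules[fullname] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialised module behind.
        del sys.modules[fullname]
        raise

    # Expose: the caller binds the returned object to a name in its scope.
    return module


colorsys = manual_import("colorsys")
print(colorsys.rgb_to_hls(1.0, 0.0, 0.0))
```

Registering the module in sys.modules before executing it mirrors what the real machinery does, so that circular imports see the partially built module instead of recursing.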
Finding is the act of identifying a resource that corresponds to the import string and can be compiled into a meaningful Python module. The import string is typically called the fullname.
Finding typically involves scanning a collection of resources against a collection of finders. Finding ends when finder A, given fullname B, reports that a corresponding module can be found in resource C, and that the resource can be loaded with loader D.
MetaFinders
Finders come first, and MetaFinders come before all other kinds of finders.
Most finding is done in the context of sys.path; that is, Python's primary means of organizing Python modules is to have them somewhere on the local filesystem. This makes sense. Sometimes, however, you want to get in front of that scan and impose your own logic: you want the root of an import string to mean something else. Maybe instead of directory.file, you want it to mean table.row.cell, or you want it to mean website.path.object, to take one terrifying example.
That's what you do with a MetaFinder: a MetaFinder may choose to ignore the entire sys.path mechanism and do something that has nothing to do with the filesystem, or it may have its own take on what to do with sys.path.
A Finder is any object with the following method:
[Loader|None] find_module([self|cls], fullname:string, path:[string|None])
The find_module method returns None if it cannot find a loader resource for fullname & path.
A MetaFinder is placed into the list sys.meta_path by whatever code needs the MetaFinder, and it persists for the duration of the runtime, unless it is later removed or replaced. Being a list, the search is ordered; first match wins. MetaFinders may be instantiated in any way the developer desires before being added into sys.meta_path.
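Here is a minimal sketch of a MetaFinder that has nothing to do with the filesystem: it serves modules out of a dictionary. One caveat: the find_module protocol was deprecated in Python 3.4 and removed in Python 3.12, so the sketch uses its current equivalent, find_spec, which returns a ModuleSpec wrapping the loader rather than the loader itself. The names SOURCES, DictLoader, DictMetaFinder, and virtual_greeting are all invented for the example.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory "resource": module source text keyed by fullname.
SOURCES = {"virtual_greeting": "MESSAGE = 'hello from memory'\n"}


class DictLoader(importlib.abc.Loader):
    """Loader that executes source text taken from a dict."""

    def __init__(self, sources):
        self.sources = sources

    def create_module(self, spec):
        return None  # None asks Python to create a default module object

    def exec_module(self, module):
        source = self.sources[module.__name__]
        code = compile(source, f"<memory:{module.__name__}>", "exec")
        exec(code, module.__dict__)


class DictMetaFinder(importlib.abc.MetaPathFinder):
    """MetaFinder that never consults sys.path."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.sources:
            return None  # decline, so the next finder in the list gets a turn
        loader = DictLoader(self.sources)
        return importlib.util.spec_from_loader(fullname, loader)


# The developer instantiates the finder, then places it ahead of the defaults.
sys.meta_path.insert(0, DictMetaFinder(SOURCES))

import virtual_greeting

print(virtual_greeting.MESSAGE)
```

Inserting at position 0 is a design choice: because the first match wins, a finder appended to the end would only be consulted for names the default finders could not resolve.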
PathHooks and PathFinders
PathHooks are how sys.path is scanned to determine which Finder should be associated with a given directory path.
A PathHook is a function (or callable):
Finder <anonymous function>(path:string)
A PathHook takes a given directory path. If it can identify a corresponding FileFinder for the modules in that directory path, it returns a constructed instance of that FileFinder; otherwise it declines the path. Under PEP 302, a hook declines by raising ImportError rather than by returning None.
If no sys.meta_path finder returns a loader, the full array of sys.path ⨯ sys.path_hooks is compared until a PathHook says it can handle the path and the corresponding finder says it can handle the fullname. If no match happens, Python's default FileFinder class is instantiated with the path.
This means that for each path in sys.path, the list of sys.path_hooks is scanned; the first function to return a finder is handed responsibility for that path. If no function returns one, the default FileFinder is used. The default FileFinder returns only the default SourceFileLoader, which (if you read to the end of part one) blocks our path toward heterogeneous packages.
PathHooks are placed into the list sys.path_hooks; like sys.meta_path, the list is ordered and first one wins.
The Takeaway
There's some confusion over the difference between the two objects, so let's clarify one last time.
☞ Use a meta_finder (a Finder in sys.meta_path) when you want to redefine the meaning of the import string so it can search alternative paths that may have no reference to a filesystem path found in sys.path; an import string could be redefined as a location in an archive, an RDF triple of document/tag/content, or table/row_id/cell, or be interpreted as a URL to a remote resource.
☞ Use a path_hook (a function in sys.path_hooks that returns a FileFinder) when you want to re-interpret the meaning of an import string that refers to a module object on or accessible by sys.path; PathHooks are important when you want to add directories to sys.path that contain something other than .py, .pyc/.pyo, and .so modules conforming to the Python ABI.
☝ A MetaFinder is typically constructed when it is added to sys.meta_path: the developer instantiates it before adding it. A FileFinder, by contrast, is instantiated by the PathHook function at the moment that function lays claim to a path.
Next
Note that PathHooks are for paths containing something other than the traditional (and hard-coded) source file extensions. The purpose of a heterogeneous source file finder and loader is to enable finding in directories within sys.path that contain other source file syntaxes alongside those traditional sources. I need to eclipse (that is, get in front of) the default FileFinder with one that understands more suffixes than those listed in either imp.get_suffixes() (Python 2) or importlib._bootstrap.SOURCE_SUFFIXES (Python 3). I need one that will return the Python default loader if it encounters the Python default suffixes, but will invoke our own source file loader when encountering one of our suffixes.
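As a rough sketch of where this is heading, current Python 3 lets importlib.machinery.FileFinder.path_hook build such a hook from (loader, suffixes) pairs. This is one possible shape under stated assumptions, not the approach the rest of the series necessarily takes: the .pyalt suffix and the AltSourceLoader class are hypothetical, and the stub loader simply treats the file as ordinary Python where a real one would translate the source first.

```python
import importlib.util
import os
import tempfile
from importlib.machinery import SOURCE_SUFFIXES, FileFinder, SourceFileLoader

ALT_SUFFIXES = [".pyalt"]  # hypothetical extra suffix, for illustration only


class AltSourceLoader(SourceFileLoader):
    """Placeholder for a loader of some other syntax (hypothetical).

    A real implementation would translate the source before compiling;
    this stub inherits everything and treats the file as plain Python.
    """


# A path hook whose FileFinder knows both our loader and the default one.
hetero_hook = FileFinder.path_hook(
    (AltSourceLoader, ALT_SUFFIXES),
    (SourceFileLoader, SOURCE_SUFFIXES),
)


def load_from(directory, fullname):
    """Find and load fullname from directory using the heterogeneous hook."""
    finder = hetero_hook(directory)  # raises ImportError if not a directory
    spec = finder.find_spec(fullname)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {fullname!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demonstration in a throwaway directory holding one file of each kind.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "plain_mod.py"), "w") as f:
        f.write("KIND = 'python'\n")
    with open(os.path.join(tmp, "alt_mod.pyalt"), "w") as f:
        f.write("KIND = 'alt'\n")
    plain = load_from(tmp, "plain_mod")
    alt = load_from(tmp, "alt_mod")

print(type(plain.__spec__.loader).__name__, plain.KIND)
print(type(alt.__spec__.loader).__name__, alt.KIND)
```

To make ordinary import statements use it, the hook would be inserted at the front of sys.path_hooks and sys.path_importer_cache cleared, so that directories already claimed by the default FileFinder are re-examined. The sketch stops short of that to avoid changing global state.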
We'll talk about loading next.