Man page ======== Description +++++++++++ ``exekall`` is a python-based test runner. The expressions it executes are discovered from Python PEP 484 parameter and return value annotations. Options +++++++ exekall ------- .. run-command:: :ignore-error: :literal: exekall --help exekall run ----------- .. run-command:: :ignore-error: :literal: # Give the python module to exekall to get the LISA options in addition to # the generic ones. exekall run lisa --help exekall compare --------------- .. run-command:: :ignore-error: :literal: exekall compare --help exekall show ------------ .. run-command:: :ignore-error: :literal: exekall show --help exekall merge ------------- .. run-command:: :ignore-error: :literal: exekall merge --help Executing expressions +++++++++++++++++++++ Expressions are built by scanning the python source code passed to ``exekall run``. Selecting which expression to execute using ``exekall run`` can be achieved in several ways: * ``--select``/``-s`` with a pattern matching an expression ID. Pattern prefixed with **!** can be used to exclude some expressions. * Pointing ``exekall run`` at a subset of python source files, or to module names. Only files (directly or indirectly) imported from these python modules will be scanned for callables. Once the expressions are selected, multiple iterations of it can be executed using ``-n``. ``--share TYPE_PATTERN`` can be used to share part of the expression graph between all iterations, to avoid re-executing some parts of the expression. Be aware that all parameters of what is shared will also be shared implicitly to keep consistent expressions. The adaptor found in the customization module of the python sources you are using can add extra options to ``exekall run``, which are shown in ``--help`` only when these sources are specified as well. Expression engine +++++++++++++++++ At the core of ``exekall`` is the expression engine. It is in charge of building sensible sequences of calls out of python-level annotations (see PEP 484), and then executing them. An expression is a graph where each node has named *parameters* that point to other nodes. .. _exekall-expression-id: Expression ID ------------- Each expression has an associated ID that is derived from its structure. The rules are: 1. The ID of the first parameter of a given node is prepended to the ID of the node, separated with **:**. The code :code:`f(g())` has the ID ``g:f``. 2. The ID of the node is composed of the name of the operator of that node (name of a Python callable), followed by a parenthesis-enclosed list of parameters ID, excluding the first parameter. The code :code:`f(p1=g(), p2=h(k()))` has the ID ``g:f(p2=k:h)``. 3. Expression values can have named tags attached to them. When displaying the ID of such a value, the tag would be inserted right after the operator name, inside brackets. The value returned by ``g`` tagged with a tag named ``mytag`` with value ``42`` would give: ``g[mytag=42]:f(p2=k:h)``. Note that tags are only relevant when using expression values, since the tags are attached to values, not operators. The first rule allows seamless composition of simple pipeline stages and is especially suited to object oriented programming, since the first parameter of methods will be ``self``. Tags can be used to add attach some important metadata to the return value of an operator, so it can be easily distinguished when taken out of context. Sharing subexpressions ---------------------- When multiple expressions are to be executed, ``exekall`` will eliminate common subexpressions. That will apply both inside an expression and between different expressions. That avoids re-executing the same operator multiple times if it can be reused and if it would have been called with the same parameters. That also ensures that referring to a given type for a parameter will give back the same object within any given expression. Executing the IDs ``g:f(p2=g)`` and ``g:h`` will translate to an expression graph equivalent to:: x = g() res1 = f(x, p2=x) res2 = h(x) The expression execution engine logs when a given value is computed or reused. Execution --------- Executing an expression means evaluating each node if it has not already been evaluated. If an operator is not reusable, it will always be called when a value is requested from it, even if some existing values computed with the same parameters exist. By default, all operators are reusable, but some types can be flagged as non-reusable by the customization module (see :ref:`customize`). Operators are allowed to be generator functions as well. In that case, the engine will iterate over the generator, and will execute the downstream expressions for each value it provides. Multiple generator functions can be chained, leading to a cascade of values for the same expression. Once an expression has been executed, all its values will get a UUID that can be used to uniquely refer to it, and track where it was used in the logs. Exploiting artifacts ++++++++++++++++++++ ``exekall run`` produces an artifact folder. The location can be set using ``--artifact-dir`` and other options. Folder hierarchy ---------------- The artifact folder contains the following files: * **INFO.log** and **DEBUG.log** contain logs for info and debug levels of the ``logging`` standard module. Note that standard output is not included in this log, as it does not go through the ``logging`` module * **VALUE_DB.pickle.xz** contains a serialized objects graph for each expression that was executed. The value of each subexpression is included if the object was serializable. * **BY_UUID** contains symlinks named after UUIDs, and pointing to a relevant subfolder in the artifacts. That allows quick lookup of the artifacts of a given expression if one has its UUID. * A folder for each expression. * Optionally, an **ORIGIN** folder if the artifact folder is the result of **exekall merge**, or **exekall run --load-db**. It contains the hierarchy of each original artifact folder by using folders and symlinks pointing inside the artifact folder. Inside each expression's folder, there is a folder with the UUID of the expression itself. Having that level allows merging artifact folders together and avoids conflict in case two different expressions share the same ID. Inside that folder, the following files can be found: * **STRUCTURE** which contains the structure of the expression. Each operator is described by its callable name, its return type, and its parameters. Parameters are recursively defined the same way. An **svg** or **.dot** (graphviz) variant may exist as well. * **EXPRESSION.py** and **TEMPLATE_EXPRESSION.py** files are executable Python script that are equivalent to what was executed by ``exekall run``. The template one is created before execution and contains some placeholders for the sparks. The other one is updated after execution to add commented code that reloads any given value from the database. That gives the option to the user to not re-execute some part of the code, but load a serialized value instead. * Artifact folders allocated by some operators. exekall compare --------------- **VALUE_DB.pickle.xz** can be compared using ``exekall compare``. This will call the comparison method of the adaptor that was used when ``exekall run`` was executed. That function is expected to compare the expression values found in the databases, by matching values that have the same ID on both databases. Adding new expressions ++++++++++++++++++++++ Since ``exekall run`` will discover expressions based on type annotations of callable parameters and return value, all that is needed to extend an existing package is to write new callables with such annotations. It is possible to use a base class in an annotation, in which case the engine will be free to pick all the subclasses it can, and produce an expression with each. A dummy example would be:: import abc class BaseConf(abc.ABC): @abc.abstractmethod def get_conf(self): pass class Conf(BaseConf): # By default, callables with an empty parameter list are ignored. They # can be explicitly be used with "exekall run --allow '*Conf'" def __init__(self): self.x = 42 def get_conf(self): return x class Stage1: # exekall recognizes classes as a special case: the parameter annotations # are taken from __init__ and the return type is the class def __init__(self, conf:BaseConf): print("building stage1") self.conf = conf # first parameter of methods is automatically annotated with the right # class. # "forward-references are possible by using a string to annotate. def process_method(self) -> 'Stage2': return Stage2(x.conf.x == 42) class Stage2: def __init__(self, passed): self.passed = passed def process1(x:Stage1) -> Stage2: return Stage2(x.conf.x == 42) def process2(x:Stage1, conf:BaseConf, has_default_val=33) -> Stage2: return Stage2(x.conf.x == 0) From that, ``exekall run --allow '*Conf' --goal '*Stage2'`` would infer the expressions ``Conf:Stage1:process_method``, ``Conf:Stage1:process1`` and ``Conf:Stage1:process2(conf=Conf)``. The common subexpression ``Conf:Stage1`` would be shared between these two by default. Callables are assumed to not be polymorphic in their return value, as the methods that will be called on the resulting value is decided ahead of time. A limited form of polymorphism akin to Rust's Generic Associated Types (GATs) or Haskell's associated type families is supported:: import typing class Base: ASSOCIATED_CLS = typing.TypeVar('ASSOCIATED_CLS') # Methods are allowed to use this polymorphic type as a return type, as # long as all subclasses override ASSOCIATED_CLS class attribute. def foo(self) -> 'Base.ASSOCIATED_CLS': return X class Derived1(Base): X = 1 ASSOCIATED_CLS = type(X) class Derived2(Base): X = 'hello' ASSOCIATED_CLS = type(X) If a parameter has a default value, its annotation can be omitted. If a parameter has both a default value and an annotation, ``exekall`` will try to provide a value for it, or use the default value if no subexpression has the right type. When an expression is not detected correctly, ``--verbose``/``-v`` can be used and repeated twice to get more information on what callables are being ignored and why. Most common issues are: * Partial annotations: all parameters and return values need to be either annotated or have a default value. * Abstract Base Classes (see :class:`abc.ABC`) with missing implementation of some attributes. * Cycles in the expression graphs. Considering types as pipeline stages helps avoiding cycles in expression graphs when architecturing a module. Not all classes need to be considered as such, only the ones that will be used in annotations. * Missing "spark", i.e. operator that can provide values without any parameter. The adaptor in the customization module usually takes care of doing that based on domain-specific command line options, but some ignored callables may be forcefully selected using ``--allow`` if needed. * Missing ``import`` chain from the sources given to ``exekall run`` to the module that defines the callable that is expected to be used. That can be solved by adding more ``import`` statements, or simply giving that source file directly to ``exekall run``. * Wrong goal selected using ``--goal``. .. _customize: Customizing exekall +++++++++++++++++++ The behavior of ``exekall`` can be customized by subclassing :class:`exekall.customization.AdaptorBase` in a module that must be called ``exekall_customization.py`` and located in one of the parent packages of the modules that are explicitly passed to ``exekall run``. This allows adding extra options to ``exekall run`` and ``compare``, tag values in IDs, change the set of callables that will be hidden from the ID and define what type is considered to provide reusable values by the engine among other things.