\subsection{The General Algorithm}
The aim is to find typing-preserving morphisms between theories, i.e. given a constant $C:t$ in a theory $T_1$, we want to find a view $V:T_1\to T_2$ such that if $V(C)=C'$ and $C':t'\in T_2$, we have $V(t)=t'$. Correspondingly, we need to consider the types of the constants in our two theories, which we assume to be flat.

Running a proper unification algorithm is infeasible in our situation, since the flat version of a theory can become prohibitively large (and finding two unifiable types in two theories is a search problem quadratic in the number of constants). To solve that problem, we first preprocess our theories such that pre-selecting ``plausibly unifiable'' constants becomes as fast as possible.

\paragraph{} We do so by first transforming each constant $C$ in a theory into an \emph{encoding} $(\cn{h}(C),\cn{syms}(C))$ in the following way:

Consider the syntax tree of the type $t$ of a constant $C$. We first systematically replace the leaves by an abstract representation, yielding a data structure $\cn{tree}(C)$. We eliminate variables by replacing them by their De Bruijn indices, and symbol references by enumerating them and storing the symbols' names in a list $\cn{syms}(C)$.

As a result, we get a pair $(\cn{tree}(C),\cn{syms}(C))$. Now an assignment $V(C)=D$ is valid iff $\cn{tree}(C)=\cn{tree}(D)$, the lists $\cn{syms}(C)$ and $\cn{syms}(D)$ have the same length, and their pairwise assignments $V(\cn{syms}(C)_i)=\cn{syms}(D)_i$ are all valid.

Furthermore, since we only need $\cn{tree}(C)$ for an equality check, we can immediately replace $\cn{tree}(C)$ by an integer hashcode $\cn{h}(C)$.
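
The encoding step can be illustrated by the following minimal Python sketch. It is not the actual implementation: the term representation (nested tuples tagged \texttt{"sym"}, \texttt{"var"}, \texttt{"lam"} and \texttt{"app"}), the function name \texttt{encode} and the use of SHA-256 as the hash are assumptions made for illustration only. We also assume that terms are closed and that every distinct symbol is enumerated once, in order of first occurrence.

```python
import hashlib

# Hypothetical term representation (an assumption of this sketch):
#   ("sym", name)        reference to a symbol
#   ("var", name)        reference to a bound variable
#   ("lam", name, body)  binder introducing the variable `name`
#   ("app", f, a1, ...)  application of f to arguments

def encode(term):
    """Return (h, syms): an integer hash of the abstracted tree and the symbol list."""
    syms = []

    def abstract(t, bound):
        kind = t[0]
        if kind == "var":
            # De Bruijn index: 0 refers to the innermost enclosing binder.
            return ("var", bound[::-1].index(t[1]))
        if kind == "sym":
            # Enumerate symbols in order of first occurrence.
            if t[1] not in syms:
                syms.append(t[1])
            return ("sym", syms.index(t[1]))
        if kind == "lam":
            return ("lam", abstract(t[2], bound + [t[1]]))
        if kind == "app":
            return ("app",) + tuple(abstract(s, bound) for s in t[1:])
        raise ValueError("unknown node kind: %r" % (kind,))

    tree = abstract(term, [])
    # A digest of the printed tree gives an integer that is stable across runs.
    digest = hashlib.sha256(repr(tree).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big"), syms


def commutativity(op, x, y):
    """forall x. forall y. eq (op x y) (op y x), with hypothetical symbol names."""
    lhs = ("app", ("sym", op), ("var", x), ("var", y))
    rhs = ("app", ("sym", op), ("var", y), ("var", x))
    body = ("app", ("sym", "eq"), lhs, rhs)
    return ("app", ("sym", "forall"),
            ("lam", x, ("app", ("sym", "forall"), ("lam", y, body))))


h1, syms1 = encode(commutativity("op", "x", "y"))
h2, syms2 = encode(commutativity("plus", "a", "b"))
```

In this sketch the two commutativity statements differ only in the names of their variables and of the operator, so they receive the same hash, and their symbol lists line up position by position.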
Now given a constant $C\in T_1$, we can find valid matches in $T_2$ by computing the encodings for each constant in $T_1$ and $T_2$, taking the list of constants $C'\in T_2$ with $\cn{h}(C')=\cn{h}(C)$ and recursing into their respective lists of symbols.

\subsection{Improvements and Parameters of the Algorithm}

\paragraph{Extending the abstract syntax tree} By the very nature of the approach described in Section \ref{sec:oaf}, many symbols will be common to the domain and codomain of a given viewfinding problem: Since all of the available libraries are ultimately based on (extensions of) LF, we never want to reassign the symbols provided by the LF theory. Furthermore, if we want to find views between theories that share a common meta-theory (e.g. theories within the same library), we most certainly do not want to reassign the symbols in that meta-theory.

In these situations, we can consider the \emph{common} symbols as a fixed part of the abstract syntax tree of a constant. The symbols will hence be encoded in the component $\cn{h}(C)$ instead of the list $\cn{syms}(C)$. This will not only exclude spurious matches, but also reduce the number of plausible matches and consequently speed up the algorithm.

\paragraph{Picking starting pairs:} Note that the algorithm will still find many spurious matches if executed in its general form. The reason is that (for example) all atomic types match each other, as do all binary operators etc. Most of these results will not be interesting. Furthermore, since the algorithm needs to recurse into the lists $\cn{syms}(C)$, many potential matches will need to be checked repeatedly. Both problems can be massively reduced by selecting specific pairs of encodings as \emph{starting pairs} for the algorithm, so that the majority of matching constants will only be considered if the algorithm runs into them during recursion.
Potentially useful starting points are:
\begin{itemize}
 \item \emph{Axioms:} Since we are mostly interested in matching constants that share the same mathematical properties, by using axioms as starting points we can ensure that the algorithm only matches constants that have at least one (axiomatic) property in common (e.g. only commutative, or associative operators).
 \item \emph{The length of $\cn{syms}(C)$:} By requiring a minimum length, we can guarantee that only matches are produced that relate at least two (or more) constants to each other -- if e.g. we are interested in structures that have a distributive property, this will make sure that we only find those views where both operators are matched as intended.
\end{itemize}

\paragraph{Picking starting theories:} If we try to find views between whole collections of theories, we can disregard all theories that are included in some other theory in our collections, since we need to work with the flat theories anyway. Consequently, by only using maximal theories we do not find any fewer views but speed up the search significantly.

\paragraph{Preprocessing and Translations} Especially when looking for views between theories in different libraries (and built on different foundations), various differences in available foundational primitives and library-specific best practices and idiosyncrasies can prevent the algorithm from finding desired matches. There are two approaches to increasing the number of results in these cases:
\begin{itemize}
 \item We can preprocess theories, by e.g. normalizing expressions, eliminating higher-order abstract syntax encodings or encoding-related redundant information (such as the type of a typed equality, which in the presence of subtyping can be different from the types of both sides of an equation), or elaborating abbreviations/definitions.
 \item In many instances, the translation between two foundations is too complex to be discovered purely syntactically. In these cases we can provide arbitrary translations between theories, which are applied before computing the encodings.\ednote{Mention/cite alignment-translation paper}
\end{itemize}

When elaborating definitions, it is important to consider that this may also reduce the number of results, if both theories use similar abbreviations for complex terms, or the same concept is declared axiomatically in one theory, but definitionally in the other. For that reason, we can allow several encodings for the same constant during preprocessing, such as one with definitions expanded and one ``as is''.

Similarly, certain idiosyncrasies -- such as PVS's common usage of theory parameters -- call for not just matching symbol references, but also variables or possibly even complex expressions. To handle these situations, we additionally allow for ``holes'' in the lists $\cn{syms}(C)$, which may be unified with any other symbol or hole, but are not recursed into. The subterms that are to be considered holes can be marked as such during preprocessing.

\paragraph{Storing Encodings} Finally, for computing the encodings of a theory we only need to know the symbols to be fixed in the component $\cn{h}(C)$ of an encoding, for which only the meta-theories of the theories are relevant. They also determine the specific preprocessings and translations we are likely to want to use. Since only a small number of meta-theories are relevant in practice, we can store and retrieve the encodings for the most important situations.
Since computing the encodings (as well as sorting the theories in a library by their dependencies) is the most expensive part of the algorithm, reusing stored encodings makes the viewfinding process itself rather efficient.
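
To illustrate why matching is cheap once encodings are available, the following Python sketch runs the matching step on precomputed encodings. It is a simplification under stated assumptions, not the actual implementation: encodings are given as dictionaries from constant names to pairs $(\cn{h}(C),\cn{syms}(C))$, and the names \texttt{index\_by\_hash}, \texttt{extend\_view} and \texttt{matches} as well as the example theories are hypothetical. Holes, multiple encodings per constant and translations are ignored.

```python
from collections import defaultdict

def index_by_hash(enc):
    """Group the constants of a theory by the hash component of their encoding."""
    index = defaultdict(list)
    for name, (h, _syms) in enc.items():
        index[h].append(name)
    return index

def extend_view(c, d, enc1, enc2, view=None):
    """Try to extend `view` by the assignment c -> d; return None if it is invalid."""
    view = dict(view or {})
    if c in view:
        # c was assigned earlier; the new assignment must agree with it.
        return view if view[c] == d else None
    if c not in enc1 or d not in enc2:
        return None
    h1, syms1 = enc1[c]
    h2, syms2 = enc2[d]
    if h1 != h2 or len(syms1) != len(syms2):
        return None
    view[c] = d
    # Recurse into the symbol lists, position by position.
    for s, t in zip(syms1, syms2):
        view = extend_view(s, t, enc1, enc2, view)
        if view is None:
            return None
    return view

def matches(c, enc1, enc2, index2):
    """All partial views that start from the constant c of the first theory."""
    h, _syms = enc1[c]
    candidates = (extend_view(c, d, enc1, enc2) for d in index2.get(h, []))
    return [v for v in candidates if v is not None]

# Hypothetical stored encodings; small integers stand in for real hash values.
enc1 = {"G": (0, []), "op": (1, ["G"]), "comm": (2, ["op", "G"])}
enc2 = {"N": (0, []), "plus": (1, ["N"]), "times": (1, ["N"]),
        "plus_comm": (2, ["plus", "N"])}

views = matches("comm", enc1, enc2, index_by_hash(enc2))
```

Starting from the axiom \texttt{comm}, only constants with the same hash are inspected, so \texttt{times} is never considered even though its encoding would match that of \texttt{op}.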