@@ -2,26 +2,26 @@ We have seen how a view finder can be used for theory \emph{discovery} and findi
The theory discovery use case described in Sec. \ref{sec:usecase} is mostly desirable in a setting where a user is actively writing or editing a theory, so the integration in jEdit is sensible. However, the across-library use case in Sec. \ref{sec:pvs} would already be considerably more useful in a theory exploration setting, such as when browsing available archives on MathHub~\cite{mathhub} or in the graph viewer integrated in \mmt~\cite{RupKohMue:fitgv17}. Additional specialized user interfaces would enable or improve the following use cases:
\begin{itemize}
\item\textbf{Model-/Countermodel Finding:} If the codomain of a morphism is a theory representing a specific model, it would tell her that those are \emph{examples} of her abstract theory.
\item\textbf{Model-/Countermodel Finding:} If the codomain of a view is a theory representing a specific model, it would tell her that those are \emph{examples} of her abstract theory.
Furthermore, partial morphisms -- especially those that are total on some included theory -- could be insightful \emph{counterexamples}.
Furthermore, partial views -- especially those that are total on some included theory -- could be insightful \emph{counterexamples}.
\item\textbf{Library Refactoring:} Given that the view finder looks for \emph{partial} morphisms, we can use it to find natural extensions of a starting theory. Imagine Jane removing the last of her axioms for ``beautiful sets'' -- the other axioms (disregarding finitude of her sets) would allow her to find e.g. both Matroids and \emph{Ideals}, which would suggest to her to refactor her library accordingly.
\item\textbf{Library Refactoring:} Given that the view finder looks for \emph{partial} views, we can use it to find natural extensions of a starting theory. Imagine Jane removing the last of her axioms for ``beautiful sets'' -- the other axioms (disregarding finitude of her sets) would allow her to find e.g. both Matroids and \emph{Ideals}, which would suggest to her to refactor her library accordingly.
Additionally, \emph{surjective} partial morphisms would inform her that her theory should probably be refactored as an extension of the codomain, which would allow her to use all theorems and definitions therein.
Additionally, \emph{surjective} partial views would inform her that her theory should probably be refactored as an extension of the codomain, which would allow her to use all theorems and definitions therein.
\item\textbf{Theory Generalization:} If we additionally consider morphisms into and out of the theories found, this can make theory discovery even more attractive. For example, a morphism from a theory of vector spaces into matroids could additionally inform Jane that her beautiful sets, being matroids, form a generalization of the notion of linear independence in linear algebra.
\item\textbf{Theory Generalization:} If we additionally consider views into and out of the theories found, this can make theory discovery even more attractive. For example, a view from a theory of vector spaces into matroids could additionally inform Jane that her beautiful sets, being matroids, form a generalization of the notion of linear independence in linear algebra.
\item\textbf{Folklore-based Conjecture:} If we were to keep track of our transformations during preprocessing and normalization, we could use the found morphisms for translating both into the codomain as well as back from there into our starting theory.
\item\textbf{Folklore-based Conjecture:} If we were to keep track of our transformations during preprocessing and normalization, we could use the found views for translating both into the codomain as well as back from there into our starting theory.
This would allow for e.g. discovering and importing theorems and useful definitions from some other library -- which on the basis of our encodings can be done directly by the view finder.
A useful interface might specifically prioritize morphisms into theories on top of which there are many theorems and definitions that have been discovered.
A useful interface might specifically prioritize views into theories on top of which there are many theorems and definitions that have been discovered.
\end{itemize}
For some of these use cases it would be advantageous to look for morphisms \emph{into} our working theory instead.
For some of these use cases it would be advantageous to look for views \emph{into} our working theory instead.
Note that even though the algorithm is in principle symmetric, some aspects often depend on the direction -- e.g. how we preprocess the theories, which constants we use as starting points or how we aggregate and evaluate the resulting (partial) morphisms (see Sections \ref{sec:algparams}, \ref{sec:normalizeintra} and \ref{sec:normalizeinter}).
Note that even though the algorithm is in principle symmetric, some aspects often depend on the direction -- e.g. how we preprocess the theories, which constants we use as starting points or how we aggregate and evaluate the resulting (partial) views (see Sections \ref{sec:algparams}, \ref{sec:normalizeintra} and \ref{sec:normalizeinter}).
We present a general MKM utility that, given an \MMT theory and an \MMT library $\cL$, finds
partial and total morphisms into $\cL$.
partial and total views into $\cL$.
Such a view finder can be used to drive various MKM applications ranging from theory classification to library merging and refactoring.
We have presented the first and last of these and shown that they are feasible. For the applications discussed but unrealized in this paper, we mainly need to determine the right application context and user interface.
\paragraph{Future Work}
The current view finder is already efficient enough for the limited libraries we used for testing.
To increase efficiency, we plan to explore term indexing techniques~\cite{Graf:ti96} that support $1:n$ and even $n:m$ matching and unification queries.
The latter will be important for the library refactoring and merging applications which look for all possible (partial and total) morphisms in one or between two libraries.
The latter will be important for the library refactoring and merging applications which look for all possible (partial and total) views in one or between two libraries.
As such library-scale operations will have to be run together with theory flattening to a fixed point and re-run upon every addition to the library, it will be important to integrate them with the \MMT build system and change management processes~\cite{am:doceng10,iancu:msc}.
@@ -62,31 +62,31 @@ between \MMT theories. $\cL$: given a query theory $Q$, the view finder compute
and the assignments made by the views.
\paragraph{Related Work}
Existing systems have so far only worked with explicitly given theory morphisms, e.g., in IMPS \cite{imps} or Isabelle \cite{isabelle}.
Automatically and systematically searching for new theory morphisms was first undertaken in \cite{NorKoh:efnrsmk07} in 2006.
Existing systems have so far only worked with explicitly given views, e.g., in IMPS \cite{imps} or Isabelle \cite{isabelle}.
Automatically and systematically searching for new views was first undertaken in \cite{NorKoh:efnrsmk07} in 2006.
However, at that time no large corpora of formalized mathematics were available in standardized formats that would have allowed easily testing the ideas at scale.
This situation has changed since then as multiple such exports have become available.
In particular, we have developed the MMT language \cite{RK:mmt:10} and the concrete syntax of the OMDoc XML format \cite{omdoc} as a uniform representation language for such corpora.
And we have translated multiple proof assistant libraries into this format, including the ones of PVS in \cite{KMOR:pvs:17} and HOL Light in \cite{KR:hollight:14}.
Building on these developments, we are now able, for the first time, to apply generic methods --- i.e., methods that work at the MMT level --- to search for theory morphisms in these libraries.
Building on these developments, we are now able, for the first time, to apply generic methods --- i.e., methods that work at the MMT level --- to search for views in these libraries.
While inspired by the ideas of \cite{NorKoh:efnrsmk07}, our design and implementation are completely novel.
In particular, the theory makes use of the rigorous language-independent definitions of \emph{theory} and \emph{theory morphism} provided by MMT, and the practical implementation makes use of the MMT system, which provides high-level APIs for these concepts.
In particular, the theory makes use of the rigorous language-independent definitions of \emph{theory} and \emph{view} provided by MMT, and the practical implementation makes use of the MMT system, which provides high-level APIs for these concepts.
\cite{hol_isahol_matching} applies techniques related to ours to a related problem.
Instead, of theory morphisms inside a single corpus, they use machine learning to find similar constants in two different corpora.
Their results can roughly be seen as a single partial morphism from one corpus to the other.
Instead of views inside a single corpus, they use machine learning to find similar constants in two different corpora.
Their results can roughly be seen as a single partial view from one corpus to the other.
\paragraph{Approach and Contribution}
Our contribution is twofold. Firstly, we present the design and implementation of a
generic theory morphism finder that works with arbitrary corpora represented in MMT. The
generic view finder that works with arbitrary corpora represented in MMT. The
algorithm tries to match two symbols by unifying their types. This is made efficient by
separating the term into a hashed representation of its abstract syntax tree (which serves
as a fast plausibility check for pre-selecting matching candidates) and the list of symbol
occurrences in the term, into which the algorithm recurses.
Secondly, we apply this view finder in two concrete case studies: In the first, we start with an abstract theory and try to figure out if it already exists in the same library. In the second example, we write down a simple theory of commutative operators in one language to find all commutative operators in another library based on a different language.
Secondly, we apply this view finder in two concrete case studies: In the first, we start with an abstract theory and try to figure out if it already exists in the same library -- the use case mentioned above. In the second example, we write down a simple theory of commutative operators in one language to find all commutative operators in another library based on a different language.
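As an illustration of the matching step, consider the following sketch. The placeholder notation $\Box_i$ and the positional variable names $v_i$ are assumptions made here for exposition and need not coincide with the encoding used in the implementation; we also assume that $\vdash$, $\forall$ and $=$ belong to a shared meta-theory. Take a commutativity axiom for a binary operator $\circ$ in one theory and for $+$ in another:
\[
  c : \;\vdash \forall x,y.\; x\circ y = y\circ x
  \qquad\text{and}\qquad
  c' : \;\vdash \forall a,b.\; a+b = b+a .
\]
If bound variables are numbered by position and every occurrence of a non-meta-theory symbol is replaced by a placeholder that indexes into a list, both types abstract to the same tree
\[
  \vdash \forall v_1,v_2.\; \Box_1(v_1,v_2) = \Box_2(v_2,v_1),
\]
with symbol lists $(\circ,\circ)$ and $(+,+)$, respectively. Equal hashes of the abstracted tree make $c\mapsto c'$ a candidate assignment, and recursing into the two lists yields the further requirement $\circ\mapsto +$, which is checked in the same way on the types of $\circ$ and $+$.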
\paragraph{Overview}
In Section~\ref{sec:prelim}, we review the basics of MMT and, as examples, the representations of the PVS and HOL Light libraries.
Intuitively, \mmt is a declarative language for theories and theory morphisms over an arbitrary object language.
Intuitively, \mmt is a declarative language for theories and views over an arbitrary object language.
Its treatment of object languages is abstract enough to subsume most practically relevant logics and type theories.
Fig.~\ref{fig:mmt} gives an overview of the fundamental MMT concepts.
...
...
@@ -8,17 +8,17 @@ In the simplest case, \textbf{theories} $\Sigma$ are lists of \textbf{constant d
Naturally, $E$ must be subject to some type system (which MMT is also parametric in), but the details of this are not critical for our purposes here.
We say that $\Sigma'$ includes $\Sigma$ if it contains every constant declaration of $\Sigma$.
Correspondingly, a \textbf{theory morphism} $\sigma:\Sigma\to\Sigma'$ is a list of \textbf{assignments} $c\mapsto e'$ of $\Sigma'$-expressions $e'$ to $\Sigma$-constants $c$.
Correspondingly, a \textbf{view} $\sigma:\Sigma\to\Sigma'$ is a list of \textbf{assignments} $c\mapsto e'$ of $\Sigma'$-expressions $e'$ to $\Sigma$-constants $c$.
To be well-typed, $\sigma$ must preserve typing, i.e., we must have $\vdash_{\Sigma'}e':\ov{\sigma}(E)$.
Here $\ov{\sigma}$ is the homomorphic extension of $\sigma$, i.e., the map of $\Sigma$-expressions to $\Sigma'$-expressions that substitutes every occurrence of a $\Sigma$-constant with the $\Sigma'$-expression assigned by $\sigma$.
We call $\sigma$ \textbf{simple} if the expressions $e'$ are always $\Sigma'$-\emph{constants} rather than complex expressions.
The type-preservation condition for an assignment $c\mapsto c'$ reduces to $\ov{\sigma}(E)=E'$ where $E$ and $E'$ are the types of $c$ and $c'$.
We call $\sigma$ \textbf{partial} if it does not contain an assignment for every $\Sigma$-constant.
A partial morphism from $\Sigma$ to $\Sigma'$ is the same as a total morphism from some theory included by $\Sigma$ to $\Sigma'$.
A partial view from $\Sigma$ to $\Sigma'$ is the same as a total view from some theory included by $\Sigma$ to $\Sigma'$.
Importantly, we can then show generally at the MMT-level that if $\sigma$ is well-typed, then $\ov{\sigma}$ preserves all $\Sigma$-judgments.
In particular, if we represent proofs as typed terms, theory morphisms preserve the theoremhood of propositions.
This property makes theory morphism so valuable for structuring, refactoring, and integrating large corpora.
In particular, if we represent proofs as typed terms, views preserve the theoremhood of propositions.
This property makes views so valuable for structuring, refactoring, and integrating large corpora.
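For a small illustration, consider two hypothetical theories (not taken from any of the libraries discussed here), assuming a meta-theory that provides $\mathtt{type}$ and $\to$. Let $\Sigma$ consist of the declarations $U:\mathtt{type}$ and $\circ:U\to U\to U$, and let $\Sigma'$ consist of $N:\mathtt{type}$ and $+:N\to N\to N$. Then the list of assignments $U\mapsto N$, $\circ\mapsto +$ is a simple view $\sigma:\Sigma\to\Sigma'$: for the assignment $\circ\mapsto +$ we have
\[
  \ov{\sigma}(U\to U\to U) \;=\; N\to N\to N,
\]
which is exactly the type of $+$ in $\Sigma'$. Omitting the assignment for $\circ$ yields a partial view from $\Sigma$ to $\Sigma'$, which is the same as a total view from the included theory containing only $U:\mathtt{type}$ to $\Sigma'$.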
MMT achieves language-independence through the use of \textbf{meta-theories}: every MMT-theory may designate a previously defined theory as its meta-theory.
For example, when we represent the HOL Light library in MMT, we first write a theory $L$ for the logical primitives of HOL Light.
...
...
@@ -36,7 +36,7 @@ Thus, we assume that $\Sigma$ and $\Sigma'$ have the same meta-theory $M$, and t
\multicolumn{3}{|c|}{meta-theory: a fixed theory $M$}\\
%\hline
\hline
& Theory $\Sigma$&Morphism$\sigma:\Sigma\to\Sigma'$\\
& Theory $\Sigma$& View $\sigma:\Sigma\to\Sigma'$\\
\hline
set of & typed constant declarations $c:E$& assignments $c\mapsto E'$\\
$\Sigma$-expressions $E$& formed from $M$- and $\Sigma$-constants & mapped to $\Sigma'$ expressions \\
@@ -5,7 +5,7 @@ We now generalize to view-finding between theories in different libraries (and b
\end{itemize}
The normalizations mentioned in Section \ref{sec:normalizeintra} already suggest equating the involved logical primitives (such as logical connectives) via a meta-morphism.
Foundation-specific normalizations for finding morphisms \emph{across} libraries are, to our knowledge, an as-yet unexplored field of investigation. Every formal system has certain unique idiosyncrasies, best practices or widely used features; finding an ideal normalization method is a correspondingly difficult domain-specific problem.
Foundation-specific normalizations for finding views \emph{across} libraries are, to our knowledge, an as-yet unexplored field of investigation. Every formal system has certain unique idiosyncrasies, best practices or widely used features; finding an ideal normalization method is a correspondingly difficult domain-specific problem.
We will discuss some of our findings specifically regarding the PVS library as a case study.
...
...
@@ -16,7 +16,7 @@ PVS~\cite{pvs} is a proof assistant under active development, based on a higher-
In practice, theory parameters are often used in PVS for the signature of an abstract theory. For example, the theory of groups \cn{group\_def} in the NASA library has three theory parameters $(\cn T,\ast,\cn{one})$ for the signature, and includes the theory \cn{monoid\_def} with the same parameters; the axioms for a group are then formalized as a predicate on the theory parameters.
Given that the same practice is used in few other systems (if any), searching for morphisms without treating theory parameters in some way will not give us any useful results on these theories. We offer three approaches to handling these situations:
Given that the same practice is used in few other systems (if any), searching for views without treating theory parameters in some way will not give us any useful results on these theories. We offer three approaches to handling these situations:
\begin{enumerate}
\item\emph{Simple treatment:} We can interpret references to theory parameters as free variables and turn them into holes. Includes of parametric theories with arguments are turned into simple includes.
\item\emph{Covariant treatment:} We introduce new constants for the theory parameters and replace occurrences of the parameters by constant references. Includes with parameters are again replaced by normal includes.
...
...
@@ -47,7 +47,7 @@ We have tried the first two approaches regarding theory parameters -- i.e. the s
@@ -56,7 +56,7 @@ We have tried the first two approaches regarding theory parameters -- i.e. the s
\caption{Results of Inter- and Intra-Library View Finding in the PVS NASA Library}\label{fig:pvsresults}
\end{figure}
Most of the results in the simple MitM$\to$NASA case are artefacts of the theory parameter treatments -- in fact only two of the 17 results are meaningful (to operations on sets and the theory of number fields). In the covariant case, the additional requirements on each simple morphism lead to fuller (one total) and fewer spurious morphisms.
Most of the results in the simple MitM$\to$NASA case are artefacts of the theory parameter treatments -- in fact only two of the 17 results are meaningful (to operations on sets and the theory of number fields). In the covariant case, the additional requirements on each simple view lead to fuller (one total) and fewer spurious views.
With a theory from the NASA library as domain, the results are already too many to be properly evaluated by hand.
As an additional use case, we can write down a theory for a commutative binary operator using the MitM foundation, while targeting the PVS Prelude library -- allowing us to find all commutative operators, as in Figure \ref{fig:use:pvs} (using the simple approach to theory parameters).
...
...
@@ -65,7 +65,7 @@ As an additional use case, we can write down a theory for a commutative binary o
\fbox{\includegraphics[width=\textwidth]{pvs}}
\caption{Searching for Commutative Operators in PVS}\label{fig:use:pvs}
\end{figure}
This example also hints at a way to iteratively improve the results of the view finder: since we can find properties like commutativity and associativity, we can use the results to in turn inform a better normalization of the theory by exploiting these properties. This in turn would potentially allow for finding more morphisms.
This example also hints at a way to iteratively improve the results of the view finder: since we can find properties like commutativity and associativity, we can use the results to in turn inform a better normalization of the theory by exploiting these properties. This in turn would potentially allow for finding more views.