TITLE: Automatically Finding Theory Morphisms for Knowledge Management

AUTHORS: Dennis Müller, Michael Kohlhase and Florian Rabe

Overall evaluation: 2 (accept)

----------- Overall evaluation -----------

This is very interesting work that should be presented. The writing is a bit non-optimal, but the main ideas seem clear. and I definitely want a system that can do what they say their system is able to do! Even if the system does only 20 percent of what they say it does, it's already a huge help.

Some clarifications are needed though:

1. The authors say: "we present the design and implementation of a generic view finder that works with arbitrary

corpora represented in MMT."

How costly is it to take a mathematical library and represent it in MMT? this will determine the applicability of the tool described here and needs to be addressed. Just knowing that the reference "M. Kohlhase, D. M¨uller, S. Owre, and F. Rabe. “Making PVS Accessible to Generic Services by Interpretation in a Universal Format”, 2017" *exists*, is not enough to assuage my fears that the tool is only applicable to very restricted contexts.

2. Since we're talking about theory morphisms mostly, the kind of algorithm described here should be applicable (perhaps?) to mathematical libraries, such as Mathematical Structures (http://math.chapman.edu/~jipsen/structures/doku.php/index.html) or the database of mathematical sequences, where the kind of refactoring that could be achieved would be amazing. This depends of course on not needing the theorem prover capabilities of PVS or HOL or Twelf, but only on the represented mathematical knowledge being faithfully translated. I am afraid I am not able to know if this is true or not. Can you clarify?

minor point:

"view finder" is not a concept that most people will be familiar with, you should explain it at the beginning of the paper.

- The authors say: "we present the design and implementation of a generic view finder that works with arbitrary

corpora represented in MMT."

How costly is it to take a mathematical library and represent it in MMT? this will determine the applicability of the tool described here and needs to be addressed. Just knowing that the reference "M. Kohlhase, D. M¨uller, S. Owre, and F. Rabe. “Making PVS Accessible to Generic Services by Interpretation in a Universal Format”, 2017" *exists*, is not enough to assuage my fears that the tool is only applicable to very restricted contexts.

-> Rebuttal?

- Since we're talking about theory morphisms mostly, the kind of algorithm described here should be applicable (perhaps?) to mathematical libraries, such as Mathematical Structures (http://math.chapman.edu/~jipsen/structures/doku.php/index.html) or the database of mathematical sequences, where the kind of refactoring that could be achieved would be amazing. This depends of course on not needing the theorem prover capabilities of PVS or HOL or Twelf, but only on the represented mathematical knowledge being faithfully translated. I am afraid I am not able to know if this is true or not. Can you clarify?

-> Rebuttal?

- "view finder" is not a concept that most people will be familiar with, you should explain it at the beginning of the paper.

-> TODO

- they do not give the complete details of their algorithm often saying "we leave the details out."

-> Due to page limitations - rebuttal?

- They also claim their algorithm is efficient, but do not offer any type of analysis showing this fact.

-> check

- Finally, the paper contains several lemmata without proofs. This makes it difficult to check their results validity. I recommend the authors add proofs to the appendix or to an online report.

-> Proofs probably wouldn't be illuminating, but we could/should consider an online report anyway.

- the authors mention twice that details are being skipped---it's not so easy to grok.

-> See above

- When the key phrase 'view finder' is introduced it should have some explanatory annotation to avoid semantic dissonance as the tyro reader meets it for the first time. For example "With a view finder (something to helpfind logical views) ..." or something better.

-> TODO

- Popular proof assistants such is Isabelle/HOL, Coq and Mizar have large amounts of theory knowledge tied up in various ways in their type systems. The paper does not discuss how this knowledge could be accessed. Has this been considered and what are the challenges?

-> Don't know much about Isabelle/Coq/Mizar -- @Florian?

- The weakest part of the presentation is the explanation of

the experimental results in Section 4.2. More space and effort on

this would help the reader appreciate what has been achieved and what

needs to be tackled next.

-> Rebuttal?

- (Here and elsewhere). The emphasis on finding morphism targets given a fixed source theory is odd. A primary use case is having have theory under development and wanting to know if there are related existing theories whose results can be drawn on. The running example of "Beautiful Sets" is an example of this. With such a use case one cares more to ask about morphism sources for a given target. Once such a source is found, one can transfer over theorems from the source theory to the target theory. It seems this emphasis turns out to be reasonable only because of the simple restriction on morphisms and the existence then of an inverse morphism when the morphism is 1-1. Please discuss this issue.

-> Check

- Up front, this section needs to make clear that the "constants" of a theory include types, constants, functions and proof witnesses as in Automath or constructive type theories, and that in declarations "c : E", the types E can be not only types, but also type classses and propositions (by a propositions as types correspondence). A concrete simple example of of a theory would help ensure the reader appreciates this.

-> Check

- The phrase "Naturally, E must be subject to some type system" is cryptic, especially given E itself is understood to be a kind of type.

-> Confused?

- When introducing the "view" terminology, give synonyms such as "theory morphism" and "theory interpretation" to help the reader. Is the introduction of a new term "view" really necessary? It confuses as much as helps this reviewer.

-> Check

- Explain the phrase "\Sigma-judgements".

-> Check

- Explicitly define what it means for a view to be total.

-> TODO

- The use of the term "meta-theory" is confusing and differs from its common usage. It seems that what are called "meta-theories" here could better be called "base theories"; they are theories on the same semantic level as the main theories being considered that get included into every one of these main theories. Then reserve the term "meta-theory" for this LF like theory in which the MMT system encodes all object theories.

-> Confused? Rebuttal?

- A few more words on the "module system" would help, given that expanding out theory imports and instantiations is a key part of the normalisation pre-processing step.

-> Check

- Make clear that constants with definiens can be used to encode the derived theorems of a theory.

-> Check

- The idea of "refactoring" theories needs explanation.

-> TODO

- What does "elaboration" mean?

-> TODO

- Are "structured theories" the same as "modules" of the previously mentioned "module system"? Could just one term be used for these?

-> Check

- Add a remark on how normalisation handles the case of defined constants c : E = e when E is a theorem in a system like PVS where proof witnesses e do not exist.

-> Rebuttal? (Insert dummy proofs)

- The core algorithm is very informally sketched. A little more precision would be helpful.

-> Page limitations; rebuttal?

- p8 Can it be made clear that the two \emptyset{}s come from different theories?

-> TODO

- 3.2 ends rather abruptly. So how much amalgamation is done? Does the implementation compute all maximal amalgamations? Is every possible amalgamation reported, or just the maximal ones?

-> Design choice - Rebuttal?

- Remark on the example working because the typed sets happen to be implemented as predicates.

-> TODO

- Sec 4: The discussion of meta-views is very condensed. Could an example be given which illustrates how they have to be approximated?

-> Check

- The contravariant elimination proposal needs further qualification. As is, with the described simple view inference, no parameter instantiation is possible. And when parameter instantiation is coded, it has to happen uniformly for all theory constants.

-> Rebuttal? (Preprocessing)

- Fig 5. The ability to discover commutative operators seems very poor. Can this be discussed?

-> Check

- Fig 6.

- The "Aggregated" column needs explanation.

- It would be nice if the rows for the inter and intra library cases

were clearer. Maybe add a column for the co-domain with a "NASA

Library" entry for each row.

-> Check

- The presentation of the results is not that satisfactory. The text suggests that most of the numbers in the table in Fig 6 are spurious, but the reasons why are not well explained. And the good examples found of theory morphisms were hard to understand. The last sentence of Section 4 needs expanding. Maybe some concrete details would help. The reader is left with the impression that the overall performance is poor. Is that true?

-> Check

- Here and elsewhere the paper has good discussion of possible applications of morphism discovery. But the evidence that the implemented system generates falls far short of what these applications need. More honesty is needed about the preliminary nature of the work and the challenges ahead.

-> Rebuttal?

- This reviewer's guess is that further work has to move beyond syntactic matching for axioms/theorems and has to bring in automatic theorem proving of some kind. It was suprising that this avenue forward is quickly dismissed as intractable at the start of Section 3 and not picked up on again in the further work discussion in the Conclusions.

-> Inefficient. Rebuttal?

NITPICKS:

- Add a period after the subheading on paragraphs. For example,

instead of "Motivation" write "Motivation.".

- p. 2. Define MMT before using it.

- p. 2, fourth paragraph. "Firstly, we present a the..." should be

"Firstly, we present the..."

- p. 3, second paragraph. Reword "practically relevant logics...",

because relevance logic is often called relevant logic. This

sentence can be confusing.

- p. 3, Fig. 1. "iew" should be "View"

- p. 7, Lemma 2. "in theories \Sigma' and \Sigma'..." should be

"in theories \Sigma and \Sigma'...".

- p. 10, last paragraph. "we find view..." should be "we find

views...".

- We have simple English words for two cases already and do not need neologisms longness => length, finitude => finiteness

- In Fig 1, "iew" -> "View"

- p6 First use of "(t,s)" notation before the notation is introduced is

cryptic. Some reordering of the introduction of this notation is

needed.

- p7 "Consider two constants c ∶ E and c' ∶ E' preprocessed...". -> "Consider two constants c ∶ E and c' ∶ E' where types E and E' are preprocessed...".