Nicolas M. Thiéry et al.
Tetrapod: Modular Knowledge workshop
FLoC 2018, Oxford, July 13th, 2018
Please connect to:
https://tinyurl.com/tetrapod-2018-computation
Over the last decades, a huge amount of computational software has been developed for pure mathematics, in particular to support research and education. As in any complex ecosystem of software components, the ability to compose them has a multiplier effect on expressive power, flexibility, and range of applications.
The purpose of this session is to share experience on the many barriers to composability in computational mathematics, and how they are being tackled in various communities. Of particular interest will be the exploitation of knowledge to lower some of these barriers. Feedback from neighboring fields (proofs, data, knowledge) will be most welcome.
For this session we focus on the ability to transfer data and run procedures across “systems” (two different pieces of software, two instances of the same software on different machines, two libraries within a system, …).
Goal: Transfer an object O from system A to system B
Requirements:
A communication channel: shared memory, disk, pipe, (web)socket, …
A communication protocol
Serialization / Deserialization: conversion to/from a string of bytes
A format specification; e.g. XML (syntax) + OpenMath content dictionary (semantics)
Format conversion
Syntax: e.g. JSON <-> XML
Adaptation: e.g. DihedralGroup(4) <-> DihedralGroup(8) (the same group, named by degree in one system and by order in the other)
Change of representation (codec): recursive <-> sparse polynomial
Ideally: applying a theory morphism to guarantee semantic preservation
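As a toy illustration of the serialization/deserialization step, here is a minimal sketch using JSON for the syntax and OpenMath-style (cd, name) symbols for the semantics. The encoding scheme and the `interpreters` table are invented for the example; real systems would use an actual OpenMath encoding.

```python
import json

# A minimal OpenMath-flavoured encoding: each node is either an integer,
# or an application ["OMA", head, arg1, ...] whose head is a symbol
# ["OMS", cd, name] defined in a content dictionary (cd).
def encode(obj):
    """Serialize a small expression tree to a string of bytes-ready JSON."""
    return json.dumps(obj)

def decode(data, interpreters):
    """Deserialize, looking up each (cd, name) symbol in the target system."""
    tree = json.loads(data)
    def walk(node):
        if isinstance(node, int):
            return node
        tag, head, *args = node
        assert tag == "OMA"
        _, cd, name = head
        return interpreters[(cd, name)](*[walk(a) for a in args])
    return walk(tree)

# The receiving system interprets arith1.plus with its own addition.
interpreters = {("arith1", "plus"): lambda *xs: sum(xs)}

payload = encode(["OMA", ["OMS", "arith1", "plus"], 1, 2, 3])
print(decode(payload, interpreters))  # 6
```

The point of the content dictionary indirection is that the sender and receiver only need to agree on the meaning of `arith1.plus`, not on each other's internal data structures.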
Goal: From B, request the computation f(a, b, c) in A
Requirements:
Data transfers: send a, b, c to A, receive back the result
Procedure call: bindings, remote procedure calls, …
An API specification: what are the syntax and semantics of f in A?
Adapting the API in B to the API in A: Size(A) <-> A.cardinality()
Ideally: applying a theory morphism to guarantee semantic preservation
Many kinds of objects:
E.g. thousands in Sage, compared to “matrices of floats” in Matlab
Many data representations, with very different algorithmic complexity
E.g. for polynomials: sparse, dense, recursive, straight line program, evaluation, …
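Two of the representations above can be sketched with a codec in a few lines; the names `dense_to_sparse` / `sparse_to_dense` are invented for the example. The complexity trade-off is visible directly: the dense form costs space proportional to the degree, the sparse form to the number of nonzero terms.

```python
# Two classical representations of a univariate polynomial:
#   dense : list of coefficients, index = degree  (good for full, low-degree polynomials)
#   sparse: {degree: coefficient} dict            (good when most coefficients are zero)
def dense_to_sparse(coeffs):
    return {i: c for i, c in enumerate(coeffs) if c != 0}

def sparse_to_dense(poly):
    coeffs = [0] * (max(poly) + 1 if poly else 0)
    for degree, c in poly.items():
        coeffs[degree] = c
    return coeffs

# x^5 + 3: dense needs 6 slots, sparse only 2 entries.
p = [3, 0, 0, 0, 0, 1]
assert dense_to_sparse(p) == {0: 3, 5: 1}
assert sparse_to_dense(dense_to_sparse(p)) == p
```

For x^1000000 + 3 the sparse codec still stores 2 entries while the dense one stores a million; this is why the choice of representation changes the algorithmic complexity of everything downstream.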
Objects at different levels of abstraction
Few core concepts
addition, multiplication, commutativity, …
… but many interesting ways to combine them:
Groups, Fields, graded commutative algebras, …
A large variety of use cases
speed maniacs, flexibility fans, composability devotees
Formal level: best effort is ok
No requirement to formally define the underlying mathematics
Closed science and the Montague-Capulet syndrome
(won’t use code from / share code with XXX for political reasons)
Scarce manpower
Several hundred thousand people use computer algebra; hundreds implement it; a handful are officially paid full time for it.
Mastering math & computer science?
High level of math specialization required when writing code
E.g. thousands of people use Gröbner bases; a handful know how to implement them right
Pieces of software that took decades to develop
We can’t afford to reimplement them; yet we can’t afford to not reimplement them (to learn from one’s mistakes and benefit from technology advances!)
A fragmented community
Systems written in different languages / idioms
C, C++, Python 2/3, Java, JavaScript, Julia, GAP, Singular, Macaulay, Maple, Mathematica, MuPAD, Magma, Axiom/Aldor, Haskell, ML, Agda, …
Systems written using different frameworks / API
Packaging barriers
Tendency toward large, old systems: 10-30 years of development, very large code bases
Lots of technical debt; slow evolution
Lack of modularity, large dependencies
E.g. having to install SageMath to use just a few of its lines
Goal: standardized serialization format to exchange data across systems
Syntax: XML, JSON, binary
Semantics: defined by content dictionaries (CDs)
Hard problem: agreeing on CDs for polynomials was already hard; how would this scale?
After twenty years, little adoption
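For concreteness, this is roughly what an OpenMath object looks like in the XML encoding, here for the expression 1 + 2:

```xml
<!-- Syntax: XML.  Semantics: the meaning of the "plus" symbol is fixed
     by the arith1 content dictionary, not by either system's internals. -->
<OMOBJ xmlns="http://www.openmath.org/OpenMath">
  <OMA>
    <OMS cd="arith1" name="plus"/>
    <OMI>1</OMI>
    <OMI>2</OMI>
  </OMA>
</OMOBJ>
```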
Goals:
Separate serialization from adaptation
Separate binding from adaptation
Separate the treatment of each system
Approach:
A central math ontology
Formalize each system by aligning it to the central ontology
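A deliberately naive sketch of what an alignment table buys: each concept of the central ontology is mapped to its name in each system, so any pair of systems can be bridged through the ontology without pairwise translators. The table contents and the `translate` helper are invented for the example; real projects store far richer information (types, theory morphisms).

```python
# Hypothetical central ontology: concept -> name of that concept in each system.
ALIGNMENTS = {
    "cardinality":       {"sage": "cardinality",       "gap": "Size"},
    "conjugacy_classes": {"sage": "conjugacy_classes", "gap": "ConjugacyClasses"},
}

def translate(name, source, target):
    """Map a system-specific name through the central ontology."""
    for concept, names in ALIGNMENTS.items():
        if names.get(source) == name:
            return names[target]
    raise KeyError(name)

print(translate("Size", "gap", "sage"))  # cardinality
```

With n systems, this needs n alignments per concept instead of n*(n-1) pairwise translation rules: each new system only has to be aligned once, to the ontology.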
Barrier: compose building blocks A1 and A2 into modular generic code B: f(a1, a2)
Requirements: uniform modular API + method resolution mechanism
Remember: few core concepts, hundreds of interesting combinations
Approaches:
Axiom, MuPAD: OOP with large hierarchies of abstract classes (categories)
SageMath: similar, via its category framework (dynamically constructed classes)
GAP: bespoke method resolution mechanism
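The "hierarchy of abstract classes" approach can be sketched in plain Python: generic code is attached to the most general category where it makes sense, and the language's method resolution order picks the right implementation. The class and method names below are invented for the example, loosely modeled on the Sage category hierarchy.

```python
# Generic code attached to categories (abstract classes).
class Sets:
    def cardinality(self):
        raise NotImplementedError

class Magmas(Sets):
    def product_table(self):  # generic: works for any magma with listable elements
        return [[self.product(x, y) for y in self.elements] for x in self.elements]

class Groups(Magmas):
    def is_trivial(self):  # generic: valid for every group
        return self.cardinality() == 1

# One concrete implementation: only the primitives are written here;
# product_table and is_trivial are inherited for free.
class CyclicGroup(Groups):
    def __init__(self, n):
        self.n = n
        self.elements = list(range(n))
    def product(self, x, y):
        return (x + y) % self.n
    def cardinality(self):
        return self.n

G = CyclicGroup(3)
print(G.is_trivial())        # False, via the generic method from Groups
print(G.product_table()[1])  # [1, 2, 0], via the generic method from Magmas
```

This is how "few core concepts, hundreds of interesting combinations" stays manageable: each piece of generic code is written once, at the right level of the hierarchy.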
Author: James D. Mitchell et al.
Problems
Speeding up GAP's Semigroups package while increasing its availability
Use a low-level library A from a system B written in a different language
Approach
Rewrite as a standalone templated C++ library
API: many iterations until converging to a natural API (no memory management, …)
Bindings: from Python: on-the-fly binding using cppyy (also tried Cython, Pybind11, …)
Adaptation: a thin layer for now; could use alignments
Problem
Use functionalities from a system A in a system B written in a different language
Data exchange
Objects returned as references (handles)
Binding
Originally: a pexpect interface
Now: ABI (direct calls at the C level)
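The handle pattern above can be sketched in a few lines: system B holds only an opaque reference to an object living in system A, and forwards method calls instead of serializing the object. Both classes below are stand-ins invented for the example.

```python
# Stand-in for system A: stores objects, hands out integer handles.
class RemoteSystem:
    def __init__(self):
        self._objects, self._next = {}, 0
    def store(self, obj):
        self._next += 1
        self._objects[self._next] = obj
        return self._next
    def call(self, handle, method, *args):
        return getattr(self._objects[handle], method)(*args)

# Proxy on system B's side: forwards any method call to the remote object.
class Handle:
    def __init__(self, system, ref):
        self._system, self._ref = system, ref
    def __getattr__(self, method):
        return lambda *args: self._system.call(self._ref, method, *args)

A = RemoteSystem()
h = Handle(A, A.store({1, 2, 3}))   # the set lives in A; B only holds a handle
print(sorted(h.union({4})))         # [1, 2, 3, 4]
```

The benefit is that large objects never cross the boundary; only small requests and results do. (In a real binding the result would itself come back as a handle rather than a plain value.)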
Adapting
Currently:
In progress:
modular adapters, exploiting the category hierarchy and semantic alignments
objects returned as references + semantic alignments:
alignment B.cardinality() -> Size(A), attached to the category of sets
alignment B.conjugacy_classes() -> ConjugacyClasses(A), attached to the category of groups
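A rough sketch of what "modular adapters attached to categories" could look like: each alignment is registered under the most general category where it is valid, and a wrapper assembles all adapters that apply to a given object. All names below (`GapObject`, `ADAPTERS`, `SageSideWrapper`) are invented for the illustration; they are not the actual Sage/GAP interface.

```python
# Stand-in for a handle to a GAP object, carrying its known categories.
class GapObject:
    def __init__(self, size, categories):
        self.size, self.categories = size, categories

def gap_Size(obj):  # stand-in for calling GAP's Size(...)
    return obj.size

# Each adapter is attached to the most general category where its
# alignment holds: cardinality <-> Size makes sense for any set.
ADAPTERS = {
    "sets": {"cardinality": gap_Size},
    # "groups": {"conjugacy_classes": gap_ConjugacyClasses}, etc.
}

class SageSideWrapper:
    """System-B-side wrapper assembling adapters from the object's categories."""
    def __init__(self, gap_obj):
        self._gap = gap_obj
        for category in gap_obj.categories:
            for name, fn in ADAPTERS.get(category, {}).items():
                setattr(self, name, (lambda f: lambda: f(self._gap))(fn))

S = SageSideWrapper(GapObject(8, ["sets"]))
print(S.cardinality())  # 8: B.cardinality() dispatched to Size(A)
```

The adapter layer thus grows per category, not per object type, mirroring the "few core concepts, many combinations" structure of the library itself.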
How to foster a sustainable and agile ecosystem where (sub)components have their own life cycle, explore new approaches, compete, and happily die when no longer relevant?
What other barriers do you identify in Computational Math?
How do the barriers differ from those in other areas?
Other case studies? Solutions? Approaches?