« Return to Thread: mizar2kif

Re: Translating Mizar into some standard FOPL-like format?

by Josef Urban :: Rate this Message:

Reply to Author | View in Thread


Hi Ben,

On Tue, 21 Oct 2008, Ben Goertzel wrote:

> I haven't kept up with this list lately, and I'm curious what is the current
> state of efforts to translate the Mizar corpus of theorems and proofs into
> some fairly standard first-order-predicate-logic-like formalism?

I see that as you wrote ten months ago, you really got back to this in
2008 :-). I enclose the e-mails we exchanged on this. The extended version
of the paper describing export of Mizar proofs to TPTP and their
cross-verification by automated theorem provers is at
http://kti.mff.cuni.cz/~urban/mptp_gdv_full.pdf .

As I wrote before, I am really interested in AI systems working on this
(in my opinion fairly unique) knowledge base, and will further work on
making it more accessible to AI researchers if needed. I really believe
that "strong AI" can benefit a lot from large computer-understandable
repositories of "human thought", and that vice-versa, formal math can
benefit a lot from new "strong AI" methods combining imprecise, intuitive,
and heuristical thinking with precise deductive thinking.

This is no longer only about Mizar, the "Large Theory Batch" (LTB)
division of the CADE ATP System Competition (CASC) really happened for the
first time this year at IJCAR
(http://www.cs.miami.edu/~tptp/CASC/J4/Design.html#CompetitionDivisions)
with additional categories for the problems translated from the CyC and
SUMO knowledge bases, and $3000 prize money for the SUMO problems
(http://www.ontologyportal.org/reasoning.html) and $200 for the CyC
problems. The SUMO and CyC problems are however still quite "shallow" in
comparison with the math problems, and SUMO and CyC also still mainly
consist of axioms - they lack the nontrivial DAG structure of developed
math theories. I hope that eventually some problems extracted e.g. from
the Isabelle math library could be added to CASC-LTB, so that it does not
get overfocused on shallow reasoning in large ontologies.

Best,
Josef


On Sat, 22 Dec 2007, Benjamin Goertzel wrote:

> Thanks -- I have now looked through this material and it's very
> exciting .. I'm focusing
> on other stuff right now but do hope to get back to this area in 2008
> sometime... what
> you've done seems like a very important step...
>
> On Dec 16, 2007 10:14 AM, Josef Urban <urban@...> wrote:
>>
>> Hi Ben,
>>
>> I am now writing an extended version of a paper describing export of
>> Mizar proofs to TPTP (see
>> http://www.springerlink.com/content/t88848500815t188/ for the conference
>> version), and have just remembered your request for having Mizar proofs
>> exported to KIF or TPTP. So yes, I have translated it all to TPTP
>> derivations, which can be ATP-verified by tools like GDV (see the paper).
>> The page which presents it all is http://www.cs.miami.edu/~tptp/MizarTPTP/
>> , the TSTP icons there give you the TPTP derivations corresponding to each
>> Mizar proof (see
>> http://www.activemath.org/workshops/MathUI/07/proceedings/Urban-etal-MizarIDV-MathUI07.pdf
>> for explanation of the site). If you want tar.gz of all the proofs, some
>> version is at http://lipa.ms.mff.cuni.cz/~urban/nd_problems1.tar.gz (watch
>> out, it is huge). There are still some completness bugs, the biggest is
>> lack of explicit arithmetic evaluations done by mizar (inferences like
>> 4+5=9 that dumb FOL ATP system will not prove), but otherwise it mostly
>> works (see the paper).
>>
>> The MPTPChallenge is now over (see the results at
>> http://www.cs.miami.edu/~tptp/MPTPChallenge/Results/SVGResults.html), but
>> we would still be very interested in the results of your system. Also,
>> Geoff Sutcliffe is preparing a "batch division of CASC" for next year's
>> IJCAR, with similar setting like the MPTPChallenge (and possibly more
>> money from e.g. CyCorp, who will probably also provide some of their
>> problems). For my simple-but-pretty-useful inductive/deductive MaLARea
>> metasystem working in large theories, see
>> http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-257/05_Urban.pdf
>> (hope you'll get motivated :-).
>>
>> Best,
>> Josef
>>
>>
>>
>> On Mon, 19 Mar 2007, Josef Urban wrote:
>>
>>>
>>>
>>> Ben,
>>>
>>> the MPTP export keeps at this moment quite a lot of the original Mizar proof
>>> structure in the TPTP 'useful info' slot, but not all of it. Depending on
>>> what kind of learning you want to do, it might or might not suffice. The
>>> biggest ommission is probably the lack of further description of unproved
>>> propositions (precisely in TPTP syntax: fofs with role 'unknown' and
>>> mptp_info(_,_,proposition,_,_)). Most of them are probably natural deduction
>>> (ND) assumptions, but some are not (e.g. propositions about constants
>>> introduced by the 'consider' keyword - they can be rather thought of as
>>> definitions).
>>>
>>> It should not be difficult to add the missing info there - the TPTP format is
>>> produced directly from the XML, which contains all the proof structure. There
>>> have been several reasons for postponing this so far - my focus on getting
>>> the reproving of the simple steps right (which is sort of a precondition for
>>> ATP cross-verification of Mizar, which in turn is a precondition for
>>> productive use of deductive tools like ATP as part of larger knowledge-based
>>> systems tailored for Mizar), and also a focus on getting the reproving of
>>> theorems from their external references right, which in a sense gives you a
>>> large-scale proof structure (which is not only usable for learning, but also
>>> - unlike the ND internal proofs - understandable to ATPs, and thus allowing
>>> things like the MPTP challenge). Another reason was that I did not want to
>>> decide about the details of MPTP ND annotations, until I decided about the
>>> export of ND proof structure to TPTP. The latter accidentally happened last
>>> week (not only as a next step for the cross-verification, but also as a
>>> megalomanic plan to build the detailed MML DAG with milions of nodes :-). So
>>> some missing annotations will appear in the next few weeks (maybe even days),
>>> and more importantly, the Mizar ND proofs will become TPTP proofs (if needed
>>> without any ND - though there is a good chance that assumptions will become
>>> acknowledged and processed parts of TPTP proofs).
>>>
>>> To sum up:
>>> - you can have 'raw Mizar' loaded by using the XML - that gives you access to
>>> the formulas and Mizar proof structure; the disadvantage is that for any
>>> deductive tool which you might want to use, you'll have to define the
>>> translation to its logic
>>> - there is the 'raw MPTP', with formulas in extended TPTP syntax containing
>>> the mptp_info annotations; this is sort of a middle way between the XML and
>>> standard TPTP; the annotations are likely to get a bit better in near future,
>>> what they annotate is still the Mizar ND structure
>>> - there are pre-generated 'standard TPTP' problem sets in the MPTP distro and
>>> the MPTPChallenge distro; these you can feed to ATP systems, and also
>>> learning systems (the symbols and proposition names are stable there - always
>>> the same semantics); for quite a lot of them an ATP proof can be found (and
>>> used for learning)
>>> - there will (hopefully soon) be a full TPTP (i.e. mostly/completely non-ND)
>>> proof structure export, compatible with the proofs produced by ATP systems
>>> like EP.
>>>
>>> Josef
>>>
>>> On Sun, 18 Mar 2007, Ben Goertzel wrote:
>>>
>>>>
>>>> Josef,
>>>>
>>>> Thanks for your reply!  However, I'm not sure we are fully understanding
>>>> each other.
>>>>
>>>> I agree that the TPTP language has a level of simplicity similar to that of
>>>> KIF, so that converting TPTP into KIF wouldn't be a big deal.
>>>>
>>>> And, from the point of view of my own Novamente AI system, writing a
>>>> specialized TPTP loader and bypassing KIF wouldn't be a big deal either.
>>>>
>>>> However, I am not sure from your email if the "MPTP export" of Mizar, at
>>>> this point, actually exports the **proofs** in the existing Mizar corpus
>>>> (Journal of Formalized Mathematics), or just the theorem statements.  Can
>>>> you clarify? For my purposes, I need the proofs not just the theorems, as
>>>> my desire is to do inductive learning of proof-patterns based on automated
>>>> analysis of the proof-corpus.
>>>>
>>>> Thanks
>>>> Ben Goertzel
>>>> Novamente LLC
>>>> ben@...
>>>>
>>>>
>>>>
>>>>>
>>>>> Hi Ben,
>>>>>
>>>>> On Sat, 17 Mar 2007, Ben Goertzel wrote:
>>>>>
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am the leader of an AI project (www.novamente.net) and am interested
>>>>>> to apply my AI system to automated theorem-proving based on a method
>>>>>> of having it ingest the existing corpus of Mizar proofs, inductively
>>>>>> infer
>>>>>> "patterns of proof" therefrom, and then use these patterns to guide its
>>>>>> work in further proofs.
>>>>>
>>>>> Good, hope you'll submit your system to the MPTP Challenge
>>>>> (http://www.cs.miami.edu/~tptp/MPTPChallenge/), and write a paper about it
>>>>> to ESARLT (http://www.cs.miami.edu/~geoff/Conferences/ESARLT/).
>>>>>
>>>>>> However, the first issue confronted along this path is that my AI system
>>>>>> does not possess a Mizar loader.  This is because Mizar's syntax is
>>>>>> bloody complicated!  (Understandably so, because of its design goals.)
>>>>>>
>>>>>> Thus, it seems necessary to convert Mizar to some simpler and more
>>>>>> standard
>>>>>> format.  A sensible choice would seem to be KIF
>>>>>>
>>>>>> http://www.ksl.stanford.edu/knowledge-sharing/kif/
>>>>>>
>>>>>> but of course other similarly simple formats would do as well.
>>>>>
>>>>> Mizar is translated to TPTP by the MPTP export
>>>>> (http://kti.ms.mff.cuni.cz/~urban/MPTP2/mptp0.2.tar.gz). TPTP2KIF can be
>>>>> probably achieved through TPTP2X from the TPTP distro (not that there
>>>>> would be many "really running systems" fully implementing KIF (AFAIK)).
>>>>> Also, just for initial learning experiments, the MPTPChallenge problems
>>>>> (http://www.cs.miami.edu/~tptp/MPTPChallenge/MPTPChallenge.tgz) might be
>>>>> enough.
>>>>>
>>>>>> My question is whether anyone has created a converter of this sort, or
>>>>>> has
>>>>>> thought seriously about it.
>>>>>>
>>>>>> If no one has created one, does anyone have an educated estimate
>>>>>> regarding
>>>>>> how long this task would take for an individual with suitable background
>>>>>> in mathematics and software engineering?
>>>>>
>>>>> I hope that MPTP is enough for you, you are welcome to play with it and
>>>>> improve it - there are number of ways how one can "translate" Mizar and
>>>>> its proofs, e.g. with respect to optimizations available in various
>>>>> reasoning systems, and I hardly did the "basic" approach so far. Also
>>>>> Mizar is available in XML-ized format
>>>>> (http://mmlquery.mizar.org/mizardoc/xml/Mizar.html) which solves the
>>>>> low-level technical "loading" issue, and may be also usable e.g. for
>>>>> machine learning (fo some hints read e.g. section 3.5 of
>>>>> http://ktiml.mff.cuni.cz/~urban/mizxml.ps). However it is certainly still
>>>>> "Mizar world", not a standardized well-known format like TPTP.
>>>>>
>>>>> Best regards,
>>>>> Josef Urban
>>>>>
>>>>
>>>>
>>>
>>
>

 « Return to Thread: mizar2kif