Semantic/Pragmatic Guidance of Syntax Parsing via Boolean Satisfaction Solving


#1

This note outlines a potential AI R&D / software implementation subproject, that it appears to me would be helpful to OpenCog and SingularityNET in a few different ways.

The basic crux is: To make an effective tool via which grammar parsing can be done efficiently and in a way that incorporates semantic and pragmatic guidance, and that works for both automatically-learned and human-coded grammars across multiple languages.

The basic suggestion is to pick up some long-paused work on Boolean satisfaction based parsing for link grammars, for which there is existing code and a clear development path forward.

This is related to, but not exclusively useful for, our current work on unsupervised grammar induction for use in OpenCog and SingularityNET, which Anton Kolonin has recently written about in:

To take a step back, let me note that: The approach to parsing and semantic analysis taken in the current OpenCog NLP system is pipeline-based, with the link parser used to generate a bunch of possible parses of each input sentence, and then selection of which parses make semantic sense (in the relevant context) done afterwards. There is some parse ranking done within the link parser but it’s relatively simplistic.

This approach is OK for current experimentation and for many practical purposes but it’s not workable for the medium term, and certainly not for AGI.

The “right” approach, within this general region of design space, is to use a parser that enables semantic biasing of the parsing process.

Two partial steps in this direction have been taken so far.

One is the Viterbi link parser, created by Linas Vepstas, which aims to do syntax parsing within the OpenCog Atomspace,

using an approach inspired by the Viterbi decoder algorithm.

Once all the data required for parsing is in the Atomspace, interfacing semantics and pragmatics with parsing becomes “straightforward.”

Another is the SAT link parser, created (in 2008 or so?) by Filip Maric under supervision of Predrag Janicic and Ben Goertzel…. This was a Google Summer of Code project, and Filip and Predrag have long since moved on to other things.

The SAT link parser uses a Boolean satisfaction solver to perform parsing, after translating the portion of the link grammar dictionary relevant to a given sentence into a set of Boolean constraints.

As Filip discusses in the PDF documentation for the SAT parser, it would be tractable to modify the way the SAT solver does its search, so that it uses semantic and pragmatic information to bias its search process.

The SAT link parser code is here

https://github.com/opencog/link-grammar/blob/master/link-grammar/sat-solver/sat-based-link-grammar-parser.pdf

and the conceptual/mathematical/linguistic documentation is here:

My tentative suggestion is that, in the context of our current work on language learning (i.e. on automated, unsupervised learning of link grammars from corpora), it could be valuable to continue the SAT link parser work.

(The Viterbi work is also valuable, but my thinking is that for various practical projects we want parsing to be really fast, and doing syntax parsing in the OpenCog Atomspace would in the immediate instance lead to slowing things down not speeding things up… whereas SAT solvers are fast… In fact I have thought about integrating SAT solvers into OpenCog for other purposes as well, though this is not directly relevant to the present topic…)

The file that would be changed to incorporate semantic or pragmatic guidance into SAT parsing, in this context, is this one:

The really clever stuff is in here:

where the link grammar connectors and associated constraints are mapped into Boolean constraints.

(Although, a possibly relevant side comment is: Mapping the link grammar dictionary into Boolean constraints was complicated in the way Filip had to do it, but in several ways the automatically learned link grammar dictionaries are simpler than the human-created ones, and so one could use a less complicated mapping into constraints. I.e. I think only a subset of the stuff in the sat-encoder.cpp code is actually needed for handling the auto-generated link grammar dictionaries.)

Along with enabling semantic/pragmatic guidance of parsing, using the SAT link parser would open up many avenues for optimization going forward, for instance one could leverage existing work using single or multiple GPUs to accelerate SAT solvers, thus making parsing faster.


#2

Hi Ben. Thanks for the post, interesting reading. It caused some nostalgy because my bachelor thesis was exactly about Boolean Satisfaction solving. Precisely speaking the topic was “Splitting algorithms for solving the problem of propositional satisfiability”. And I have also written some SAT solver using splitting algorithms on C++. But the work presented in your post is much deeper of course because SAT solver is only a tool for more complicated problem solving here, the program isn’t limited only by the good SAT solver itself.