“Getting from Generative AI to Trustworthy AI: What LLMs Might Learn from Cyc”, Doug Lenat, Gary Marcus, 2023-07-31:

Generative AI, the most popular current approach to AI, consists of large language models (LLMs) that are trained to produce outputs that are plausible, but not necessarily correct. Although their abilities are often uncanny, they are lacking in aspects of reasoning, leading LLMs to be less than completely trustworthy. Furthermore, their results tend to be both unpredictable and uninterpretable.

We [Douglas Lenat & Gary Marcus] lay out 16 desiderata for future AI [explanation · deduction · induction · analogy · abductive reasoning · theory of mind · quantifier-fluency · modal-fluency · defeasibility · pro/con arguments · contexts · meta-knowledge/reasoning · explicitly-ethical · sufficient-speed · sufficiently-lingual/embodied · broadly-deeply-knowledgeable], and discuss an alternative approach to AI which could theoretically address many of the limitations associated with current approaches: AI educated with curated pieces of explicit knowledge and rules of thumb, enabling an inference engine to automatically deduce the logical entailments of all that knowledge. Even long arguments produced this way can be both trustworthy and interpretable, since the full step-by-step line of reasoning is always available, and for each step the provenance of the knowledge used can be documented and audited. There is however a catch: if the logical language is expressive enough to fully represent the meaning of anything we can say in English, then the inference engine runs much too slowly. That’s why symbolic AI systems typically settle for some fast but much less expressive logic, such as knowledge graphs.

We describe how one AI system, Cyc, has developed ways to overcome that tradeoff and is able to reason in higher-order logic in real time.

We suggest that any trustworthy general AI will need to hybridize the two approaches, the LLM approach and a more formal approach, and lay out a path to realizing that dream.


3. How Cyc handles some of these 16 elements: [see also SHRDLU & Schank’s critique] Large Language Models such as OpenAI’s ChatGPT and Google’s BARD and Microsoft’s Bing/Sydney represent one pole in potential architectural space, in which essentially neither knowledge nor reasoning is explicit. Cycorp’s CYC represents the opposite pole: a 4-decade-long 50-person project to explicitly articulate the tens of millions of pieces of common sense and general models of the world that people have, represent those in a form that computers can reason over mechanically, and develop reasoning algorithms which, working together, are able to do that reasoning sufficiently quickly.

…For that reason, Cycorp has persevered, unwilling to sacrifice the expressiveness of the logic involved, and its Cyc AI is the culmination of that effort. Over the past 4 decades it has developed engineering solutions to manage each of the 16 elements described in §2. Some are elegant; others simply required a lot of elbow grease—eg. for item 16, Cyc’s knowledge base (KB) comprises tens of millions of hand-authored assertions, almost all of which are general “rule of thumb” axioms (most of the “facts” Cyc knows are ones that it can just look up on the internet much as a person would, or access in databases whose schemas have been aligned to Cyc’s ontology)…Tens of millions of assertions and rules were written and entered into Cyc’s KB by hand, but it is important to realize that in performing even just one step of reasoning, Cyc can generate tens of billions of new conclusions that follow from what it already knows.

…decades ago the Cyc ontologists pointed Cyc to the Linnaean taxonomy system and added just one single rule to the Cyc KB, of the form: for any 2 taxons, if one is not a specialization of the other (through a series of sub-taxon links), assume they are disjoint. This type of generalization was critical to having the KB-building enterprise take only (!) a few million person-hours of effort rather than a trillion. To speed up the educating process, the Cyc team developed tools that made use of the existing Cyc KB (and reasoners) to help the ontologists who were introspecting to unearth and formalize nuggets of common sense. For example, it was important that they generalize each nugget before entering it into Cyc’s knowledge base…A software tool helps the ontologist semi-automatically walk up the hierarchy of types from “horse” to “physical object”, and from “leg” to “physical part”…Even with those Cyc-powered KB-building tools, it has taken a coherent team of logicians and programmers 4 decades, 2,000 person-years, to produce the current Cyc KB. Cycorp’s experiments with larger-sized teams generally showed a net decrease in total productivity, due to lack of coherence, deeper reporting chains, and so on.
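
The single disjointness rule described above can be sketched in a few lines. The taxon names, dictionary encoding, and Python form are our illustration; Cyc’s actual rule is stated in its own logical language over the imported Linnaean taxonomy:

```python
# Hypothetical fragment of a taxonomy: child -> parent (sub-taxon links).
TAXONOMY = {
    "Mammalia": "Chordata",
    "Aves": "Chordata",
    "Equus": "Mammalia",
    "Corvus": "Aves",
}

def is_specialization(a, b):
    """True if taxon a is b, or descends from b via sub-taxon links."""
    while a is not None:
        if a == b:
            return True
        a = TAXONOMY.get(a)
    return False

def assume_disjoint(a, b):
    """The one rule: two taxa are assumed disjoint unless one
    is a specialization of the other."""
    return not (is_specialization(a, b) or is_specialization(b, a))
```

One such default rule stands in for the quadratically many pairwise disjointness assertions that would otherwise have to be entered by hand.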

…As we have already remarked, symbolic AI systems other than Cyc often approach speed very differently. Many limit their KB (which is what led to stove-piped Expert Systems), or they limit the expressiveness of their representation of knowledge, or they limit the types of operations that can be performed on it (ie. they adopt a more limited, but faster, logic). Eg. they choose knowledge graphs or propositional logic, which do not allow quantifiers, variables, modals, and so on…Cyc also allows multiple redundant representations for each assertion, and in practice it uses multiple redundant, specialized reasoners—Heuristic Level (HL) modules—each of which is much faster than general theorem-proving when it applies.

By 1989, Cyc had 20 such high-level reasoners (Lenat & Guha 1990); today it has over 1,100. For example, one fairly general high-level reasoner is able to quickly handle transitive relations, such as “Is Austin physically located in the Milky Way galaxy?”…That reasoner was extremely general; a more specific one handles the case where a problem can be represented as n linear equations in n unknowns. A fairly narrow Heuristic-Level module recognizes quadratic equations and applies the quadratic formula. Another relatively narrow Heuristic-Level module recognizes a chemical equation that needs balancing and calls on a domain-specific algorithm to do that.
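
A transitive-relation module of this kind can be sketched as a simple link-chasing loop; the geographic facts and Python encoding below are our own illustration, not Cyc’s representation:

```python
# Hypothetical located-in facts (each place has at most one container).
LOCATED_IN = {
    "Austin": "Texas",
    "Texas": "UnitedStates",
    "UnitedStates": "Earth",
    "Earth": "SolarSystem",
    "SolarSystem": "MilkyWay",
}

def located_in(x, y):
    """Answer 'is x physically located in y?' by walking the
    transitive located-in chain, with no theorem-proving involved."""
    while x in LOCATED_IN:
        x = LOCATED_IN[x]
        if x == y:
            return True
    return False
```

Chasing a handful of links like this is why such a module is dramatically faster than handing the same query to a general prover.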

When confronted with a problem, all 1,100 reasoners are effectively brought to bear, and the most efficient one which can make progress on it does so; the process then repeats, over and over, the “conversation” among the 1,100 Heuristic-Level modules continuing until the problem has been solved or resource bounds have been exceeded (and work on it suspended). In principle there is always the general resolution theorem prover with its hand raised in the back of the room, so to speak: it always thinks it could apply, but it is the last resort to be called on because it always takes so long to return an answer…Something we don’t often talk about: we noticed empirically that the general theorem-proving reasoner took so long that it timed out on over a million consecutive queries that called on it as a last resort. Going back further, we saw that that had happened for decades. So, about a decade ago, we quietly turned the general theorem prover off, so it never gets called on! The only impact is that Cyc sometimes runs a bit faster, since it no longer has that attractive but useless nuisance available to it.
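
The dispatch loop described above can be sketched as follows; the module interface (a cost estimate, an applicability test, and a step function) is our assumption for illustration, not Cyc’s actual HL API:

```python
def solve(problem, modules, budget=1000):
    """modules: list of (cost, applies, step) triples.
    Repeatedly let the cheapest applicable module take one step,
    until the problem is solved or the resource budget is exhausted."""
    spent = 0
    while not problem.get("solved"):
        candidates = [m for m in modules if m[1](problem)]
        if not candidates or spent >= budget:
            return None                      # suspend work: resource bound hit
        cost, _, step = min(candidates, key=lambda m: m[0])
        problem = step(problem)              # one step of progress
        spent += cost
    return problem

# Toy demo: a single module that counts n down to 0.
modules = [(1,
            lambda p: p["n"] > 0,
            lambda p: {"n": p["n"] - 1, "solved": p["n"] - 1 == 0})]
result = solve({"n": 3, "solved": False}, modules)
```

In this framing a general theorem prover is just one more (very expensive) entry in `modules`, which is why removing it, as described above, changed nothing but speed.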

When Cyc is applied to a new practical application, it is sometimes the case that even when it gets the right answers, its current battery of reasoners turns out to be unacceptably slow. In that case, the Cyc team shows the human experts (who are able to perform the task quickly) Cyc’s step-by-step reasoning chain and asks them to introspect and explain how they avoid such cumbersome reasoning. The result is often a new special-purpose Heuristic-Level reasoner, possibly with its own new, redundant representation, that enables it to run quickly. This is what happened, eg. for a chemical reaction application, where a special notation for chemical equations enabled a special-purpose algorithm to balance them quickly.
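
The chemical-balancing example lends itself to a concrete sketch: balancing an equation reduces to solving a small linear system over the rationals, which is exactly the kind of domain-specific algorithm such an HL module can call once the special notation has been recognized. The encoding and function below are our illustration, not Cyc’s notation:

```python
from fractions import Fraction
from math import gcd

def balance(species):
    """species: list of (element->count dict, side) pairs, side +1 for
    reactants and -1 for products.  Returns the smallest positive integer
    coefficients, assuming the equation has a unique balancing."""
    elements = sorted({el for sp, _ in species for el in sp})
    n = len(species)
    # Element-by-species matrix; fix the first coefficient to 1 and
    # solve for the rest by Gaussian elimination over exact rationals.
    A = [[Fraction(side * sp.get(el, 0)) for sp, side in species]
         for el in elements]
    M = [row[1:] + [-row[0]] for row in A]     # columns 1..n-1, RHS = -col 0
    where, pivot_row = [-1] * (n - 1), 0
    for col in range(n - 1):
        r = next((i for i in range(pivot_row, len(M)) if M[i][col]), None)
        if r is None:
            continue
        M[pivot_row], M[r] = M[r], M[pivot_row]
        piv = M[pivot_row][col]
        M[pivot_row] = [v / piv for v in M[pivot_row]]
        for i in range(len(M)):
            if i != pivot_row and M[i][col]:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[pivot_row])]
        where[col], pivot_row = pivot_row, pivot_row + 1
    xs = [Fraction(1)] + [M[where[c]][-1] if where[c] != -1 else Fraction(0)
                          for c in range(n - 1)]
    # Clear denominators and reduce to lowest terms.
    lcm = 1
    for x in xs:
        lcm = lcm * x.denominator // gcd(lcm, x.denominator)
    ints = [int(x * lcm) for x in xs]
    g = 0
    for v in ints:
        g = gcd(g, v)
    return [v // g for v in ints]
```

For example, `balance([({"H": 2}, 1), ({"O": 2}, 1), ({"H": 2, "O": 1}, -1)])` yields `[2, 1, 2]`, ie. 2 H₂ + O₂ → 2 H₂O.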

The trap the Cyc team fell into was assuming that there would be just one representation for knowledge, in which case it would have to be nth-order predicate calculus (HOL) with modals, because it is the only one expressive enough for all AGI reasoning purposes. Committing to that meant vainly searching for some fast general-purpose reasoning algorithm over HOL, which probably doesn’t exist. To escape from the trap the Cyc team built up a huge arsenal of redundant representations and redundant reasoners, such that in any given situation one of the efficient reasoners is usually able to operate on one of those representations and make some progress toward a solution. The entire arsenal is then brought to bear again, recursively, until the original problem has been fully dealt with or given up on.

[It sounds like the reason the Cyc company still exists is to serve as an expert-systems/knowledge-graph consultancy/body-shop for its customers, while masquerading as an AI/software company (similar to Palantir).]