Ben Goertzel with Cassio Pennachin & Nil Geisweiller &
the OpenCog Team
Engineering General Intelligence, Part 2:
The CogPrime Architecture for Integrative, Embodied
AGI
September 19, 2013
This book is dedicated by Ben Goertzel to his beloved,
departed grandfather, Leo Zwell - an amazingly
warm-hearted, giving human being who was also a deep
thinker and excellent scientist, who got Ben started on the
path of science. As a careful experimentalist, Leo would
have been properly skeptical of the big hypotheses made
here - but he would have been eager to see them put to the
test!
Preface
Welcome to the second volume of Engineering General Intelligence! This is the second half of
a two-part technical treatise aimed at outlining a practical approach to engineering software
systems with general intelligence at the human level and ultimately beyond.
Our goal here is an ambitious one, not a modest one: machines with flexible problem-
solving ability, open-ended learning capability, creativity and eventually, their own kind of
genius.
Part 1 set the stage, dealing with a variety of general conceptual issues related to the
engineering of advanced AGI, as well as presenting a brief overview of the CogPrime design
for Artificial General Intelligence. Now here in Part 2 we plunge deep into the nitty-gritty, and
describe the multiple aspects of the CogPrime design with a fairly high degree of detail.
First we describe the CogPrime software architecture and knowledge representation in de-
tail; then we review the "cognitive cycle" via which CogPrime perceives and acts in the world
and reflects on itself. We then turn to various forms of learning: procedural, declarative (e.g.
inference), simulative and integrative. Methods of enabling natural language functionality in
CogPrime are then discussed; and the volume concludes with a chapter summarizing the ar-
gument that CogPrime can lead to human-level (and eventually perhaps greater) AGI, and a
chapter giving a "thought experiment" describing the internal dynamics via which a completed
CogPrime system might solve the problem of obeying the request "Build me something with
blocks that I haven't seen before."
Reading this book without first reading Engineering General Intelligence, Part 1 is not especially
recommended, since the prequel not only provides context for this one, but also defines a
number of specific terms and concepts that are used here without explanation (for example,
Part One has an extensive Glossary). However, the impatient reader who has not mastered
Part 1, or the reader who has finished Part 1 but is tempted to hop through Part 2 nonlinearly,
might wish to first skim the final two chapters, and then return to reading in linear order.
While the majority of the text here was written by the lead author Ben Goertzel, the overall
work and underlying ideas have been very much a team effort, with major input from the sec-
ondary authors Cassio Pennachin and Nil Geisweiller, and large contributions from various other
contributors as well. Many chapters have specifically indicated coauthors; but the contributions
from various collaborating researchers and engineers go far beyond these. The creation of the
AGI approach and design presented here is a process that has occurred over a long period of
time among a community of people; and this book is in fact a quite partial view of the existent
body of knowledge and intuition regarding CogPrime. For example, beyond the ideas presented
here, there is a body of work on the OpenCog wiki site, and then the OpenCog codebase itself.
More extensive introductory remarks may be found in the Preface of Part 1, including a brief
history of the book and acknowledgements to some of those who helped inspire it.
Also, one brief comment from the Preface of Part 1 bears repeating: At several places in this
volume, as in its predecessor, we will refer to the "current" CogPrime implementation (in the
OpenCog framework); in all cases this refers to the OpenCog software system as of late 2013.
We fully realize that this book is not "easy reading", and that the level and nature of
exposition varies somewhat from chapter to chapter. We have done our best to present these
very complex ideas as clearly as we could, given our own time constraints, and the lack of
commonly understood vocabularies for discussing many of the concepts and systems involved.
Our hope is that the length of the book, and the conceptual difficulty of some portions, will
be considered as compensated by the interest of the ideas we present. For, make no mistake —
for all their technicality and subtlety, we find the ideas presented here incredibly exciting. We
are talking about no less than the creation of machines with intelligence, creativity and genius
equaling and ultimately exceeding that of human beings.
This is, in the end, the kind of book that we (the authors) all hoped to find when we first
entered the AI field: a reasonably detailed description of how to go about creating thinking
machines. The fact that so few treatises of this nature, and so few projects explicitly aimed
at the creation of advanced AGI, exist, is something that has perplexed us since we entered
the field. Rather than just complain about it, we have taken matters into our own hands, and
worked to create a design and a codebase that we believe capable of leading to human-level
AGI and beyond.
We feel tremendously fortunate to live in times when this sort of pursuit can be discussed in
a serious, scientific way.
Online Appendices
Just one more thing before getting started! This book originally had even more chapters than
the ones currently presented in Parts 1 and 2. In order to decrease length and increase fo-
cus, however, a number of chapters dealing with peripheral - yet still relevant and interest-
ing - matters were moved to online appendices. These may be downloaded in a single PDF
file at http://goertzel.org/engineering_general_intelligence_appendices_B-H.pdf. The titles of these appendices are:
• Appendix B: Steps Toward a Formal Theory of Cognitive Structure and Dynamics
• Appendix C: Emergent Reflexive Mental Structures
• Appendix D: GOLEM: Toward an AGI Meta-Architecture Enabling Both Goal Preservation
and Radical Self-Improvement
• Appendix E: Lojban++: A Novel Linguistic Mechanism for Teaching AGI Systems
• Appendix F: PLN and the Brain
• Appendix G: Possible Worlds Semantics and Experiential Semantics
• Appendix H: Propositions About Environments in Which CogPrime Components are Useful
None of these are critical to understanding the key ideas in the book, which is why they were
relegated to online appendices. However, reading them will deepen your understanding of the
conceptual and formal perspectives underlying the CogPrime design.
September 2013
Ben Goertzel
Contents
Section I Architectural and Representational Mechanisms
19 The OpenCog Framework
19.1 Introduction
19.1.1 Layers of Abstraction in Describing Artificial Minds
19.1.2 The OpenCog Framework
19.2 The OpenCog Architecture
19.2.1 OpenCog and Hardware Models
19.2.2 The Key Components of the OpenCog Framework
19.3 The AtomSpace
19.3.1 The Knowledge Unit: Atoms
19.3.2 AtomSpace Requirements and Properties
19.3.3 Accessing the Atomspace
19.3.4 Persistence
19.3.5 Specialized Knowledge Stores
19.4 MindAgents: Cognitive Processes
19.4.1 A Conceptual View of CogPrime Cognitive Processes
19.4.2 Implementation of MindAgents
19.4.3 Tasks
19.4.4 Scheduling of MindAgents and Tasks in a Unit
19.4.5 The Cognitive Cycle
19.5 Distributed AtomSpace and Cognitive Dynamics
19.5.1 Distributing the AtomSpace
19.5.2 Distributed Processing

20 Knowledge Representation Using the Atomspace
20.1 Introduction
20.2 Denoting Atoms
20.2.1 Meta-Language
20.2.2 Denoting Atoms
20.3 Representing Functions and Predicates
20.3.1 Execution Links
20.3.2 Denoting Schema and Predicate Variables
20.3.3 Variable and Combinator Notation
20.3.4 Inheritance Between Higher-Order Types
20.3.5 Advanced Schema Manipulation

21 Representing Procedural Knowledge
21.1 Introduction
21.2 Representing Programs
21.3 Representational Challenges
21.4 What Makes a Representation Tractable?
21.5 The Combo Language
21.6 Normal Forms Postulated to Provide Tractable Representations
21.6.1 A Simple Type System
21.6.2 Boolean Normal Form
21.6.3 Number Normal Form
21.6.4 List Normal Form
21.6.5 Tuple Normal Form
21.6.6 Enum Normal Form
21.6.7 Function Normal Form
21.6.8 Action Result Normal Form
21.7 Program Transformations
21.7.1 Reductions
21.7.2 Neutral Transformations
21.7.3 Non-Neutral Transformations
21.8 Interfacing Between Procedural and Declarative Knowledge
21.8.1 Programs Manipulating Atoms
21.9 Declarative Representation of Procedures
Section II The Cognitive Cycle
22 Emotion, Motivation, Attention and Control
22.1 Introduction
22.2 A Quick Look at Action Selection
22.3 Psi in CogPrime
22.4 Implementing Emotion Rules atop Psi's Emotional Dynamics
22.4.1 Grounding the Logical Structure of Emotions in the Psi Model
22.5 Goals and Contexts
22.5.1 Goal Atoms
22.6 Context Atoms
22.7 Ubergoal Dynamics
22.7.1 Implicit Ubergoal Pool Modification
22.7.2 Explicit Ubergoal Pool Modification
22.8 Goal Formation
22.9 Goal Fulfillment and Predicate Schematization
22.10 Context Formation
22.11 Execution Management
22.12 Goals and Time

23 Attention Allocation
23.1 Introduction
23.2 Semantics of Short and Long Term Importance
23.2.1 The Precise Semantics of STI and LTI
23.2.2 STI, STIFund, and Juju
23.2.3 Formalizing LTI
23.2.4 Applications of LTIburst versus LTIcont
23.3 Defining Burst LTI in Terms of STI
23.4 Valuing LTI and STI in terms of a Single Currency
23.5 Economic Attention Networks
23.5.1 Semantics of Hebbian Links
23.5.2 Explicit and Implicit Hebbian Relations
23.6 Dynamics of STI and LTI Propagation
23.6.1 ECAN Update Equations
23.6.2 ECAN as Associative Memory
23.7 Glocal Economic Attention Networks
23.7.1 Experimental Explorations
23.8 Long-Term Importance and Forgetting
23.9 Attention Allocation via Data Mining on the System Activity Table
23.10 Schema Credit Assignment
23.11 Interaction between ECANs and other CogPrime Components
23.11.1 Use of PLN and Procedure Learning to Help ECAN
23.11.2 Use of ECAN to Help Other Cognitive Processes
23.12 MindAgent Importance and Scheduling
23.13 Information Geometry for Attention Allocation
23.13.1 Brief Review of Information Geometry
23.13.2 Information-Geometric Learning for Recurrent Networks: Extending the ANGL Algorithm
23.13.3 Information Geometry for Economic Attention Allocation: A Detailed Example

24 Economic Goal and Action Selection
24.1 Introduction
24.2 Transfer of STI "Requests for Services" Between Goals
24.3 Feasibility Structures
24.4 Goal Based Schema Selection
24.4.1 A Game-Theoretic Approach to Action Selection
24.5 SchemaActivation
24.6 GoalBasedSchemaLearning

25 Integrative Procedure Evaluation
25.1 Introduction
25.2 Procedure Evaluators
25.2.1 Simple Procedure Evaluation
25.2.2 Effort Based Procedure Evaluation
25.2.3 Procedure Evaluation with Adaptive Evaluation Order
25.3 The Procedure Evaluation Process
25.3.1 Truth Value Evaluation
25.3.2 Schema Execution
Section III Perception and Action
26 Perceptual and Motor Hierarchies
26.1 Introduction
26.2 The Generic Perception Process
26.2.1 The ExperienceDB
26.3 Interfacing CogPrime with a Virtual Agent
26.3.1 Perceiving the Virtual World
26.3.2 Acting in the Virtual World
26.4 Perceptual Pattern Mining
26.4.1 Input Data
26.4.2 Transaction Graphs
26.4.3 Spatiotemporal Conjunctions
26.4.4 The Mining Task
26.5 The Perceptual-Motor Hierarchy
26.6 Object Recognition from Polygonal Meshes
26.6.1 Algorithm Overview
26.6.2 Recognizing PersistentPolygonNodes (PPNodes) from PolygonNodes
26.6.3 Creating Adjacency Graphs from PPNodes
26.6.4 Clustering in the Adjacency Graph
26.6.5 Discussion
26.7 Interfacing the Atomspace with a Deep Learning Based Perception-Action Hierarchy
26.7.1 Hierarchical Perception Action Networks
26.7.2 Declarative Memory
26.7.3 Sensory Memory
26.7.4 Procedural Memory
26.7.5 Episodic Memory
26.7.6 Action Selection and Attention Allocation
26.8 Multiple Interaction Channels

27 Integrating CogPrime with a Compositional Spatiotemporal Deep Learning Network
27.1 Introduction
27.2 Integrating CSDLNs with Other AI Frameworks
27.3 Semantic CSDLN for Perception Processing
27.4 Semantic CSDLN for Motor and Sensorimotor Processing
27.5 Connecting the Perceptual and Motoric Hierarchies with a Goal Hierarchy

28 Making DeSTIN Representationally Transparent
28.1 Introduction
28.2 Review of DeSTIN Architecture and Dynamics
28.2.1 Beyond Gray-Scale Vision
28.3 Uniform DeSTIN
28.3.1 Translation-Invariant DeSTIN
28.3.2 Mapping States of Translation-Invariant DeSTIN into the Atomspace
28.3.3 Scale-Invariant DeSTIN
28.3.4 Rotation Invariant DeSTIN
28.3.5 Temporal Perception
28.4 Interpretation of DeSTIN's Activity
28.4.1 DeSTIN's Assumption of Hierarchical Decomposability
28.4.2 Distance and Utility
28.5 Benefits and Costs of Uniform DeSTIN
28.6 Imprecise Probability as a Tool for Linking CogPrime and DeSTIN
28.6.1 Visual Attention Focusing
28.6.2 Using Imprecise Probabilities to Guide Visual Attention Focusing
28.6.3 Sketch of Application to DeSTIN

29 Bridging the Symbolic/Subsymbolic Gap
29.1 Introduction
29.2 Simplified OpenCog Workflow
29.3 Integrating DeSTIN and OpenCog
29.3.1 Mining Patterns from DeSTIN States
29.3.2 Probabilistic Inference on Mined Hypergraphs
29.3.3 Insertion of OpenCog-Learned Predicates into DeSTIN's Pattern Library
29.4 Multisensory Integration, and Perception-Action Integration
29.4.1 Perception-Action Integration
29.4.2 Thought-Experiment: Eye-Hand Coordination
29.5 A Practical Example: Using Subtree Mining to Bridge the Gap Between DeSTIN and PLN
29.5.1 The Importance of Semantic Feedback
29.6 Some Simple Experiments with Letters
29.6.1 Mining Subtrees from DeSTIN States Induced via Observing Letterforms
29.6.2 Mining Subtrees from DeSTIN States Induced via Observing Letterforms
29.7 Conclusion
Section IV Procedure Learning
30 Procedure Learning as Program Learning
30.1 Introduction
30.1.1 Program Learning
30.2 Representation-Building
30.3 Specification Based Procedure Learning

31 Learning Procedures via Imitation, Reinforcement and Correction
31.1 Introduction
31.2 IRC Learning
31.2.1 A Simple Example of Imitation/Reinforcement Learning
31.2.2 A Simple Example of Corrective Learning
31.3 IRC Learning in the PetBrain
31.3.1 Introducing Corrective Learning
31.4 Applying A Similar IRC Methodology to Spontaneous Learning

32 Procedure Learning via Adaptively Biased Hillclimbing
32.1 Introduction
32.2 Hillclimbing
32.3 Entity and Perception Filters
32.3.1 Entity filter
32.3.2 Entropy perception filter
32.4 Using Action Sequences as Building Blocks
32.5 Automatically Parametrizing the Program Size Penalty
32.5.1 Definition of the complexity penalty
32.5.2 Parameterizing the complexity penalty
32.5.3 Definition of the Optimization Problem
32.6 Some Simple Experimental Results
32.7 Conclusion

33 Probabilistic Evolutionary Procedure Learning
33.1 Introduction
33.1.1 Explicit versus Implicit Evolution in CogPrime
33.2 Estimation of Distribution Algorithms
33.3 Competent Program Evolution via MOSES
33.3.1 Statics
33.3.2 Dynamics
33.3.3 Architecture
33.3.4 Example: Artificial Ant Problem
33.3.5 Discussion
33.3.6 Conclusion
33.4 Integrating Feature Selection Into the Learning Process
33.4.1 Machine Learning, Feature Selection and AGI
33.4.2 Data- and Feature-Focusable Learning Problems
33.4.3 Integrating Feature Selection Into Learning
33.4.4 Integrating Feature Selection into MOSES Learning
33.4.5 Application to Genomic Data Classification
33.5 Supplying Evolutionary Learning with Long-Term Memory
33.6 Hierarchical Program Learning
33.6.1 Hierarchical Modeling of Composite Procedures in the AtomSpace
33.6.2 Identifying Hierarchical Structure In Combo Trees via MetaNodes and Dimensional Embedding
33.7 Fitness Function Estimation via Integrative Intelligence
Section V Declarative Learning
34 Probabilistic Logic Networks
34.1 Introduction
34.2 A Simple Overview of PLN
34.2.1 Forward and Backward Chaining
34.3 First Order Probabilistic Logic Networks
34.3.1 Core FOPLN Relationships
34.3.2 PLN Truth Values
34.3.3 Auxiliary FOPLN Relationships
34.3.4 PLN Rules and Formulas
34.3.5 Inference Trails
34.4 Higher-Order PLN
34.4.1 Reducing HOPLN to FOPLN
34.5 Predictive Implication and Attraction
34.6 Confidence Decay
34.6.1 An Example
34.7 Why is PLN a Good Idea?

35 Spatiotemporal Inference
35.1 Introduction
35.2 Related Work on Spatio-temporal Calculi
35.3 Uncertainty with Distributional Fuzzy Values
35.4 Spatio-temporal Inference in PLN
35.5 Examples
35.5.1 Spatiotemporal Rules
35.5.2 The Laptop is Safe from the Rain
35.5.3 Fetching the Toy Inside the Upper Cupboard
35.6 An Integrative Approach to Planning

36 Adaptive, Integrative Inference Control
36.1 Introduction
36.2 High-Level Control Mechanisms
36.2.1 The Need for Adaptive Inference Control
36.3 Inference Control in PLN
36.3.1 Representing PLN Rules as GroundedSchemaNodes
36.3.2 Recording Executed PLN Inferences in the Atomspace
36.3.3 Anatomy of a Single Inference Step
36.3.4 Basic Forward and Backward Inference Steps
36.3.5 Interaction of Forward and Backward Inference
36.3.6 Coordinating Variable Bindings
36.3.7 An Example of Problem Decomposition
36.3.8 Example of Casting a Variable Assignment Problem as an Optimization Problem
36.3.9 Backward Chaining via Nested Optimization
36.4 Combining Backward and Forward Inference Steps with Attention Allocation to Achieve the Same Effect as Backward Chaining (and Even Smarter Inference Dynamics)
36.4.1 Breakdown into MindAgents
36.5 Hebbian Inference Control
36.6 Inference Pattern Mining
36.7 Evolution As an Inference Control Scheme
36.8 Incorporating Other Cognitive Processes into Inference
36.9 PLN and Bayes Nets

37 Pattern Mining
37.1 Introduction
37.2 Finding Interesting Patterns via Program Learning
37.3 Pattern Mining via Frequent/Surprising Subgraph Mining
37.4 Fishgram
37.4.1 Example Patterns
37.4.2 The Fishgram Algorithm
37.4.3 Preprocessing
37.4.4 Search Process
37.4.5 Comparison to other algorithms

38 Speculative Concept Formation
38.1 Introduction
38.2 Evolutionary Concept Formation
38.3 Conceptual Blending
38.3.1 Outline of a CogPrime Blending Algorithm
38.3.2 Another Example of Blending
38.4 Clustering
38.5 Concept Formation via Formal Concept Analysis
38.5.1 Calculating Membership Degrees of New Concepts
38.5.2 Forming New Attributes
38.5.3 Iterating the Fuzzy Concept Formation Process
Section VI Integrative Learning
39 Dimensional Embedding
39.1 Introduction
39.2 Link Based Dimensional Embedding
39.3 Harel and Koren's Dimensional Embedding Algorithm
39.3.1 Step 1: Choosing Pivot Points
39.3.2 Step 2: Similarity Estimation
39.3.3 Step 3: Embedding
39.4 Embedding Based Inference Control
39.5 Dimensional Embedding and InheritanceLinks

40 Mental Simulation and Episodic Memory
40.1 Introduction
40.2 Internal Simulations
40.3 Episodic Memory

41 Integrative Procedure Learning
41.1 Introduction
41.1.1 The Diverse Technicalities of Procedure Learning in CogPrime
41.2 Preliminary Comments on Procedure Map Encapsulation and Expansion
41.3 Predicate Schematization
41.3.1 A Concrete Example
41.4 Concept-Driven Schema and Predicate Creation
41.4.1 Concept-Driven Predicate Creation
41.4.2 Concept-Driven Schema Creation
41.5 Inference-Guided Evolution of Pattern-Embodying Predicates
41.5.1 Rewarding Surprising Predicates
41.5.2 A More Formal Treatment
41.6 PredicateNode Mining
41.7 Learning Schema Maps
41.7.1 Goal-Directed Schema Evolution
41.8 Occam's Razor

42 Map Formation
42.1 Introduction
42.2 Map Encapsulation
42.3 Atom and Predicate Activity Tables
42.4 Mining the AtomSpace for Maps
42.4.1 Frequent Itemset Mining for Map Mining
42.4.2 Evolutionary Map Detection
42.5 Map Dynamics
42.6 Procedure Encapsulation and Expansion
42.6.1 Procedure Encapsulation in More Detail
42.6.2 Procedure Encapsulation in the Human Brain
42.7 Maps and Focused Attention
42.8 Recognizing and Creating Self-Referential Structures
42.8.1 Encouraging the Recognition of Self-Referential Structures in the AtomSpace
Section VII Communication Between Human and Artificial Minds
43 Communication Between Artificial Minds
43.1 Introduction
43.2 A Simple Example Using a PsyneseVocabulary Server
43.2.1 The Psynese Match Schema
43.3 Psynese as a Language
43.4 Psynese Mindplexes
43.4.1 AGI Mindplexes
43.5 Psynese and Natural Language Processing
43.5.1 Collective Language Learning

44 Natural Language Comprehension
44.1 Introduction
44.2 Linguistic Atom Types
44.3 The Comprehension and Generation Pipelines
44.4 Parsing with Link Grammar
44.4.1 Link Grammar vs. Phrase Structure Grammar
44.5 The RelEx Framework for Natural Language Comprehension
44.5.1 RelEx2Frame: Mapping Syntactico-Semantic Relationships into FrameNet Based Logical Relationships
44.5.2 A Priori Probabilities For Rules
44.5.3 Exclusions Between Rules
44.5.4 Handling Multiple Prepositional Relationships
44.5.5 Comparatives and Phantom Nodes
44.6 Frame2Atom
44.6.1 Examples of Frame2Atom
44.6.2 Issues Involving Disambiguation
44.7 Syn2Sem: A Semi-Supervised Alternative to RelEx and RelEx2Frame
44.8 Mapping Link Parses into Atom Structures
44.8.1 Example Training Pair
44.9 Making a Training Corpus
44.9.1 Leveraging RelEx to Create a Training Corpus
44.9.2 Making an Experience Based Training Corpus
44.9.3 Unsupervised, Experience Based Corpus Creation
44.10 Limiting the Degree of Disambiguation Attempted
44.11 Rule Format
44.11.1 Example Rule
44.12 Rule Learning
44.13 Creating a Cyc-Like Database via Text Mining
44.14 PROWL Grammar
44.14.1 Brief Review of Word Grammar
44.14.2 Word Grammar's Logical Network Model
44.14.3 Link Grammar Parsing vs Word Grammar Parsing
44.14.4 Contextually Guided Greedy Parsing and Generation Using Word Link Grammar
44.15 Aspects of Language Learning
44.15.1 Word Sense Creation
44.15.2 Feature Structure Learning
44.15.3 Transformation and Semantic Mapping Rule Learning
44.16 Experiential Language Learning
44.17 Which Path(s) Forward?

45 Language Learning via Unsupervised Corpus Analysis
45.1 Introduction
45.2 Assumed Linguistic Infrastructure
45.3 Linguistic Content To Be Learned
45.3.1 Deeper Aspects of Comprehension
45.4 A Methodology for Unsupervised Language Learning from a Large Corpus
45.4.1 A High Level Perspective on Language Learning
45.4.2 Learning Syntax
45.4.3 Learning Semantics
45.5 The Importance of Incremental Learning
45.6 Integrating Language Learned via Corpus Analysis into CogPrime's Experiential Learning

46 Natural Language Generation
46.1 Introduction
46.2 SegSim for Sentence Generation
46.2.1 NLGen: Example Results
46.3 Experiential Learning of Language Generation
46.4 Sem2Syn
46.5 Conclusion

47 Embodied Language Processing
47.1 Introduction
47.2 Semiosis
47.3 Teaching Gestural Communication
47.4 Simple Experiments with Embodiment and Anaphor Resolution
47.5 Simple Experiments with Embodiment and Question Answering
47.5.1 Preparing/Matching Frames
47.5.2 Frames2RelEx
47.5.3 Example of the Question Answering Pipeline
47.5.4 Example of the PetBrain Language Generation Pipeline
47.6 The Prospect of Massively Multiplayer Language Teaching

48 Natural Language Dialogue
48.1 Introduction
48.1.1 Two Phases of Dialogue System Development
48.2 Speech Act Theory and its Elaboration
48.3 Speech Act Schemata and Triggers
48.3.1 Notes Toward Example SpeechActSchema
48.4 Probabilistic Mining of Trigger Contexts
48.5 Conclusion
Section VIII From Here to AGI
49 Summary of Argument for the CogPrime Approach
49.1 Introduction
49.2 Multi-Memory Systems
49.3 Perception, Action and Environment
49.4 Developmental Pathways
49.5 Knowledge Representation
49.6 Cognitive Processes
49.6.1 Uncertain Logic for Declarative Knowledge
49.6.2 Program Learning for Procedural Knowledge
49.6.3 Attention Allocation
49.6.4 Internal Simulation and Episodic Knowledge
49.6.5 Low-Level Perception and Action
49.6.6 Goals
49.7 Fulfilling the "Cognitive Equation"
49.8 Occam's Razor
49.8.1 Mind Geometry
49.9 Cognitive Synergy
49.9.1 Synergies that Help Inference
49.10 Synergies that Help MOSES
49.10.1 Synergies that Help Attention Allocation
49.10.2 Further Synergies Related to Pattern Mining
49.10.3 Synergies Related to Map Formation
49.11 Emergent Structures and Dynamics
49.12 Ethical AGI
49.13 Toward Superhuman General Intelligence
49.13.1 Conclusion

50 Build Me Something I Haven't Seen: A CogPrime Thought Experiment
50.1 Introduction
50.2 Roles of Selected Cognitive Processes
50.3 A Semi-Narrative Treatment
50.4 Conclusion

A Glossary
A.1 List of Specialized Acronyms
A.2 Glossary of Specialized Terms

References
Section I
Architectural and Representational Mechanisms
Chapter 19
The OpenCog Framework
19.1 Introduction
The primary burden of this book is to explain the CogPrime architecture for AGI - the broad
outline of the design, the main dynamics it's intended to display once complete, and the reasons
why we believe it will be capable of leading to general intelligence at the human level and beyond.
The crux of CogPrime lies in its learning algorithms and how they are intended to interact
together synergetically, making use of CogPrime's knowledge representation and other tools.
Before we can get to this, however, we need to elaborate some of the "plumbing" within which
these learning dynamics occur. We will start out with a brief description of the OpenCog frame-
work in which implementation of CogPrime has been, gradually and incrementally, occurring
for the last few years.
19.1.1 Layers of Abstraction in Describing Artificial Minds
There are multiple layers intervening between a conceptual theory of mind and a body of source
code. How many layers to explicitly discuss is a somewhat arbitrary decision, but one way to
picture it is exemplified in Table 19.1.
In Part 1 of this work we have concerned ourselves mainly with levels 5 and 6 in the table:
mathematical/conceptual modeling of cognition and philosophy of mind (with occasional forays
into levels 3 and 4). Most of Part 2, on the other hand, deals with level 4 (mathematical/conceptual
AI design), verging into level 3 (high-level software design). This chapter, however, will focus
on somewhat lower-level material, mostly level 3 with some dips into level 2. We will describe the
basic architecture of CogPrime as a software system, implemented as "OpenCogPrime" within
the OpenCog Framework (OCF). The reader may want to glance back at Chapter 6 of Part 1
before proceeding through this one, to get a memory-refresh on basic CogPrime terminology.
Also, OpenCog and OpenCogPrime are open-source, so the reader who wishes to dig into the
source code (mostly C++, some Python and Scheme) is welcome to; directions to find the code
are on the opencog.org website.
Level of Abstraction | Description/Example
1 Source code |
2 Detailed software design |
3 Software architecture | Largely programming-language-independent, but not hardware-architecture-independent: much of the material in this chapter, for example, and most of the OpenCog Framework
4 Mathematical and conceptual AI design | e.g., the sort of characterization of CogPrime given in most of this Part of this book
5 Abstract mathematical modeling of cognition | e.g. the SRAM model discussed in Chapter 7 of Part 1, which could be used to inspire or describe many different AI systems
6 Philosophy of mind | e.g. Patternism, the Mind-World Correspondence Principle

Table 19.1: Levels of abstraction in CogPrime's implementation and design
19.1.2 The OpenCog Framework
The OpenCog Framework forms a bridge between the mathematical structures and dynamics
of CogPrime's concretely implemented mind, and the nitty-gritty realities of modern computer
technology. While CogPrime could in principle be implemented in a quite different infrastruc-
ture, in practice the CogPrime design has been developed closely in conjunction with OpenCog,
so that a qualitative understanding of the nature of the OCF is fairly necessary for an under-
standing of how CogPrime is intended to function, and a detailed understanding of the OCF is
necessary for doing concrete implementation work on CogPrime.
Marvin Minsky, in a personal conversation with one of the authors (Goertzel), once expressed
the opinion that a human-level general intelligence could probably be implemented on a 486
PC, if we just knew the algorithm. We doubt this is the case — at least not unless the 486
PC were supplied with masses of external memory and allowed to proceed much, much slower
than any human being - and it is certainly not the case for CogPrime. By current computing
hardware standards, a CogPrime system is a considerable resource hog. And it will remain so
for a number of years, even considering technology progress.
It is one of the jobs of the OCF to manage the system's gluttonous behavior. It is the software
layer that abstracts the real world efficiency compromises from the rest of the system; this is
why we call it a "Mind OS": it provides services, rules, and protection to the Atoms and cognitive
processes (see Section 19.4) that live on top of it, which are then allowed to ignore the software
architecture they live on.
And so, the nature of the OCF is strongly influenced by the quantitative requirements im-
posed on the system, as well as the general nature of the structure and dynamics that it must
support. The large number and great diversity of Atoms needed to create a significantly intelli-
gent CogPrime demand that we pay careful attention to such issues as concurrent, distributed
processing, and scalability in general. The number of Nodes and Links that we will need in
order to create a reasonably complete CogPrime is still largely unknown. But our experiments
with learning, natural language processing, and cognition over the past few years have given
us an intuition for the question. We currently believe that we are likely to need billions - but
probably not trillions, and almost surely not quadrillions - of Atoms in order to achieve a high
degree of general intelligence. Hundreds of millions strikes us as possible but overly optimistic.
In fact we have already run CogPrime systems utilizing hundreds of millions of Atoms, though
in a simplified dynamical regime with only a couple very simple processes acting on most of
them.
The operational infrastructure of the OCF is an area where pragmatism must reign over ide-
alism. What we describe here is not the ultimate possible "mind operating system" to underlie
a CogPrime system, but rather a workable practical solution given the hardware, networking
and software infrastructure readily available today at reasonable prices. Along these lines, it
must be emphasized that the ideas presented in this chapter are the result of over a decade of
practical experimentation by the authors and their colleagues with implementations of related
software systems. The journey began in earnest in 1997 with the design and implementation of
the Webmind AI Engine at Intelligenesis Corp., which itself went through a few major design
revisions; and then in 2001-2002 the Novamente Cognition Engine was architected and imple-
mented, and evolved progressively until 2008, when a subset of it was adapted for open-sourcing
as OpenCog. Innumerable mistakes were made, and lessons learned, along this path. The OCF
as described here is significantly different, and better, than these previous architectures, thanks
to these lessons, as well as to the changing landscape of concurrent, distributed computing over
the past few years.
The design presented here reflects a mix of realism and idealism, and we haven't seen fit
here to describe all the alternatives that were pursued on the route to what we present. We
don't claim the approach we've chosen is ideal, but it's in use now within the OpenCog sys-
tem, and it seems both workable in practice and capable of effectively supporting the entire
CogPrime design. No doubt it will evolve in some respects as implementation progresses; one
of the principles kept in mind during the design and development of OpenCog was modular-
ity, enabling substantial modifications to particular parts of the framework to occur without
requiring wholesale changes throughout the codebase.
19.2 The OpenCog Architecture
19.2.1 OpenCog and Hardware Models
The job of the OCF is closely related to the nature of the hardware on which it runs. The ideal
hardware platform for CogPrime would be a massively parallel hardware architecture, in which
each Atom was given its own processor and memory. The closest thing would have been the
Connection Machine II [189]: a CM5 was once built with 64000 processors and local RAM for
each processor. But even 64000 processors wouldn't be enough for a highly intelligent CogPrime
to run in a fully parallelized manner, since we're sure we need more than 64000 Atoms.
Connection Machine style hardware seems to have perished in favor of more standard SMP
(Symmetric Multi-Processing) machines. It is true that each year we see SMP machines with
more and more processors on the market, and more and more cores per processor. However,
the state of the art is still in the hundreds of cores range, many orders of magnitude from what
would be necessary for a one Atom per processor CogPrime implementation.
So, at the present time, technological and financial reasons have pushed us to implement
the OpenCog system using a relatively mundane and standard hardware architecture. If the
CogPrime project is successful in the relatively near term, the first human-level OpenCogPrime
system will most likely live on a network of high-end commodity SMP machines. These are
machines with dozens of gigabytes of RAM and several processor cores, perhaps dozens but
not thousands. A highly intelligent CogPrime would require a cluster of dozens and possibly
hundreds or thousands of such machines. We think it's unlikely that tens of thousands will be
required, and extremely unlikely that hundreds of thousands will be.
Given this sort of architecture, we need effective ways to swap Atoms back and forth be-
tween disk and RAM, and carefully manage the allocation of processor time among the various
cognitive processes that demand it. The use of a widely-distributed network of weaker ma-
chines for peripheral processing is a serious possibility, and we have some detailed software
designs addressing this option: but for the near future we believe that this can best be used as
augmentation to core CogPrime processing, which must remain on a dedicated cluster.
Of course, the use of specialized hardware is also a viable possibility, and we have considered
a host of possibilities such as
• True supercomputers like those created by IBM or Cray (which these days are distributed
systems, but with specialized, particularly efficient interconnection frameworks and overall
control mechanisms)
• GPU supercomputers such as the Nvidia Tesla (which are currently being used for vision
processing systems considered for hybridization with OCP), such as DeSTIN and Hugo de
Garis's Parcone
• custom chips designed to implement the various CogPrime algorithms and data structures
in hardware
• More speculatively, it might be possible to use evolutionary quantum computing or adiabatic
quantum computing a la Dwave (http://dwave.com) to accelerate CogPrime procedure
learning.
All these possibilities and many more are exciting to envision, but the CogPrime architecture
does not require any of them in order to be successful.
19.2.2 The Key Components of the OpenCog Framework
Given the realities of implementing CogPrime on clustered commodity servers, as we have seen
above, the three key questions that have to be answered in the OCF design are:
1. How do we store CogPrime's knowledge?
2. How do we enable cognitive processes to act on that knowledge, refining and improving it?
3. How do we enable scalable, distributed knowledge storage and cognitive processing of that
knowledge?
The remaining sections of this Chapter are dedicated to answering each of these questions
in more detail.
While the basic landscape of concurrent, distributed processing is largely the same as it was
a decade ago - we're still dealing with distributed networks of multiprocessor von Neumann ma-
chines - we can draw on advancements in both computer architecture and software. The former
is materialized in the increasing availability of multiple real and virtual cores in commodity
processors. The latter reflects the emergence of a number of tools and architectural patterns,
largely thanks to the rise of "big data" problems and businesses. Companies and projects dealing
with massive datasets face challenges that aren't entirely like those of building CogPrime, but
which share many useful similarities.
These advances are apparent mostly in the architecture of the AtomSpace, a distributed
knowledge store for efficient storage of hypergraphs and its use by CogPrime's cognitive dy-
namics. The AtomSpace, like many NoSQL datastores, is heavily distributed, utilizing local
caches for read and write operations, and a special purpose design for eventual consistency
guarantees.
We also attempt to minimize the complexities of multi-threading in the scheduling of cogni-
tive dynamics, by allowing those to be deployed either as agents sharing a single OS process,
or, preferably, as processes of their own. Cognitive dynamics communicate through message
queues, which are provided by a sub-system that hides the deployment decision, so the mes-
sages exchanged are the same whether delivered within a process, to another process in the
same machine, or to a process in another machine in the cluster.
19.3 The AtomSpace
As alluded to above and in Chapter 13, and discussed more fully in Chapter 20 below, the
foundation of CogPrime's knowledge representation is the Atom, an object that can be either
a Node or a Link. CogPrime's hypergraph is implemented as the AtomSpace, a specialized
datastore that comes along with an API designed specifically for CogPrime's requirements.
19.3.1 The Knowledge Unit: Atoms
Atoms are used to represent every kind of knowledge in the system's memory in one way or
another. The particulars of Atoms and how they represent knowledge will be discussed in later
chapters; here we present only a minimal description in order to motivate the design of the
AtomSpace. From that perspective, the most important properties of Atoms are:
• Every Atom has an AtomHandle, which is a universal ID across a CogPrime deployment
(possibly involving thousands of networked machines). The AtomHandles are the keys for
accessing Atoms in the AtomSpace, and once a handle is assigned to an Atom it can't be
changed or reused.
• Atoms have TruthValue and AttentionValue entities associated with them, each of which
are small collections of numbers; there are multiple versions of truth values, with varying
degrees of detail. TruthValues are context-dependent, and useful Atoms will typically have
multiple TruthValues, indexed by context.
• Some Atoms are nodes, and may have names.
• Atoms that are links will have a list of targets, of variable size (as in CogPrime's hypergraph
links may connect more than two nodes).
Some Atom attributes are immutable, such as Node names and, most importantly, Link
targets, called outgoing sets in AtomSpace lingo. One can remove a Link, but not change its
targets. This enables faster implementation of some neighborhood searches, as well as index-
ing. Truth and attention values, on the other hand, are mutable, an essential requirement for
CogPrime.
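
To make this concrete, here is a minimal C++ sketch of an Atom record along the lines just described. It is an illustration only, not the actual OpenCog class hierarchy; the type and field names are hypothetical.

#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch of an Atom record; the real OpenCog classes differ.
using AtomHandle = std::uint64_t;  // universal ID, never reused

struct TruthValue { double strength = 0.0; double confidence = 0.0; };
struct AttentionValue { double sti = 0.0; double lti = 0.0; double vlti = 0.0; };

struct Atom {
    const AtomHandle handle;                 // immutable once assigned
    const std::string name;                  // meaningful only for (some) Nodes
    const std::vector<AtomHandle> outgoing;  // Link targets ("outgoing set"); immutable
    TruthValue tv;      // mutable; a real Atom may hold several, indexed by context
    AttentionValue av;  // mutable
};

int main() {
    Atom cat{1, "cat", {}, {0.9, 0.8}, {}};
    cat.tv.strength = 0.95;        // truth and attention values may change...
    // cat.outgoing.push_back(2);  // ...but the outgoing set may not (const)
}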
For performance reasons, some types of knowledge have alternative representations. These
alternative representations are necessary for space or speed reasons, but knowledge stored that
way can always be translated back into Atoms in the AtomSpace as needed. So, for instance,
procedures are represented as program trees in a ProcedureRepository, which allows for faster
execution, but the trees can be expanded into a set of Nodes and Links if one wants to do
reasoning on a specific program.
19.3.2 AtomSpace Requirements and Properties
The major high-level requirements for the AtomSpace are the following ones:
• Store Atoms indexed by their immutable AtomHandles as compactly as possible, while still
enabling very efficient modification of the mutable properties of an Atom (TruthValues and
AttentionValues).
• Perform queries as fast as possible.
• Keep the working set of all Atoms currently being used by CogPrime's cognitive dynamics
in RAM.
• Save and restore hypergraphs to disk, a more traditional SQL or non-SQL database, or
other structure such as binary files, XML, etc.
• Hold hypergraphs consisting of billions or trillions of Atoms, scaling up to petabytes of
data.
• Be transparently distributable across a cluster of machines.
The design trade-offs in the AtomSpace implementation are driven by the needs of CogPrime.
The datastore is implemented in a way that maximizes the performance of the cognitive dynam-
ics running on top of it. From this perspective, the AtomSpace differs from most datastores, as
the key decisions aren't made in terms of flexibility, consistency, reliability and other common
criteria for databases. It is a very specialized database. Among the factors that motivate the
AtomSpace's design, we can highlight a few:
1. Atoms tend to be small objects, with very few exceptions (links with many targets or Atoms
with many different context-derived TruthValues).
2. Atom creation and deletion are common events, and occur according to complex patterns
that may vary a lot over time, even for a particular CogPrime instance.
3. Atoms involved in CogPrime's cognitive dynamics at any given time need to live in RAM.
However, the system still needs the ability to save sets of Atoms to disk in order to preserve
RAM, and then retrieve them later when they become contextually relevant.
4. Some Atoms will remain around for a really long time, others will be ephemeral and get
removed shortly after they're created. Removal may be to disk, as outlined above, or plain
deletion.
Besides storing Atoms, the AtomSpace also contains a number of indices for fast Atom re-
trieval according to several criteria. It can quickly search for Atoms given their type, importance,
truth value, arity, targets (for Links), name (for Nodes), and any combination of the above.
These are built-in indexes. The AtomSpace also allows cognitive processes to create their own
indexes, based on the evaluation of a Procedure over the universe of Atoms, or a subset of that
universe specified by the process responsible for the index.
The AtomSpace also allows pattern matching queries for a given Atom structure template,
which allows for fast search for small subgraphs displaying some desirable properties. In ad-
dition to pattern matching, it provides neighborhood searches. Although it doesn't implement
any graph-traversal primitives, it's easy for cognitive processes to do so on top of the pattern
matching and neighborhood primitives.
Note that, since CogPrime's hypergraph is quite different from a regular graph, using a graph
database without modification would probably be inadequate. While it's possible to automati-
cally translate a hypergraph into a regular graph, that process is expensive for large knowledge
bases, and leads to higher space requirements, reducing the overall system's scalability.
In terms of database taxonomy, the AtomSpace lies somewhere between a key-value store
and a document store, as there is some structure in the contents of each value (an Atom's
properties are well defined, and listed above), but no built-in flexibility to add more contents
to an existing Atom.
We will now discuss the above requirements in more detail, starting with querying the Atom-
Space, followed by persistence to disk, and then handling of specific forms of knowledge that
are best handled by specialized stores.
19.3.3 Accessing the Atomspace
The AtomSpace provides an API, which allows the basic operations of creating new Atoms,
updating their mutable properties, searching for Atoms and removing Atoms. More specifically,
the API supports the following operations (a toy sketch in code follows the list):
• Create and store a new Atom. There are special methods for Nodes and Links, in the latter
case with multiple convenience versions depending on the number of targets and other
properties of the link.
• Remove an Atom. This requires the validation that no Links currently point to that Atom,
otherwise they'd be left dangling.
• Look up one or more Atoms. This includes several variants, such as:
- Look up an Atom by AtomHandle;
- Look up a Node by name;
- Find links with an Atom as target;
- Pattern matching, i.e., find Atoms satisfying some predicate, which is designed as a
"search criteria" by some cognitive process, and results in the creation of a specific
index for that predicate;
- Neighborhood search, i.e., find Atoms that are within some radius of a given centroid
Atom;
- Find Atoms by type (this can be combined with the previous queries, resulting in type
specific versions);
- Find Atoms by some AttentionValue criteria, such as the top N most important Atoms,
or those with importance above some threshold (can also be combined with previous
queries);
- Find Atoms by some TruthValue criteria, similar to the previous one (can also be
combined with other queries);
- Find Atoms based on some temporal or spatial association, a query that relies on the
specialized knowledge stores mentioned below;
Queries can be combined, and the Atom type, AttentionValue and TruthValue criteria are
often used as filters for other queries, preventing the result set size from exploding.
• Manipulate an Atom, retrieving or modifying its AttentionValue and TruthValue. In the
modification case, this causes the respective indexes to be updated.
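
The following toy in-memory AtomSpace illustrates the shape of this API. It is a sketch under simplifying assumptions (a single machine, no indexes, single simple TruthValues); the names and signatures are illustrative, not OpenCog's actual interface.

#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using AtomHandle = std::uint64_t;
struct TruthValue { double strength = 0.0; double confidence = 0.0; };

struct Atom {
    std::string name;                  // Nodes only
    std::vector<AtomHandle> outgoing;  // Link targets
    TruthValue tv;
};

class ToyAtomSpace {
    std::unordered_map<AtomHandle, Atom> atoms_;
    AtomHandle next_ = 1;  // handles are never reused
public:
    AtomHandle addNode(std::string name) {
        AtomHandle h = next_++;
        atoms_[h] = Atom{std::move(name), {}, {}};
        return h;
    }
    AtomHandle addLink(std::vector<AtomHandle> targets) {
        AtomHandle h = next_++;
        atoms_[h] = Atom{"", std::move(targets), {}};
        return h;
    }
    // Removal must not leave any Link pointing at a dangling target.
    bool removeAtom(AtomHandle h) {
        for (const auto& [k, a] : atoms_)
            for (AtomHandle t : a.outgoing)
                if (t == h) return false;
        return atoms_.erase(h) > 0;
    }
    std::optional<Atom> getAtom(AtomHandle h) const {
        auto it = atoms_.find(h);
        return it == atoms_.end() ? std::nullopt : std::optional<Atom>(it->second);
    }
    void setTruthValue(AtomHandle h, TruthValue tv) { atoms_.at(h).tv = tv; }
};

int main() {
    ToyAtomSpace as;
    AtomHandle cat = as.addNode("cat");
    AtomHandle animal = as.addNode("animal");
    AtomHandle inh = as.addLink({cat, animal});  // e.g. an InheritanceLink
    as.setTruthValue(inh, {0.9, 0.8});
    as.removeAtom(cat);  // refused: inh still points at cat
}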
19.3.4 Persistence
In many planned CogPrime deployment scenarios, the amount of knowledge that needs to be
stored is too vast to fit in RAM, even if one considers a large cluster of machines hosting the
AtomSpace and the cognitive processes. The AtomSpace must then be able to persist subsets
of that knowledge to disk, and reload them later when necessary.
The decision of whether to keep an Atom in RAM or remove it is made based on its At-
tentionValue, through the process of economic attention allocation that is the topic of Chapter
23. AttentionValue determines how important an Atom is to the system, and there are mul-
tiple levels of importance. For the persistence decisions, the ones that matter are Long Term
Importance (LTI) and Very Long Term Importance (VLTI).
LTI is used to estimate the probability that the Atom will be necessary or useful in the not
too distant future. If this value is low, below a threshold i1, then it is safe to remove the Atom
from RAM, a process called forgetting. When the decision to forget an Atom has been made,
VLTI enters the picture. VLTI is used to estimate the probability that the Atom will be useful
eventually at some distant point in the future. If VLTI is high enough, the forgotten Atom is
persisted to disk so it can be reloaded. Otherwise, the Atom is permanently forgotten.
When an Atom has been forgotten, a proxy is kept in its place. The proxy is more compact
than the original Atom, preserving only a crude measure of its LTI. When the proxy's LTI
increases above a second threshold i2, the system understands that the Atom has become
relevant again, and loads it from disk.
Eventually, it may happen that the proxy doesn't become important enough over a very
long period of time. In this case, the system should remove even the proxy, if its Long Term
Importance (LTI) is below a third threshold i3. Other actions, usually taken by the system
administrator, can cause the removal of Atoms and their proxies from RAM. For instance, in a
CogPrime system managing information about a number of users of some information system,
the deletion of a user from the system would cause all that user's specific Atoms to be removed.
When Atoms are saved to disk and have no proxies in RAM, they can only be reloaded
by the system administrator. When reloaded, they will be disconnected from the rest of the
AtomSpace, and should be given special attention in order to pursue the creation of new Links
with the other Atoms in the system.
It's important that the values of i1, i2, and i3 be set correctly. Otherwise, one or more of the
following problems may arise:
• If i1 and i2 are too close, the system may spend a lot of resources saving and loading
Atoms.
• If i1 is set too high, important Atoms will be excluded from the system's dynamics, de-
creasing its intelligence.
• If i3 is set too high, the system will forget very quickly and will have to spend resources
re-creating necessary but no longer available evidence.
• If either i1 or i3 is set too low, the system will consume significantly more resources than
it needs on knowledge storage, sacrificing cognitive processing.
Generally, we want to enforce a degree of hysteresis for the freezing and defrosting process.
What we mean is that:
i2 - i1 > ε1 > 0
i1 - i3 > ε2 > 0
This ensures that when Atoms are reloaded, their importance is still above the threshold for
saving, so they will have a chance to be part of cognitive dynamics and become more important,
and won't be removed again too quickly. It also ensures that saved Atoms stay in the system
for a period of time before their proxies are removed and they're definitely forgotten.
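
The freeze/defrost logic just described can be condensed into a small sketch. The threshold names follow the text; the class and the concrete values are hypothetical.

#include <cassert>

// Sketch of the forgetting thresholds described above (hypothetical class).
struct ForgettingPolicy {
    double i1;       // freeze (save or delete) when an Atom's LTI drops below i1
    double i2;       // defrost (reload) when a proxy's LTI rises above i2
    double i3;       // drop the proxy entirely when its LTI falls below i3
    double vltiMin;  // persist to disk on freezing only if VLTI is at least this

    ForgettingPolicy(double a1, double a2, double a3, double v)
        : i1(a1), i2(a2), i3(a3), vltiMin(v) {
        // Hysteresis: i2 - i1 > eps1 > 0 and i1 - i3 > eps2 > 0, so freshly
        // reloaded Atoms aren't immediately re-frozen, and proxies linger a
        // while before being permanently dropped.
        assert(i2 > i1 && i1 > i3);
    }
    bool shouldFreeze(double lti) const { return lti < i1; }
    bool shouldPersist(double vlti) const { return vlti >= vltiMin; }
    bool shouldDefrost(double proxyLti) const { return proxyLti > i2; }
    bool shouldDropProxy(double proxyLti) const { return proxyLti < i3; }
};

int main() {
    ForgettingPolicy p{/*i1=*/0.4, /*i2=*/0.6, /*i3=*/0.2, /*vltiMin=*/0.5};
    bool freeze = p.shouldFreeze(0.39);   // true: LTI fell below i1
    bool defrost = p.shouldDefrost(0.41); // false: still inside the hysteresis band
    (void)freeze; (void)defrost;
}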
Another important consideration is that forgetting individual Atoms makes little sense, be-
cause, as pointed out above, Atoms are relatively small objects. So the forgetting process should
prioritize the removal of clusters of highly interconnected Atoms whenever possible. In that case,
it's possible that a large subset of those Atoms will only have relations within the cluster, so
their proxies aren't needed and the memory savings are maximized.
19.3.5 Specialized Knowledge Stores
Some specific kinds of knowledge are best stored in specialized data structures, which allow
big savings in space, query time, or both. The information provided by these specialized stores
isn't as flexible as it would be if the knowledge were stored in full fledged Node and Link form,
but most of the time CogPrime doesn't need the fully flexible format. Translation between the
specialized formats and Nodes and Links is always possible, when necessary.
We note that the ideal set of specialized knowledge stores is application domain specific.
The stores we have deemed necessary reflect the preschool-based roadmap towards AGI, and
are likely sufficient to get us through most of that roadmap, but not sufficient nor particularly
adequate for an architecture where self-modification plays a key role. These specialized stores
are a pragmatic compromise between performance and formalism, and their existence and design
would need to be revised once CogPrime is mostly functional.
19.3.5.1 Procedure Repository
Procedural knowledge, meaning knowledge that can be used both for the selection and execution
of actions, has a specialized requirement - this knowledge needs to be executable by the system.
While it will be possible, and conceptually straightforward, to execute a procedure that is stored
as a set of Atoms in the AtomSpace, it is much simpler, faster, and safer to rely on a specialized
repository.
Procedural knowledge in CogPrime is stored as programs in a special-purpose LISP-like
programming language called Combo. The motivation and details of this language are the
subject of Chapter 21.
Each Combo program is associated with a Node (a GroundedProcedureNode, to be more
precise), and the AtomHandle of that Node is used to index the procedure repository, where
the executable version of the program is kept, along with specifications of the necessary inputs
for its evaluation and what kind of output to expect. Combo programs can also be saved to
disk and loaded, like regular Atoms. There is a text representation of Combo for this purpose.
Program execution can be very fast, or, in cognitive dynamics terms, very slow, if it involves
interacting with the external world. Therefore, the procedure repository should also facilitate
the storage of program states during the execution of procedures. Concurrent execution of many
procedures is possible with no significant overhead.
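
A minimal sketch of such a repository, keyed by AtomHandle, might look as follows. Plain C++ callables stand in for executable Combo trees here; the real repository would also record input/output type signatures, and the textual Combo form shown is purely illustrative.

#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

using AtomHandle = std::uint64_t;
using ComboValue = double;  // placeholder for Combo's richer value types

struct StoredProcedure {
    std::string comboText;  // textual Combo representation, used for disk save/load
    std::function<ComboValue(const std::vector<ComboValue>&)> run;  // executable form
};

class ProcedureRepository {
    std::unordered_map<AtomHandle, StoredProcedure> procs_;
public:
    // Index by the handle of the corresponding GroundedProcedureNode.
    void store(AtomHandle node, StoredProcedure p) { procs_[node] = std::move(p); }
    // Fast path: execute directly, with no Atom-level interpretation.
    ComboValue execute(AtomHandle node, const std::vector<ComboValue>& args) const {
        return procs_.at(node).run(args);
    }
};

int main() {
    ProcedureRepository repo;
    repo.store(42, {"+($1 $2)", [](const std::vector<ComboValue>& a) {
        return a[0] + a[1];
    }});
    ComboValue r = repo.execute(42, {2.0, 3.0});  // r == 5.0
    (void)r;
}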
19.3.5.2 3D Space Map
In the AGI Preschool setting, CogPrime is embodied in a three-dimensional world (either a real
one, in which it controls a robot, or a virtual one, in which it controls an avatar). This requires
the efficient storage and querying of vast amounts of spatial data, including very specialized
queries about the spatial interrelationship between entities. This spatial data is a key form of
knowledge for CogPrime's world perception, and it also needs to be accessible during learning,
action selection, and action execution.
All spatial knowledge is stored in a 3D Space Map, which allows for fast queries about specific
regions of the world, and for queries about the proximity and relative placement of objects and
entities. It can be used to provide a coarse-grained object level perception for the AtomSpace,
or it can be instrumental in supporting a lower level vision layer in which pixels or polygons
are used as the units of perception. In both cases, the knowledge stored in the 3D Space Map
can be translated into full-fledged Atoms and Links through the AtomHandles.
One characteristic feature of spatial perception is that vast amounts of data are generated
constantly, but most of it is very quickly forgotten. The mind abstracts the perceptual data
into the relevant concepts, which are linked with other Atoms, and most of the underlying
information can then be discarded. The process is repeated at a high frequency as long as
something novel is being perceived in the world. The 3D Space Map is therefore optimized for
quick inserts and deletes.
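
As a rough illustration, a 3D Space Map interface along these lines might look as follows. The linear-scan proximity query is for clarity only; a serious implementation would use a spatial index such as an octree. All names here are hypothetical.

#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

using AtomHandle = std::uint64_t;
struct Vec3 { double x, y, z; };

class SpaceMap3D {
    std::unordered_map<AtomHandle, Vec3> positions_;
public:
    void insert(AtomHandle h, Vec3 p) { positions_[h] = p; }  // O(1) insert
    void remove(AtomHandle h) { positions_.erase(h); }        // O(1) delete
    // Proximity query: all entities within radius r of centroid c.
    std::vector<AtomHandle> withinRadius(Vec3 c, double r) const {
        std::vector<AtomHandle> out;
        for (const auto& [h, p] : positions_) {
            double dx = p.x - c.x, dy = p.y - c.y, dz = p.z - c.z;
            if (std::sqrt(dx * dx + dy * dy + dz * dz) <= r) out.push_back(h);
        }
        return out;
    }
};

int main() {
    SpaceMap3D map;
    map.insert(1, {0, 0, 0});
    map.insert(2, {1, 0, 0});
    auto near = map.withinRadius({0, 0, 0}, 1.5);  // contains both handles
    (void)near;
}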
19.3.5.3 Time Server
Similarly to spatial information, temporal information poses challenges for a hypergraph-based
storage. It can be much more compactly stored in specific data structures, which also allow
for very fast querying. The Time Server is the specialized structure for storing and querying
temporal data in CogPrime.
Temporal information can be stored by any cognitive process, based on its own criteria for
determining that some event should be remembered in a specific temporal context in the future.
This can include the perception of specific events, or the agent's participation in them, such as
the first time it meets a new human teacher. It can also include a collection of concepts describing
specific contexts in which a set of actions has been particularly useful. The possibilities are
numerous, but from the Time Server perspective, all equivalent. They add up to associating a
time point or time interval with a set of Atoms.
The Time Server is a bi-directional storage, as AtomHandles can be used as keys, but also
as objects indexed by time points or time intervals. In the former case, the Time Server tells
us when an Atom was associated with temporal data. In the latter case, it tells us, for a given
time point or interval, which Atoms have been marked as relevant.
Temporal indexing can be based on time points or time intervals. A time point can be at any
granularity: from years down to sub-seconds could be useful. A time interval is simply a pair of
points, the second necessarily after the first, though their granularities are not necessarily the
same. The temporal indexing inside the Time Server is hierarchical, so one can query for time
points or intervals in granularities other than the ones originally used when the knowledge was
first stored.
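The following Python sketch shows the bi-directional indexing in miniature; for simplicity it buckets time at a single fixed granularity, whereas the real Time Server's index is hierarchical. All names are illustrative.

from collections import defaultdict

class TimeServer:
    def __init__(self, grain=1.0):
        self.grain = grain
        self.by_atom = defaultdict(list)   # AtomHandle -> [(start, end), ...]
        self.by_time = defaultdict(set)    # time bucket -> set of AtomHandles

    def _buckets(self, start, end):
        b = int(start // self.grain)
        while b * self.grain <= end:
            yield b
            b += 1

    def add(self, handle, start, end=None):
        end = start if end is None else end
        self.by_atom[handle].append((start, end))
        for b in self._buckets(start, end):
            self.by_time[b].add(handle)

    def when(self, handle):                # Atom -> times it was marked relevant
        return self.by_atom[handle]

    def who(self, start, end):             # interval -> Atoms marked relevant in it
        found = set()
        for b in self._buckets(start, end):
            for h in self.by_time.get(b, ()):
                if any(s <= end and e >= start for s, e in self.by_atom[h]):
                    found.add(h)
        return found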
19.3.5.4 System Activity Table Set
The last relevant specialized store is the System Activity Table Set, which is described in more
detail in Chapter 23. This set of tables records, with fine-grained temporal associations, the most
important activities that take place inside CogPrime. There are different tables for recording
cognitive process activity (at the level of MindAgents, to be described in the next section), for
maintaining a history of the level of achievement of each important goal in the system, and for
recording other important aspects of the system state, such as the most important Atoms and
contexts.
19.4 MindAgents: Cognitive Processes
The AtomSpace holds the system's knowledge, but those Atoms are inert. How is that knowledge
used and useful? That is the province of cognitive dynamics. These dynamics, in a CogPrime
system, can be considered on two levels.
First, we have the cognitive processes explicitly programmed into CogPrime's source code.
These are what we call Concretely-Implemented Mind Dynamics, or CIM-Dynamics. Their
implementation in software happens through objects called MindAgents. We use the term CIM-
Dynamic to discuss a conceptual cognitive process, and the term MindAgents for its actual
implementation and execution dynamics.
The second level corresponds to the dynamics that emerge through the system's self-
organizing dynamics, based on the cooperative activity of the CIM-Dynamics on the shared
AtomSpace.
Most of the material in the following chapters is concerned with particular CIM-Dynamics
in the CogPrime system. In this section we will simply give some generalities about the CIM-
Dynamics as abstract processes and as software processes, which are largely independent of the
actual AI content of the CIM-Dynamics. In practice, the CIM-Dynamics involved in a CogPrime
system are fairly stereotyped in form, although diverse in the actual dynamics they induce.
19.4.1 A Conceptual View of CogPrime Cognitive Processes
We return now to the conceptual trichotomy of cognitive processes presented in Chapter 3 of
Part 1, according to which CogPrime cognitive processes may be divided into:
• Control processes;
• Global cognitive processes;
• Focused cognitive processes.
In practical terms, these may be considered as three categories of CIM-Dynamic.
Control Process CIM-Dynamics are hard to stereotype. Examples are the process of homeo-
static parameter adaptation of the parameters associated with the various other CIM-Dynamics,
and the CIM-Dynamics concerned with the execution of procedures, especially those whose ex-
ecution is made lengthy by the interactions with the external world.
Control Processes tend to focus on a limited and specialized subset of Atoms or other entities,
and carry out specialized mechanical operations on them (e.g. adjusting parameters, interpreting
procedures). To an extent, this may be considered a "grab bag" category containing CIM-
Dynamics that are not global or focused cognitive processes according to the definitions of the
latter two categories. However, it is a nontrivial observation about the CogPrime system that
the CIM-Dynamics that are not global or focused cognitive processes are all explicitly concerned
with system control in some way or another, so this grouping makes sense.
Global and Focused Cognitive Process CIM-Dynamics all share a common structural core.
Beyond this, there are aspects in which Global and Focused CIM-Dynamics diverge from
each other in stereotyped ways.
In most cases, the process undertaken by a Global or Focused CIM-Dynamic involves two
parts: a selection process and an actuation process. Schematically, such a CIM-Dynamic typi-
cally looks something like this:
1. Fetch a set of Atoms that it is judged will be useful to process, according to some selection
process.
2. Operate on these Atoms, possibly together with previously selected ones (this is what we
sometimes call the actuation process of the CIM-Dynamic).
3. Go back to step 1.
The major difference between Global and Focused cognitive processes lies in the selection
process. In the case of a Global process, the selection process is very broad, sometimes yielding
the whole AtomSpace, or a significant subset of it. This means that the actuation process must
be very simple, or the activation of this CIM-Dynamic must be very infrequent.
On the other hand, in the case of a Focused process, the selection process is very narrow,
yielding only a small number of Atoms, which can then be processed more intensively and
expensively, on a per-Atom basis.
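In Python-flavored pseudocode, the shared loop might be rendered as follows; the selection callable is precisely where Global and Focused processes diverge.

def cim_dynamic_step(atomspace, select, actuate, carried=None):
    # One increment of the loop: a selection process, then an actuation process.
    carried = carried or []
    chosen = select(atomspace)   # broad for Global processes, narrow for Focused ones
    actuate(chosen + carried)    # may also operate on previously selected Atoms
    return chosen                # becomes the "carried" set on the next step

# A Focused process would pass a narrow, fitness-based selector together with an
# expensive actuation; a Global process, a very broad selector and a cheap one.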
Common selection processes for Focused cognitive processes are fitness-oriented selectors,
which pick one or a set of Atoms from the AtomSpace with a probability based on some nu-
merical quantity associated with the atom, such as properties of TruthValue or AttentionValue.
There are also more specific selection processes, which choose for example Atoms obeying
some particular combination of relationships in relation to some other Atoms; say choosing only
Atoms that inherit from some given Atom already being processed. There is a notion, described
in the PLN book, of an Atom Structure Template; this is basically just a predicate that applies
to Atoms, such as
P(X).tv
equals
((InheritanceLink X cat) AND (EvaluationLink eats(X, cheese))).tv
which is a template that matches everything that inherits from cat and eats cheese. Templates
like this allow a much more refined selection than the above fitness-oriented selection process.
Selection processes can be created by composing a fitness-oriented process with further re-
strictions, such as templates, or simpler type-based restrictions.
19.4.2 Implementation of MindAgents
MindAgents follow a very simple design. They need to provide a single method through which
they can be enacted, and they should execute their actions in atomic, incremental steps, where
each step should be relatively quick. This design enables collaborative scheduling of MindAgents,
at the cost of allowing "opportunistic" agents to have more than their fair share of resources.
We rely on CogPrime developers to respect the above guidelines, instead of trying to enforce
exact resource allocations on the software level.
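A hypothetical rendering of this contract in Python (the names are ours, not the actual interface):

from abc import ABC, abstractmethod

class MindAgent(ABC):
    def __init__(self, params=None):
        self.params = params or {}   # system parameters guiding this agent's behavior

    @abstractmethod
    def run_step(self, atomspace):
        """Perform one quick, atomic increment of this agent's work.

        Implementations are trusted to return promptly; the scheduler does not
        preempt an ill-behaved agent at the software level."""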
Each MindAgent can have a set of system parameters that guide its behavior. For instance, a
MindAgent dedicated to inference can provide drastically different conclusions if its parameters
tell it to select a small set of Atoms for processing each time, but to spend significant time on
each Atom, rather than selecting many Atoms and doing shallow inferences on each one. It's
expected that multiple copies of the same MindAgent will exist in the cluster, delivering
different dynamics thanks to those parameters.
In addition to their main action method, MindAgents can also communicate with other
MindAgents through message queues. CogPrime has, in its runtime configuration, a list of
available MindAgents and their locations in the cluster. Communications between MindAgents
typically take the form of specific, one-time requests, which we call Tasks.
The default action of MindAgents and the processing of Tasks constitute the cognitive dy-
namics of CogPrime. Nearly everything that takes place within a CogPrime deployment is done
by either a MindAgent (including the control processes), a Task, or specialized code handling
AtomSpace internals or communications with the external world. We now talk about how those
dynamics are scheduled.
MindAgents live inside a process called a CogPrime Unit. One machine in a CogPrime cluster
can contain one or more Units, and one Unit can contain one or more MindAgents. In practice,
given the way the AtomSpace is distributed, which requires a control process in each machine, it
typically makes more sense to have a single Unit per machine, as this enables all MindAgents in
that machine to make direct function calls to the AtomSpace, instead of using more expensive
inter-process communication.
There are exceptions to the above guideline, to accommodate various situations:
1. Very specific MindAgents may not need to communicate with other agents, or only do so
very rarely, so it makes sense to give them their own process.
2. MindAgents whose implementation is a poor fit for the collaborative processing in small
increments design described above also should be given their own process, so they don't
interfere with the overall dynamics in that machine.
3. MindAgents whose priority is either much higher or much lower than that of other agents in
the same machine should be given their own process, so operating-system-level scheduling
can be relied upon to reflect those very different priority levels.
19.4.3 Tasks
It is not convenient for CogPrime to do all its work directly via the action of MindAgent objects
embodying CIM-Dynamics. This is especially true for MindAgents embodying focused cognitive
processes. These have their selection algorithms, which are ideally suited to guarantee that, over
the long run, the right Atoms get selected and processed. This, however, doesn't address the
issue that, on many occasions, it may be necessary to quickly process a specific set of Atoms
in order to execute an action or rapidly respond to some demand. These actions tend to be
one-time, rather than the recurring patterns of mind dynamics.
While it would be possible to design MindAgents so that they could both cover their long
term processing needs and rapidly respond to urgent demands, we found it much simpler to
augment the MindAgent framework with an additional scheduling mechanism that we call
the Task framework. In essence, this is a ticketing system, designed to handle cases where
MindAgents or Schemata spawn one-off tasks to be executed - things that need to be done only
once, rather than repeatedly and iteratively as with the things embodied in MindAgents.
For instance, "grab the most important Atoms from the AtomSpace and do shallow PLN
reasoning to derive immediate conclusions from them" is a natural job for a MindAgent. But
"do a search to find entities that satisfy this particular predicate P" is a natural job for a Task.
Tasks have AttentionValues and target MindAgents. When a Task is created it is submitted
to the appropriate Unit and then put in a priority queue. The Unit will devote some of its
resources to processing the more important Tasks, as we'll see next.
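A minimal sketch of such a ticket and its queue, with illustrative names:

import heapq, itertools

_seq = itertools.count()   # tie-breaker so equal-priority Tasks stay orderable

class Task:
    def __init__(self, target_agent, payload, sti):
        self.target_agent = target_agent   # the MindAgent meant to handle it
        self.payload = payload             # e.g. "find Atoms satisfying predicate P"
        self.sti = sti                     # AttentionValue-derived priority

class TaskQueue:
    def __init__(self):
        self.heap = []

    def submit(self, task):
        # negated STI so that the most important Task is popped first
        heapq.heappush(self.heap, (-task.sti, next(_seq), task))

    def pop_most_important(self):
        return heapq.heappop(self.heap)[2] if self.heap else None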
19.4.4 Scheduling of MindAgents and Tasks in a Unit
Within each Unit we have one or more MindAgents, a Task queue and, optionally, a subset of the
distributed AtomSpace. If that subset isn't held in the unit, it's held in another process running
on the same machine. If there is more than one Unit per machine, their relative priorities are
handled by the operating system's scheduler.
In addition to the Units, CogPrime has an extra maintenance process per machine, whose
job is to handle changes in those priorities as well as reconfigurations caused by MindAgent
migration, and machines joining or leaving the CogPrime cluster.
So, at the Unit level, attention allocation in CogPrime has two aspects: how MindAgents
and Tasks receive attention from CogPrime, and how Atoms receive attention from different
MindAgents and Tasks. The topic of this Section is the former. The latter is dealt with elsewhere,
in two ways:
• in Chapter 23, which discusses the dynamic updating of the AttentionValue structures asso-
ciated with Atoms, and how these determine how much attention various focused cognitive
processes MindAgents pay to them.
• in the discussion of various specific CIM-Dynamics, each of which may make choices of
which Atoms to focus on in its own way (though generally making use of AttentionValue
and TruthValue in doing so).
The attention allocation subsystem is also pertinent to MindAgent scheduling, because it
discusses dynamics that update ShortTermImportance (STI) values associated with MindA-
gents, based on the usefulness of MindAgents for achieving system goals. In this chapter, we
will not enter into such cognitive matters, but will merely discuss the mechanics by which these
STI values are used to control processor allocation to MindAgents.
Each instance of a MindAgent has its own AttentionValue, which is used to schedule processor
time within the Unit. That scheduling is done by a Scheduler object which controls a collection
of worker threads, whose size is a system parameter. The Scheduler aims to allocate worker
threads to the MindAgents in a way that's roughly proportional to their STI, but it needs to
account for starvation, as well as the need to process the Tasks in the task queue.
This is an area in which we can safely borrow from reasonably mature computer science
research. The requirements of cognitive dynamics scheduling are far from unique, so this is not
a topic where new ideas need to be invented for OpenCog; rather, designs need to be crafted
meeting CogPrime's specific requirements based on state-of-the-art knowledge and experience.
One example scheduler design has two important inputs: the STI associated with each
MindAgent, and a parameter determining how much resources should go to the MindAgents vs
the Task queue. In the CogPrime implementation, the Scheduler maps the MindAgent STIs to
a set of priority queues, and each queue is run a number of times per cycle. Ideally one wants
to keep the number of queues small, and rely on multiple Units and the OS-level scheduler to
handle widely different priority levels.
When the importance of a MindAgent changes, one just has to reassign it to a new queue,
which is a cheap operation that can be done between cycles. MindAgent insertions and removals
are handled similarly.
Finally, Task execution is currently handled via allocating a certain fixed percentage of pro-
cessor time, each cycle, to executing the top Tasks on the queue. Adaptation of this percentage
may be valuable in the long term, but has not yet been implemented.
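Pulling these pieces together, one possible shape for the Scheduler is sketched below, reusing the MindAgent and Task sketches from earlier in this chapter. The queue count and Task fraction correspond to the system parameters discussed above; everything else is illustrative.

class Scheduler:
    def __init__(self, num_queues=4, task_fraction=0.2):
        self.queues = [[] for _ in range(num_queues)]   # queue 0 is highest priority
        self.task_fraction = task_fraction              # share of the cycle given to Tasks

    def place(self, agent, sti, max_sti):
        # reassigning an agent whose STI changed is cheap, and done between cycles
        level = min(len(self.queues) - 1,
                    int((1.0 - sti / max(max_sti, 1e-9)) * len(self.queues)))
        self.queues[level].append(agent)

    def run_cycle(self, atomspace, task_queue, steps=100):
        for _ in range(int(steps * self.task_fraction)):   # fixed Task budget
            task = task_queue.pop_most_important()
            if task is None:
                break
            task.target_agent.run_step(atomspace)          # one step per ticket, in this sketch
        for level, agents in enumerate(self.queues):
            for _ in range(len(self.queues) - level):      # higher queues run more often
                for agent in agents:
                    agent.run_step(atomspace)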
Control processes are also implemented as MindAgents, and processed in the same way as
the other kinds of CIM-Dynamics, although they tend to have fairly low importance.
19.4.5 The Cognitive Cycle
We have mentioned the concept of a "cycle" in the discussion about scheduling, without ex-
plaining what we mean. Let's address that now. All the Units in a CogPrime cluster are kept
in sync by a global cognitive cycle, whose purpose is described in Section II.
We mentioned above that each machine in the CogPrime cluster has a housekeeping process.
One of its tasks is to keep track of the cognitive cycle, broadcasting when the machine has
finished its cycle, and listening to similar broadcasts from its counterparts in the cluster. When
all the machines have completed a cycle, a global counter is updated, and each machine is then
free to begin the next cycle.
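In sketch form, the housekeeping process's side of this protocol might look as follows; broadcast, peers_done and wait stand in for a networking layer we do not specify here.

class CycleCoordinator:
    def __init__(self, cluster, machine_id):
        self.cluster = cluster        # hypothetical transport: broadcast/receive
        self.machine_id = machine_id
        self.cycle = 0                # the global counter, mirrored locally

    def finish_cycle(self):
        self.cluster.broadcast(("cycle-done", self.machine_id, self.cycle))
        while not self.cluster.peers_done(self.cycle):   # listen for counterparts
            self.cluster.wait()       # idle-time heuristics could run here
        self.cycle += 1               # all machines done; the next cycle may begin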
One potential annoyance with this global cognitive cycle is that some machines may complete
their cycle much faster than others, and then sit idly while the stragglers finish their jobs.
CogPrime addresses this issue in two ways:
• Over the long run, a load balancing process will assign MindAgents from overburdened
machines to underutilized ones. The MindAgent migration process is described in the next
section.
• In a shorter time horizon, during which a machine's configuration is fixed, there are two
heuristics to minimize the waste of processor time without breaking the overall cognitive
cycle coordination:
- The Task queue in each of the machine's Units can be processed more extensively than
it would by default; in extreme cases, the machine can go through the whole queue.
- Background process MindAgents can be given extra activations, as their activity is
unlikely to throw the system out of sync, unlike with more focused and goal-oriented
processes.
Both heuristics are implemented by the scheduler inside each unit, which has one boolean
trigger for each heuristic. The triggers are set by the housekeeping process when it observes
that the machine has been frequently idle over the recent past, and then reset if the situation
changes.
19.5 Distributed AtomSpace and Cognitive Dynamics
As hinted above, realistic CogPrime deployments will be spread around reasonably large clus-
ters of co-located machines. This section describes how this distributed deployment scenario
is planned for in the design of the AtomSpace and the MindAgents, and how the cognitive
dynamics take place in such a scenario.
We won't review the standard principles of distributed computing here, but we will focus
on specific issues that arise when a CogPrime system is spread across a relatively large number of
machines. The two key issues that need to be handled are:
• How to distribute knowledge (i.e., the AtomSpace) in a way that doesn't impose a large
performance penalty?
• How to allocate resources (i.e., machines) to the different cognitive processes (MindAgents)
in a way that's flexible and dynamic?
19.5.1 Distributing the AtomSpace
The design of a distributed AtomSpace was guided by the following high level requirements:
1. Scale up, transparently, to clusters of dozens to hundreds of machines, without requiring a
single central master server.
2. The ability to store portions of an Atom repository on a number of machines in a cluster,
where each machine also runs some MindAgents. The distribution of Atoms across the machines
should benefit from the fact that the cognitive processes on one machine are likely
to access local Atoms more often than remote ones.
3. Provide transparent access to all Atoms in RAM to all machines in the cluster, even if at
different latency and performance levels.
4. For local access to Atoms in the same machine, performance should be as close as possible
to what one would have in a similar, but non-distributed AtomSpace.
5. Allow multiple copies of the same Atom to exist in different machines of the cluster, but
only one copy per machine.
6. As Atoms are updated fairly often by cognitive dynamics, provide a mechanism for even-
tual consistency. This mechanism needs not only to propagate changes to the Atoms, but
sometimes to reconcile incompatible changes, such as when two cognitive processes update
an Atom's TruthValue in opposite ways. Consistency is less important than efficiency, but
should be guaranteed eventually.
7. Resolution of inconsistencies should be guided by the importance of the Atoms involved, so
the more important ones are more quickly resolved.
8. System configuration can explicitly order the placement of some Atoms on specific machines,
and mark a subset of those Atoms as immovable, which should ensure that local copies are
always kept.
9. Atom placement across machines, aside from the immovable Atoms, should be dynamic,
rebalancing based on frequency of access to the Atom by the different machines.
The first requirement follows obviously from our estimates of how many machines CogPrime
will require to display advanced intelligence.
The second requirement above means that we don't have two kinds of machines in the
cluster, where some are processing servers and some are database servers. Rather, we prefer
each machine to store some knowledge and host some processes acting on that knowledge.
This design assumes that there are simple heuristic ways to partition the knowledge across the
machines, resulting in allocations that, most of the time, give the MindAgents local access to
the Atoms they need most often.
Alas, there will always be some cases in which a MindAgent needs an Atom that isn't
available locally. In order to keep the design of the MindAgents simple, this leads to the third
requirement, transparency, and to the fourth one, performance.
This partition design, on the other hand, means that there must be some replication of
knowledge, as there will always be some Atoms that are needed often by MindAgents on differ-
ent machines. This leads to requirement five (allow redundant copies of an Atom). However, as
MindAgents frequently update the mutable components of Atoms, requirements six and seven
are needed, to minimize the impact of conflicts on system performance while striving to guar-
antee that conflicts are eventually solved, and with priority proportional to the importance of
the impacted Atoms.
19.5.1.1 Mechanisms of Managing Distributed Atomspaces
When one digs into the details of distributed AtomSpaces, a number of subtleties emerge. Going
into these in full detail here would not be appropriate, but we will make a few comments, just
to give a flavor of the sorts of issues involved.
To discuss these issues clearly, some special terminology is needed. In this context, it is useful to
reserve the word "Atom" for its pure, theoretical definition, viz: "a Node is uniquely determined
by its name. A Link is uniquely determined by its outgoing set". Atoms sitting in RAM may then
be called "Realized Atoms". Thus, given a single, pure "abstract/theoretical" Atom, there might
be two different Realized Atoms, on two different servers, having the same name/outgoing-set.
It's OK to think of a RealizedAtom as a clone of the pure, abstract Atom, and to talk about
it that way. Analogously, we might call atoms living on disk, or flying on a wire, "Serialized
Atoms"; and, when need be, use specialized terms like "ZMQ-serialized atoms", or " BerkeleyDB-
serialized Atoms", etc.
An important and obvious coherency requirement is: "If a MindAgent asks for the Handle of
an Atom at time A, and then asks, later on, for the Handle of the same Atom, it should receive
the same Handle."
By the "AtomSpace", in general, we mean the container(s) that are used to store the set of
Atoms used in an OpenCog system, both in RAM and on disk. In the case of an Atom space
that is distributed across multiple machines or other data stores, we will call each of these an
"Atom space portion"
Atoms and Handles
Each OpenCog Atom is associated with a Handle object, which is used to identify the Atom
uniquely. The Handle is a sort of "key" used, at the infrastructure level, to compactly identify
the Atom. In a single-machine, non-distributed Atomspace, one can effectively just use long
ints as Handles, and assign successive ints as Handles to successively created new Atoms. In a
distributed Atomspace, it's a little subtler. Perhaps the cleanest approach in this case is to use
a hash of the serialized Atom data as the handle for an Atom. That way, if an Atom is created
in any portion, it will inherently have the same handle as any of its clones.
The issue of Handle collisions then occurs - it is possible, though it will be rare, that two
different Atoms will be assigned the same Handle via the hashing function. This situation can
be identified via checking, when an Atom is imported into a portion, whether there is already
some Atom in that portion with the same Handle but different fundamental aspects. In the rare
occasion where this situation does occur, one of the Atoms must then have its Handle changed.
Changing an Atom's handle everywhere it's referenced in RAM is not a big deal, so long as it
only happens occasionally. However, some sort of global record of Handle changes should be
kept, to avoid confusion in the process of loading saved Atoms from disk. If a loaded Atomspace
contains Atoms that have changed Handle since the file was saved, the Atom loading process
needs to know about this.
The standard mathematics of hash function collisions shows that if one has a space of H
possible Handles, one will get two Atoms with the same Handle after about 1.25 x sqrt(H)
tries, on average (the "birthday" bound). Rearranging this, to accommodate N possible Atoms
with only about one expected collision, one would have to use a Handle range up to roughly
N^2. The number of bits needed to encode N^2 is twice as many as the number needed to
encode N. So, if one wants to minimize collisions, one may need to make Handles twice as long,
thus taking up more memory.
However, this memory cost can be palliated via introducing "local Handles" separate from the
global, system-wide Handles. The local Handles are used internally within each local Atomspace,
and then each local Atomspace contains a translation table going back and forth between local
and global Handles. Local handles may be long ints, allocated sequentially to each new Atom
entered into a portion. Persistence to disk would always use the global Handles.
To understand the memory tradeoffs involved in these solutions, assume that the global
Handles were k times as long as the local handles... and suppose that the average Handle
occurred r times in the local Atomspace. Then the memory inflation ratio of the local/global
solution as opposed to a solution using only the shorter local handles, would be
(1 + k + r)/r = 1 + (k + 1)/r
(counting, per Atom, r local references, one global Handle of length k, and one translation-table
entry). If k=2 and r=10 (each handle is used 10 times on average, which is realistic based on current
real-world OpenCog Atomspaces), then the ratio is just 1.3x - suggesting that using hash codes
for global Handles, and local Handles to save memory in each local AtomSpace, is acceptable
memory-wise.
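The following Python sketch shows the two mechanisms side by side: global Handles computed as hashes of serialized Atoms, and a per-portion translation table issuing short sequential local Handles. Names and sizes are illustrative.

import hashlib, itertools

def global_handle(serialized_atom: bytes) -> int:
    # identical serializations yield identical handles, so clones agree across portions
    return int.from_bytes(hashlib.sha256(serialized_atom).digest()[:16], "big")

class HandleTable:
    def __init__(self):
        self.next_local = itertools.count(1)   # sequential long-int local handles
        self.local_of = {}                     # global handle -> local handle
        self.global_of = {}                    # local handle -> global handle

    def intern(self, ghandle):
        # translate a global handle to this portion's short local handle
        if ghandle not in self.local_of:
            lhandle = next(self.next_local)
            self.local_of[ghandle] = lhandle
            self.global_of[lhandle] = ghandle
        return self.local_of[ghandle]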
19.5.1.2 Distribution of Atoms
Given the goal of maximizing the probability that an Atom will be local to the machines of the
MindAgents that need it, the two big decisions are how to allocate Atoms to machines, and
then how to reconcile the results of MindAgents actuating on those Atoms.
The initial allocation of Atoms to machines may be done via explicit system configuration,
for Atoms known to have different levels of importance to specific MindAgents. That is, after
all, how MindAgents are initially allocated to machines as well.
One may, for instance, create a CogPrime cluster where one machine (or group) focuses on
visual perception, one focuses on language processing, one focuses on abstract reasoning, etc.
In that case one can hard-wire the location of Atoms.
What if one wants to have three abstract-reasoning machines in one's cluster? Then one
can define an abstract-reasoning zone consisting of three Atom repository portions. One can
hard-wire that Atoms created by MindAgents in the zone must always remain in that zone -
but can potentially be moved among different portions within that zone, as well as replicated
across two or all of the machines, if need be. By default they would still initially be placed in
the same portion as the MindAgent that created them.
However Atoms are initially placed in portions, sometimes it will be appropriate to move
them. And sometimes it will be appropriate to clone an Atom, so there's a copy of it in a
different portion from where it exists. Various algorithms could work for this, but the following
is one simple mechanism (sketched in code after the list):
• When an Atom A in machine M1 is requested by a MindAgent in machine M2, then a clone
of A is temporarily created in M2.
• When an Atom is forgotten (due to low LTI), then a check is made if it has any clones, and
any links to it are changed into links to its clones.
• The LTI of an Atom may get a boost if that Atom has no clones (the amount of this boost
is a parameter that may be adjusted).
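In sketch form, with a registry tracking which machines hold a copy of each Atom (all names illustrative, and the link-rewiring step abstracted into a callback):

class CloneRegistry:
    def __init__(self):
        self.holders = {}   # global AtomHandle -> set of machine ids holding a copy

    def request(self, handle, owner, requester):
        # Atom on machine `owner` requested from `requester`: clone it there
        self.holders.setdefault(handle, {owner}).add(requester)

    def forget(self, handle, machine, relink):
        # on forgetting (low LTI), point local links at a surviving clone, if any
        holders = self.holders.get(handle, set())
        holders.discard(machine)
        if holders:
            relink(handle, next(iter(holders)))

    def lti_boost(self, handle, boost=0.1):
        # an Atom with no clones gets an LTI boost; the boost size is a tunable parameter
        return boost if len(self.holders.get(handle, ())) <= 1 else 0.0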
19.5.1.3 MindAgents and the Distributed AtomSpace
In the context of a distributed AtomSpace, the interactions between MindAgents and the knowl-
edge store become subtler, as we'll now discuss.
When a MindAgent wants to create an Atom, it will make this request of the local AtomSpace
process, which hosts a subset of the whole AtomSpace. It can, on Atom creation, specify whether
the Atom is immovable or not. In the former case, it will initially only be accessible by the
MindAgents in the local machine.
The process of assigning the new Atom an AtomHandle needs to be taken care of, in a way
that doesn't introduce a central master. One way to achieve that is to make handles hierarchical,
so the higher order bits indicate the machine. This, however, means that AtomHandles are
no longer immutable. A better idea is to automatically allocate a subset of the AtomHandle
universe to each machine. The initial use of those AtomHandles is the privilege of that machine
but, as Atoms migrate or are cloned, the handles can move through the cluster.
When a MindAgent wants to retrieve one or more Atoms, it will perform a query on the
local AtomSpace subset, just as it would with a single machine repository. Along with the
regular query parameters, it may specify whether the request should be processed locally only,
or globally. Local queries will be fast, but may fail to retrieve the desired Atoms, while global
queries may take a while to return. In the approach outlined above for MindAgent dynamics
and scheduling, this would just cause the MindAgent to wait until results are available.
Queries designed to always return a set of Atoms can have a third mode, which is "prioritize
local Atoms". In this case, the AtomSpace, when processing a query that looks for Atoms that
match a certain pattern, would try to find all local responses before asking other machines.
19.5.1.4 Conflict Resolution
A key design decision when implementing a distributed AtomSpace is the trade-off between
consistency and efficiency. There is no universal answer to this conflict, but the usage scenarios
for CogPrime, current and planned, tend to fall into the same broad category as far as consistency
goes. CogPrime's cognitive processes are relatively indifferent to conflicts and capable of working
well with outdated data, especially if the conflicts are temporary. For applications such as
the AGI Preschool, it is unlikely that outdated properties of single Atoms will have a large,
noticeable impact on the system's behavior; even if that were to happen on rare occasions, this
kind of inconsistency is often present in human behavior as well.
On the other hand, CogPrime assumes fairly fast access to Atoms by the cognitive processes,
so efficiency shouldn't be too heavily penalized. The robustness against mistakes and the need
for performance mean that a distributed AtomSpace should follow the principle of "eventual
consistency". This means that conflicts are allowed to arise, and even to persist for a while, but
a mechanism is needed to reconcile them.
Before describing conflict resolution, which in CogPrime is a bit more complicated than in
most applications, we note that there are two kinds of conflicts. The simple one happens when
an Atom that exists in multiple machines is modified in one machine, and that change isn't
immediately propagated. The less obvious one happens when some process creates a new Atom
in its local AtomSpace repository, but that Atom conceptually "already exists" elsewhere in
the system. Both scenarios are handled in the same way, and can become complicated when,
instead of a single change or creation, one needs to reconcile multiple operations.
The way to handle conflicts is to have a special purpose control process, a reconciliation
MindAgent, with one copy running on each machine in the cluster. This MindAgent keeps
track of all recent write operations in that machine (Atom creations or changes).
Each time the reconciliation MindAgent is called, it processes a certain number of Atoms
in the recent writes list. It chooses the Atoms to process based on a combination of their STI,
LTI and recency of creation/change. Highest priority is given to Atoms with higher STI and
LTI that have been around longer. Lowest priority is given to Atoms with low STI or LTI that
have been very recently changed - both because they may change again in the very near future,
and because they may be forgotten before it's worth solving any conflicts. This will be the case
with most perceptual Atoms, for instance.
By tuning how many Atoms this reconciliation MindAgent processes each time it's activated
we can tweak the consistency vs efficiency trade-off.
When the AtomReconciliation agent processes an Atom, it does the following (sketched in code after this list):
• Searches all the machines in the cluster to see if there are other equivalent Atoms (for
Nodes, these are Atoms with the same name and type; for Links, these are Atoms with the
same type and targets).
• If it finds equivalent Atoms, and there are conflicts to be reconciled, such as different
TruthValues or AttentionValues, the decision of how to handle the conflicts is made by a
special probabilistic reasoning rule, called the Rule of Choice (see Chapter 34). Basically,
this means:
- It decides whether to merge the conflicting Atoms. We always merge Links, but some
Nodes may have different semantics, such as Nodes representing different procedures
that have been given the same name.
- In the case that the two Atoms A and B should be merged, it creates a new Atom C
that has all the same immutable properties as A and B. It merges their TruthValues
according to the probabilistic revision rule (see Chapter 34). The AttentionValues are
merged by prioritizing the higher importances.
- In the case that two Nodes should be allowed to remain separate, it allocates one of
them (say, B) a new name. Optionally, it also evaluates whether a SimilarityLink should
be created between the two different Nodes.
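The following sketch illustrates the merge step. Note that the TruthValue merge shown is a simplified confidence-weighted average, standing in for the full probabilistic revision rule of Chapter 34; Atoms are modeled as plain dictionaries.

def merge_truth(tv_a, tv_b):
    # tv = (strength, confidence); weight the strengths by confidence
    (sa, ca), (sb, cb) = tv_a, tv_b
    c = ca + cb - ca * cb                      # combined evidence, always below 1
    s = (sa * ca + sb * cb) / max(ca + cb, 1e-9)
    return (s, c)

def merge_attention(av_a, av_b):
    # (STI, LTI) pairs: prioritize the higher importances
    return (max(av_a[0], av_b[0]), max(av_a[1], av_b[1]))

def reconcile(atom_a, atom_b, should_merge):
    if not should_merge(atom_a, atom_b):       # the Rule of Choice may keep Nodes apart
        atom_b["name"] = atom_b["name"] + "'"  # allocate one of them a new name
        return atom_a, atom_b
    merged = dict(atom_a)                      # same immutable properties as A and B
    merged["tv"] = merge_truth(atom_a["tv"], atom_b["tv"])
    merged["av"] = merge_attention(atom_a["av"], atom_b["av"])
    return (merged,)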
Another use for the reconciliation MindAgent is maintaining approximate consistency be-
tween clones, which can be created by the AtomSpace itself, as described above in Subsection
19.5.1.2. When the system knows about the multiple clones of an Atom, it keeps note of these
versions in a list, which is processed periodically by a conflict resolution MindAgent, in order
to prevent the clones from drifting too far apart by the actions of local cognitive processes in
each machine.
19.5.2 Distributed Processing
The OCF infrastructure as described above already contains a lot of distributed processing
implicit in it. However, it doesn't tell you how to make the complex cognitive processes that are
part of the CogPrime design distributed unto themselves - say, how to make PLN or MOSES
themselves distributed. This turns out to be quite possible, but becomes quite intricate and
specific depending on the particular algorithms involved. For instance, the current MOSES
implementation is now highly amenable to distributed and multiprocessor implementation, but
in a way that depends subtly on the specifics of MOSES and has little to do with the role of
MOSES in CogPrime as a whole. So we will not delve into these topics here.
Another possibility worth mentioning is broadly distributed processing, in which CogPrime
intelligence is spread across thousands or millions of relatively weak machines networked via the
Internet. Even if none of these machines is exclusively devoted to CogPrime, the total processing
power may be massive, and massively valuable. The use of this kind of broadly distributed
computing resource to help CogPrime is quite possible, but involves numerous additional control
problems which we will not address here.
A simple case is massive global distribution of MOSES fitness evaluation. In the case where
fitness evaluation is isolated and depends only on local data, this is extremely straightforward. In
the more general case where fitness evaluation depends on knowledge stored in a large Atom-
Space, it requires a subtler design, wherein each globally distributed MOSES subpopulation
contains a pool of largely similar genotypes and a cache of relevant parts of the AtomSpace,
which is continually refreshed during the fitness evaluation process. This can work so long as
each globally distributed lobe has a reasonably reliable high bandwidth, low latency connection
to a machine containing a large AtomSpace.
On the more mundane topic of distributed processing within the main CogPrime cluster,
three points are worth discussing:
• Distributed communication and coordination between MindAgents.
• Allocation of machines to functional groups, and MindAgent migration.
• Machines entering and leaving the cluster.
19.5.2.1 Distributed Communication and Coordination
Communications between MindAgents, Units and other CogPrime components are handled by
a message queue subsystem. This subsystem provides a unified API, so the agents involved
are unaware of the location of their partners: distributed messages, inter-process messages in
the same machine, and intra-process messages in the same Unit are sent through the same
API, and delivered to the same target queues. This design enables transparent distribution of
MindAgents and other components.
In the simplest case, of MindAgents within the same Unit, messages are delivered almost
immediately, and will be available for processing by the target agent the next time it's enacted
by the scheduler. In the case of messages sent to other Units or other machines, they're delivered
to the messaging subsystem component of that unit, which has a dedicated thread for message
delivery. That subsystem is scheduled for processing just like any other control process, although
it tends to have a reasonably high importance, to ensure speedy delivery.
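A sketch of such a location-transparent send operation, with the network transport stubbed out and all names illustrative:

from collections import deque

class MessageRouter:
    def __init__(self, local_queues):
        self.local_queues = local_queues   # queue name -> inbound deque, same machine
        self.remote_send = None            # network transport, installed at startup

    def send(self, target_queue, message):
        # one API for all three cases; the delivery mechanism is chosen here
        if target_queue in self.local_queues:
            self.local_queues[target_queue].append(message)   # same Unit or machine
        elif self.remote_send is not None:
            self.remote_send(target_queue, message)           # handled by delivery thread
        else:
            raise KeyError("unknown queue: " + target_queue)

# usage: router = MessageRouter({"unit-A": deque()}); router.send("unit-A", msg)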
The same messaging API and subsystem is used for control-level communications, such as
the coordination of the global cognitive cycle. The cognitive cycle completion message can be
used to carry other housekeeping content as well.
19.5.2.2 Functional Groups and MindAgent Migration
A CogPrime cluster is composed of groups of machines dedicated to various high-level cognitive
tasks: perception processing, language processing, background reasoning, procedure learning,
action selection and execution, goal achievement planning, etc. Each of these high-level tasks
will probably require a number of machines, which we call functional groups.
Most of the support needed for functional groups is provided transparently by the mechanisms
for distributing the AtomSpace and by the communications layer. The main issue is how much
resources (i.e., how many machines) to allocate to each functional group. The initial allocation
is determined by human administrators via the system configuration - each machine in the
cluster has a local configuration file which tells it exactly which processes to start, along with
the collection of MindAgents to be loaded onto each process and their initial AttentionValues.
Over time, however, it may be necessary to modify this allocation, adding machines to
overworked or highly important functional groups. For instance, one may add more machines
to the natural language and perception processing groups during periods of heavy interaction
with humans in the preschool environment, while repurposing those machines to procedure
learning and background inference during periods in which the agent controlled by CogPrime
is resting or 'sleeping'.
This allocation of machines is driven by attention allocation in much the same way that
processor time is allocated to MindAgents. Functional groups can be represented by Atoms,
and their importance levels are updated according to the importance of the system's top level
goals, and the usefulness of each functional group to their achievement. Thus, once the agent is
engaged by humans, the goals of pleasing them and better understanding them would become
highly important, and would thus drive the STI of the language understanding and language
generation functional groups.
Once there is an imbalance between a functional group's STI and its share of the machines
in the cluster, a control process CIM-Dynamic is triggered to decide how to reconfigure the
cluster. This CIM-Dynamic works approximately as follows:
• First, it decides how many extra machines to allocate to each sub-represented functional
group.
• Then, it ranks the machines not already allocated to those groups based on a combination
of their workload and the aggregate STI of their MindAgents and Units. The goal is to
identify machines that are both relatively unimportant and working under capacity.
• It will then migrate the MindAgents of those machines to other machines in the same
functional group (or just remove them if clones exist), freeing them up.
• Finally, it will decide how best to allocate the new machines to each functional group.
This decision is heavily dependent on the nature of the work done by the MindAgents in
that group, so in CogPrime these decisions will be somewhat hardcoded, as is the set of
functional groups. For instance, background reasoning can be scaled just by adding extra
inference MindAgents to the new machines without too much trouble, but communicating
with humans requires MindAgents responsible for dialog management, and it doesn't make
sense to clone those, so it's better to just give more resources to each MindAgent without
increasing their numbers.
The migration of MindAgents becomes, indirectly, a key driver of Atom migration. As MindA-
gents move or are cloned to new machines, the AtomSpace repository in the source machine
should send clones of the Atoms most recently used by these MindAgents to the target ma-
chine(s), anticipating a very likely distributed request that would create those clones in the near
future anyway. If the MindAgents are moved but not cloned, the local copies of those Atoms in
the source machine can then be (locally) forgotten.
19.5.2.3 Adding and Removing Machines
Given the support for MindAgent migration and cloning outlined above, the issue of adding new
machines to the cluster becomes a specific application of the heuristics just described. When a
new machine is added to the cluster, CogPrime initially decides on a functional group for it,
based both on the importance of each functional group and on their current performance - if
a functional group consistently delays the completion of the cognitive cycle, it should get more
machines, for instance. When the machine is added to a functional group, it is then populated
with the most important or resource-starved MindAgents in that group, a decision that is taken
by economic attention allocation.
Removal of a machine follows a similar process. First the system checks if the machine can be
safely removed from its current functional group, without greatly impacting its performance. If
that's the case, the non-cloned MindAgents in that machine are distributed among the remaining
machines in the group, following the heuristic described above for migration. Any local-only
Atoms in that machine's AtomSpace container are migrated as well, provided their LTI is high
enough.
In the situation in which removing a machine M1 would have an intolerable impact on the
functional group's performance, a control process selects another functional group to lose a
machine M2. Then, the MindAgents and Atoms in M1 are migrated to M2, which goes through
the regular removal process first.
In principle, one might use the insertion or removal of machines to perform a global op-
timization of resource allocation within the system, but that process tends to be much more
expensive than the simpler heuristics we just described. We believe these heuristics can give
us most of the benefits of global re-allocation at a fraction of the disturbance for the system's
overall dynamics during their execution.
Chapter 20
Knowledge Representation Using the Atomspace
20.1 Introduction
CogPrime's knowledge representation must be considered on two levels: implicit and explicit.
This chapter considers mainly explicit knowledge representation, with a focus on representation
of declarative knowledge. We will describe the Atom knowledge representation, a generalized
hypergraph formalism which comprises a specific vocabulary of Node and Link types, used to
represent declarative knowledge but also, to a lesser extent, other types of knowledge as well.
Other mechanisms of representing procedural, episodic, attentional, and intentional knowledge
will be handled in later chapters, as will the subtleties of implicit knowledge representation.
The AtomSpace Node and Link formalism is the most obviously distinctive aspect of the
OpenCog architecture, from the point of view of a software developer building AI processes in
the OpenCog framework. And yet, the features of CogPrime that are most important, in terms
of our theoretical reasons for estimating it likely to succeed as an advanced AGI system, are
not really dependent on the particulars of the AtomSpace representation.
What's important about the AtomSpace knowledge representation is mainly that it provides
a flexible means for compactly representing multiple forms of knowledge, in a way that allows
them to interoperate - where by "interoperate" we mean that e.g. a fragment of a chunk of
declarative knowledge can link to a fragment of a chunk of attentional or procedural knowledge;
or a chunk of knowledge in one category can overlap with a chunk of knowledge in another
category (as when the same link has both a (declarative) truth value and an (attentional)
importance value). In short, any representational infrastructure sufficiently flexible to support
• compact representation of all the key categories of knowledge playing dominant roles in
human memory
• the flexible creation of specialized sub-representations for various particular subtypes of
knowledge in all these categories, enabling compact and rapidly manipulable expression of
knowledge of these subtypes
• the overlap and interlinkage of knowledge of various types, including that represented using
specialized sub-representations
will probably be acceptable for CogPrime's purposes. However, precisely formulating these
general requirements is tricky, and is significantly more difficult than simply articulating a
single acceptable representational scheme, like the current OpenCog Atom formalism. The Atom
formalism satisfies the relevant general requirements and has proved workable from a practical
software perspective.
In terms of the Mind-World Correspondence Principle introduced in Chapter 10, the impor-
tant point regarding the Atom representation is that it must be flexible enough to allow the
compact and rapidly manipulable representation of knowledge that has aspects spanning the
multiple common human knowledge categories, in a manner that allows easy implementation
of cognitive processes that will manifest the Mind-World Correspondence Principle in everyday
human-like situations. The actual manifestation of mind-world correspondence is the job of
the cognitive processes acting on the AtomSpace - the job of the AtomSpace is to be an effi-
cient and flexible enough representation that these cognitive processes can manifest mind-world
correspondence in everyday human contexts given highly limited computational resources.
20.2 Denoting Atoms
First we describe the textual notation we'll use to denote various sorts of Atoms throughout the
following chapters. The discussion will also serve to give some particular examples of cognitively
meaningful Atom constructs.
20.2.1 Meta-Language
As always occurs when discussing (even partially) logic-based systems, when discussing Cog-
Prime there is some potential for confusion between logical relationships inside the system, and
logical relationships being used to describe parts of the system. For instance, we can state as
observers that two Atoms inside CogPrime are equivalent, and this is different from stating
that CogPrime itself contains an Equivalence relation between these two Atoms. Our formal
notation needs to reflect this difference.
Since we will not be doing any fancy mathematical analyses of CogPrime structures or
dynamics here, there is no need to formally specify the logic being used for the metalanguage.
Standard predicate logic may be assumed.
So, for example, we will say things like
(IntensionalInheritanceLink Ben monster).TruthValue.strength is .5
This is a metalanguage statement, which means that the strength field of the TruthValue
object associated with the link (IntensionalInheritance Ben monster) is equal to .5. This is
different than saying
EquivalenceLink
ExOutLink
GetStrength
ExOutLink
GetTruthValue
IntensionalInheritanceLink Ben monster
NumberNode 0.5
which refers to an equivalence relation represented inside CogPrime. The former refers to an
equals relationship observed by the authors of the book, but perhaps never represented explicitly
inside CogPrime.
In the first example above we have used the C++ convention
structure_variable_name.field_name
for denoting elements of composite structures; this convention will be stated formally below.
In the second example we have used schema corresponding to TruthValue and Strength; these
schema extract the appropriate fields from the Atoms they're applied to, so that e.g.
ExOutLink
GetTruthValue
A
returns the number
A.TruthValue
Following a convention from mathematical logic, we will also sometimes use the special
symbol
⊢
to mean "implies in the metalanguage." For example, the first-order PLN deductive inference
strength rule may be written
InheritanceLink A B <sAB>
InheritanceLink B C <sBC>
⊢
InheritanceLink A C <sAC>
where
sAC = sAB sBC + (1 - sAB) (sC - sB sBC) / (1 - sB)
This is different from saying
ForAll $A, $B, $C, $sAB, $sBC, $sAC
ExtensionalImplicationLink_HOJ
    AND
        InheritanceLink $A $B <$sAB>
        InheritanceLink $B $C <$sBC>
    AND
        InheritanceLink $A $C <$sAC>
        $sAC = $sAB $sBC + (1 - $sAB) ($sC - $sB $sBC) / (1 - $sB)
which is the most natural representation of the independence-based PLN deduction rule (for
strength-only truth values) as a logical statement within CogPrime. In the latter expression
the variables $A, $sAB, and so forth represent actual Variable Atoms within CogPrime. In
the former expression the variables represent concrete, non-Variable Atoms within CogPrime,
which however are being considered as variables within the metalanguage.
(As explained in the PLN book, a link labeled with "HOJ" refers to a "higher order judgment",
meaning a relationship that interprets its relations as entities with particular truth values. For
instance,
ImplicationLink_HOJ
Inh $X stupid <.9>
Inh $X rich <.9>
means that if (Inh $X stupid) has a strength of .9, then (Inh $X rich) has a strength of .9.)
20.2.2 Denoting Atoms
Atoms are the basic objects making up CogPrime knowledge. They come in various types, and
are associated with various dynamics, which are embodied in MindAgents. Generally speaking
Atoms are endowed with TruthValue and AttentionValue objects. They also sometimes have
names, and other associated Values as previously discussed. In the following subsections we will
explain how these are notated, and then discuss specific notations for Links and Nodes, the two
types of Atoms in the system.
20.2.2.1 Names
In order to denote an Atom in discussion, we have to call it something. Relatedly but separately,
Atoms may also have names within the CogPrime system. (As a matter of implementation, in
the current OpenCog version, no Links have names; whereas, all Nodes have names, but some
Nodes have a null name, which is conceptually the same as not having a name.)
(name, type) pairs must be considered as unique within each Unit within an OpenCog system,
otherwise they can't be used effectively to reference Atoms. It's OK if two different OpenCog
Units both have SchemaNodes named "+", but not if one OpenCog Unit has two SchemaNodes
both named "+" - this latter situation is disallowed on the software level, and is assumed in
discussions not to occur.
Some Atoms have natural names. For instance, the SchemaNode corresponding to the ele-
mentary schema function + may quite naturally be named "+". The NumberNode corresponding
to the number .5 may naturally be named ".5", and the CharacterNode corresponding to the
character c may naturally be named "c". These cases are the minority, however. For instance,
a SpecificEntityNode representing a particular instance of + has no natural name, nor does a
SpecificEntityNode representing a particular instance of c.
Names should not be confused with Handles. Atoms have Handles, which are unique identi-
fiers (in practice, numbers) assigned to them by the OpenCog core system; and these Handles
are how Atoms are referenced internally, within OpenCog, nearly all the time. Accessing of
Atoms by name is a special case - not all Atoms have names, but all Atoms have Handles. An
example of accessing an Atom by name is looking up the CharacterNode representing the letter
"c" by its name "c". There would then be two possible representations for the word "cat":
1. this word might be associated with a ListLink - and the ListLink corresponding to "cat"
would be a list of the Handles of the Atoms of the nodes named "c", "a", and "t".
2. for expedience, the word might be associated with a WordNode named "cat."
In the case where an Atom has multiple versions (this may happen, for instance, if the Atom
is considered in a different context, via a ContextLink), each version has a VersionHandle, so
that accessing an AtomVersion requires specifying an AtomHandle plus a VersionHandle. See
Chapter 19 for more information on Handles.
OpenCog never assigns Atoms names on its own; in fact, Atom names are assigned only in
the two sorts of cases just mentioned:
1. Via preprocessing of perceptual inputs (e.g. the names of NumberNode, CharacterNodes)
2. Via hard-wiring of names for SchemaNodes and PredicateNodes corresponding to built-in
elementary schema (e.g. +, AND, Say)
If an Atom A has a name n in the system, we may write
A.name
n
On the other hand, if we want to assign an Atom an external name, we may make a meta-
language assertion such as
L1 := (InheritanceLink Ben animal)
indicating that we decided to name that link L1 for our discussions, even though inside OpenCog
it has no name.
In denoting (nameless) Atoms we may use arbitrary names like L1. This is more convenient
than using a Handle-based notation in which Atoms would be referred to as 1, 3433322, etc.; but
sometimes we will use the Handle notation as well.
Some ConceptNodes and conceptual PredicateNodes or SchemaNodes may correspond with
human-language words or phrases like cat, bite, and so forth. This will be the minority case;
most such nodes will correspond to parts of human-language concepts or fuzzy collections of
human-language concepts. In discussions in this book, however, we will often invoke the unusual
case in which Atoms correspond to individual human-language concepts. This is because such
examples are the easiest ones to write about and discuss intuitively. The preponderance of
named Atoms in the examples in the book implies no similar preponderance of named Atoms
in the real OpenCog system. It is merely easier to talk about a hypothetical Atom named "cat"
than it is about a hypothetical Atom with Handle 434. It is not impossible that a OpenCog
system represents "cat" as a single ConceptNode, but it is just as likely that it will represent
"cat" as a map composed of many different nodes without any of these having natural names.
Each OpenCog works out for itself, implicitly, which concepts to represent as single Atoms and
which in distributed fashion.
For another example,
ListLink
CharacterNode "c"
CharacterNode "a"
CharacterNode "t"
corresponds to the character string
("c", "a", "e")
and would naturally be named using the string cat. In the system itself, however, this ListLink
need not have any name.
20.2.2.2 Types
Atoms also have types. When it is necessary to explicitly indicate the type of an atom, we will
use the keyword Type, as in
A.Type = InheritanceLink
N_345.Type = ConceptNode
On the other hand, there is also a built-in schema HasType, which lets us say
EvaluationLink HasType A InheritanceLink
EvaluationLink HasType N_345 ConceptNode
This covers the case in which type evaluation occurs explicitly in the system, which is useful
if the system is analyzing its own emergent structures and dynamics.
Another option currently implemented in OpenCog is to explicitly restrict the type of a
variable using a TypedVariableLink, as follows:

TypedVariableLink
   VariableNode $X
   VariableTypeNode "ConceptNode"
Note also that we will frequently remove the suffix Link or Node from a type name,
writing e.g.
Inheritance
Concept A
Concept B
instead of
InheritanceLink
ConceptNode A
ConceptNode B
20.2.2.3 Truth Values
The truth value of an atom is a bundle of information describing how true the Atom is, in one
of several different senses depending on the Atom type. It is encased in a TruthValue object
associated with the Atom. Most of the time, we will denote the truth value of an atom in <>'s
following the expression denoting the atom. This very handy notation may be used in several
different ways.
A complication is that some Atoms may have CompositeTruthValues, which consist of differ-
ent estimates of their truth value made by different sources, which for whatever reason have not
been reconciled (maybe no process has gotten around to reconciling them, maybe they corre-
spond to different truth values in different contexts and thus logically need to remain separate,
maybe their reconciliation is being delayed pending accumulation of more evidence, etc.). In
this case we can still assume that an Atom has a default truth value, which corresponds to the
highest-confidence truth value that it has, in the Universal Context.
Most frequently, the notation is used with a single number in the brackets, e.g.

A <.4>

to indicate that the atom A has truth value .4; or

IntensionalInheritanceLink Ben monster <.5>
to indicate that the IntensionalInheritance relation between Ben and monster has truth value
strength .5. In this case, <tv> indicates (roughly speaking) that the truth value of the atom
in question involves a probability distribution with a mean of tv. The precise semantics of the
strength values associated with OpenCog Atoms is described in Probabilistic Logic Networks
(see Chapter 34). Please note, though: this notation does not imply that the only data retained
in the system about the distribution is the single number .5.
If we want to refer to the truth value of an Atom A in the context C, we can use the construct

ContextLink <truth value>
   C
   A
Sometimes, Atoms in OpenCog are labeled with two truth value components as defined by
PLN: strength and weight-of-evidence. To denote these two components, we might write
IntensionalInheritanceLink Ben scary <.9,.1>
indicating that there is a relatively small amount of evidence in favor of the proposition that
Ben is very scary.
We may also put the TruthValue indicator in a different place, e.g. using indent notation,
IntensionalInheritanceLink <.9,.1>
Ben
scary
This is mostly useful when dealing with long and complicated constructions.
If we want to denote a composite truth value (whose components correspond to different
"versions" of the Atom), we can use a list notation, e.g.
IntensionalInheritance (<.9,.1>, <.5,.9> [h,123], <.6,.7> [c,655])
Ben
scary
where e.g.
<.5,.9> [h,123]
denotes the TruthValue version of the Atom indexed by Handle 123. The h denotes that the
AtomVersion indicated by the VersionHandle h,123 is a Hypothetical Atom, in the sense de-
scribed in the PLN book. Some versions may not have any index Handles.
The semantics of composite TruthValues are described in the PLN book, but roughly they
are as follows. Any version not indexed by a VersionHandle is a "primary TruthValue" that
gives the truth value of the Atom based on some body of evidence. A version indexed by a
VersionHandle is either contextual or hypothetical, as indicated notationally by the c or h in its
VersionHandle. So, for instance, if a TruthValue version for Atom A has VersionHandle h,123
that means it denotes the truth value of Atom A under the hypothetical context represented
by the Atom with handle 123. If a TruthValue version for Atom A has VersionHandle c,655
this means it denotes the truth value of Atom A in the context represented by the Atom with
Handle 655.
Alternately, truth values may sometimes be expressed in <L,U,b> or <L,U,b,N> format,
defined in terms of indefinite probability theory as presented in the PLN book and recalled in
Chapter 34. For instance,

IntensionalInheritanceLink Ben scary <.7,.9,.8,20>
has the semantics that there is an estimated 80% chance that, after 20 more observations have
been made, the estimated strength of the link will lie in the interval (.7, .9).
The notation may also be used to specify a TruthValue probability distribution, e.g.
A <g(5,7,12)>
would indicate that the truth value of A is given by distribution g with parameters (5,7,12), or
A <M>
where M is a table of numbers, would indicate that the truth value of A is approximated by
the table M.
The <> notation for truth value is an unabashedly incomplete and ambiguous notation, but
it is very convenient. If we want to specify, say, that the truth value strength of IntensionalIn-
heritanceLink Ben monster is in fact the number .5, and no other truth value information is
retained in the system, then we need to say

(IntensionalInheritance Ben monster).TruthValue = [(strength, .5)]

(where a hashtable form is assumed for TruthValue objects, i.e. a list of name-value pairs). But
this kind of issue will rarely arise here and the <> notation will serve us well.
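To make the notation concrete, the following toy Python class is a minimal sketch - the class name and fields are our own, not the actual OpenCog TruthValue implementation - modeling the two-component <strength, weight-of-evidence> form used above:

    # Toy two-component PLN-style truth value (illustrative only).
    class SimpleTruthValue:
        def __init__(self, strength, weight_of_evidence):
            self.strength = strength    # roughly, the mean of the distribution
            self.evidence = weight_of_evidence
        def __repr__(self):
            # mirrors the <strength, weight-of-evidence> notation in the text
            return "<%.2f,%.2f>" % (self.strength, self.evidence)

    scary = SimpleTruthValue(0.9, 0.1)  # "Ben is very scary", on little evidence
    print("IntensionalInheritanceLink Ben scary", scary)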
20.2.2.4 Attention Values
The AttentionValue object associated with an Atom does not need to be notated nearly as often
as truth value. When it does however we can use similar notational methods.
AttentionValues may have several components, but the two critical ones are called short-
term importance (STI) and long-term importance (LTI). Furthermore, multiple STI values are
retained: for each (Atom, MindAgent) pair there may be a MindAgent-specific STI value for
that Atom. The pragmatic import of these values will become clear in a later chapter when we
discuss attention allocation.
Roughly speaking, the long-term importance is used to control memory usage: when memory
gets scarce, the atoms with the lowest LTI value are removed. On the other hand, the short-term
importance is used to control processor time allocation: MindAgents, when they decide which
Atoms to act on, will generally, but not only, choose the ones that have proved most useful to
them in the recent past, and additionally those that have been useful for other MindAgents in
the recent past.
We will use the double bracket << >> to denote attention value (in the rare cases where
such denotation is necessary). So, for instance,

Cow_7 <<.5>>

will mean the node Cow_7 has an importance of .5; whereas,

Cow_7 <<STI = .1, LTI = .8>>

or simply

Cow_7 <<.1, .8>>

will mean the node Cow_7 has short-term importance = .1 and long-term importance = .8.
Of course, we can also use the style
(IntensionalInheritanceLink Ben monster).AttentionValue = [(STI, .1), (LTI, .8)]
where appropriate.
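As a rough illustration of the intended division of labor between the two quantities, here is a hypothetical Python sketch - not actual OpenCog code, and the function names are ours - in which low-LTI Atoms are forgotten when memory is scarce and high-STI Atoms are preferred for processing:

    # Hypothetical sketch of STI/LTI usage in attention allocation.
    from collections import namedtuple

    Atom = namedtuple("Atom", ["name", "sti", "lti"])

    def forget(atoms, max_atoms):
        # when memory gets scarce, the Atoms with the lowest LTI are removed
        return sorted(atoms, key=lambda a: a.lti, reverse=True)[:max_atoms]

    def select_for_processing(atoms, n):
        # MindAgents generally choose the Atoms with the highest STI
        return sorted(atoms, key=lambda a: a.sti, reverse=True)[:n]

    atoms = [Atom("Cow_7", 0.1, 0.8), Atom("cat", 0.5, 0.2)]
    print(forget(atoms, 1))                 # keeps Cow_7 (higher LTI)
    print(select_for_processing(atoms, 1))  # picks cat (higher STI)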
20.2.2.5 Links
Links are represented using a simple notation that has already occurred many times in this
book. For instance,
Inheritance A B
Similarity A B
Note that here the symmetry or otherwise of the link is not implicit in the notation. Similar-
ityLinks are symmetrical, InheritanceLinks are not. When this distinction is necessary, it will
be explicitly made.
20.3 Representing Functions and Predicates
Schemallodes and PredicateNodes contain functions internally; and Links may also usefully be
considered as functions. We now briefly discuss the representations and notations we will use
to indicate functions in various contexts.
Firstly, we will make some use of the currying notation drawn from combinatory logic, in
which adjacency indicates function application. So, for instance, using currying,

f x

means the function f evaluated at the argument x; and (f x y) means (f(x))(y). If we want to
specify explicitly that a block of notation is being specified using currying, we will use the
notation @[expression]; for instance

@[f x y z]

means

((f(x))(y))(z)
We will also frequently use conventional notation to refer to functions, such as f(x,y). Of
course, this is consistent with the currying convention if (x,y) is interpreted as a list and f is
then a function that acts on 2-element lists. We will have many other occasions than this to
use list notation.
Also, we will sometimes use a non-curried notation, most commonly with Links, so that e.g.
InheritanceLink x y
does not mean a curried evaluation but rather means InheritanceLink(x,y).
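The two conventions are easy to mirror in ordinary code. The following Python sketch (illustrative only) shows the same addition function in curried form and in list-argument form:

    # Currying: adjacency indicates application, so (f x y) = (f(x))(y).
    def f(x):
        return lambda y: x + y

    print(f(3)(4))          # curried application: (f 3 4) -> 7

    def f_list(pair):       # conventional f(x,y), with (x,y) read as a list
        x, y = pair
        return x + y

    print(f_list((3, 4)))   # 7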
20.3.0.6 Execution Output Links
In the case where f refers to a schema, the occurrence of the combination f x in the system is
represented by
ExOutLink f x
or graphically, as a small tree whose root node is the ExOutLink and whose children are f and x.
Note that, just as when we write

f(g(x))

we mean to apply f to the result of applying g to x, similarly when we write

ExOutLink f (ExOutLink g x)

we mean the same thing. So for instance
EvaluationLink (ExOutLink g x) y <.8>

means that the result of applying g to x is a predicate r, so that r(y) evaluates to True with
strength .8.
This approach, in its purest incarnation, does not allow multi-argument schemata. Now,
multi-argument schemata are never actually necessary, because one can use argument currying
to simulate multiple arguments. However, this is often awkward, and things become simpler if
one introduces an explicit tupling operator, which we call ListLink. Simply enough,
ListLink A1 ... An

denotes an ordered list (A1, ..., An).
20.3.1 Execution Links
ExecutionLinks give the system an easy way to record acts of schema execution. These are
ternary links of the form:

ExecutionLink S A B

where S is a SchemaNode and A and B are Atoms. In words, this says that the procedure
represented by SchemaNode S has taken input A and produced output B.
There may also be schemata that produce no output, or take no input. But these are
treated as PredicateNodes, to be discussed below; their activity is recorded by EvaluationLinks,
not ExecutionLinks.
The TruthValue of an ExecutionLink records how frequently the result encoded in the Exe-
cutionLink occurs. Specifically,
• the TruthValue of (ExecutionLink S A B) tells you the probability of getting B as output,
given that you have run schema S on input A
• the TruthValue of (ExecutionLink S A) tells you the probability that if S is run, it is run
on input A
Often it is useful to record the time at which a given act of schema execution was carried out;
in that case one uses the atTime link, writing e.g.

atTimeLink T
   ExecutionLink S A B

where T is a TimeNode, or else one uses an implicit method such as storing the time-stamp of
the ExecutionLink in a core-level data-structure called the TimeServer. The implicit method is
logically equivalent to explicitly using atTime, and is treated the same way by PLN inference,
but provides significant advantages in terms of memory usage and lookup speed.
For purposes of logically reasoning about schema, it is useful to create binary links repre-
senting ExecutionLinks with some of their arguments fixed. We name these as follows:
ExecutionLink1 A B means: X so that ExecutionLink X A B
ExecutionLink2 A B means: X so that ExecutionLink A X B
ExecutionLink3 A B means: X so that ExecutionLink A B X
Finally, a SchemaNode may be associated with a structure called a Graph.
Where S is a SchemaNode,

Graph(S) = { (x,y) : ExecutionLink S x y }
Sometimes, the graph of a SchemaNode may be explicitly embodied as a ConceptNode; other
times, it may be constructed implicitly by a MindAgent in analyzing the SchemaNode (e.g. the
inference MindAgent).
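A toy Python model of these ideas - hypothetical, not the actual OpenCog machinery - records ExecutionLinks as (schema, input, output) triples and derives Graph(S) from them:

    # Hypothetical sketch: recording schema executions and deriving Graph(S).
    execution_links = []   # triples (schema, input, output)

    def execute(schema, x):
        y = schema(x)
        execution_links.append((schema, x, y))  # record the act of execution
        return y

    def graph(schema):
        # Graph(S) = { (x, y) : ExecutionLink S x y }
        return {(x, y) for s, x, y in execution_links if s is schema}

    double = lambda n: 2 * n
    execute(double, 3)
    execute(double, 5)
    print(graph(double))   # {(3, 6), (5, 10)}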
Note that the set of ExecutionLinks describing a SchemaNode may not define that SchemaN-
ode exactly, because some of them may be derived by inference. This means that the model of
a SchemaNode contained in its ExecutionLinks may not actually be a mathematical function,
in the sense of assigning only one output to each input. One may have

ExecutionLink S X A <.5>
ExecutionLink S X B <.5>

meaning that the system does not know whether S(X) evaluates to A or to B. So the set of
ExecutionLinks modeling a SchemaNode may constitute a non-function relation, even if the
schema inside the SchemaNode is a function.
Finally, what of the case where f x represents the action of a built-in system function f on
an argument x? This is an awkward case that would not be necessary if the CogPrime system
were revised so that all cognitive functions were carried out using SchemaNodes. However,
in the current CogPrime version, where most cognitive functions are carried out using C++
MindAgent objects, if we want CogPrime to study its own cognitive behavior in a statistical
way, we need BuiltInSchemaNodes that refer to MindAgents rather than to ComboTrees (or
else, we need to represent MindAgents using ComboTrees, which will become practicable once
we have a sufficiently efficient Combo interpreter). The semantics here is thus basically the
same as where f refers to a schema. For instance we might have
ExecutionLink FirstOrderInferenceMindAgent (L1, L2) L3

where L1, L2 and L3 are links such that L3 may be derived from L1 and L2 according to the
first-order PLN deduction rules.
20.3.1.1 Predicates
Predicates are related but not identical to schema, both conceptually and notationally. Predi-
cateNodes involve predicate schema which output TruthValue objects. But there is a difference
between a SchemaNode embodying a predicate schema and a PredicateNode: a
PredicateNode doesn't output a TruthValue; rather, it adjusts its own TruthValue as a result of the
output of its own internal predicate schema.
The record of the activity of a PredicateNode is given not by an ExecutionLink but rather
by an:
EvaluationLink P A <tv>
where P is a PredicateNode, A is its input, and <tv> is the truth value assumed by the
EvaluationLink corresponding to the PredicateNode being fed the input A. There is also the
variant
EvaluationLink P <tv>
for the case where the PredicateNode P embodies a schema that takes no inputs.¹
A simple example of a PredicateNode is the predicate GreaterThan. In this case we have,
for instance
EvaluationLink GreaterThan 5 6 <0>
EvaluationLink GreaterThan 5 3 <1>
and we also have:
EquivalenceLink
   GreaterThan
   ExOutLink
      And
      ListLink
         ExOutLink
            Not
            LessThan
         ExOutLink
            Not
            EqualTo
Note how the variables have been stripped out of the expression; see the PLN book for more
explanation about that. We will also encounter many commonsense-semantics predicates such
as isMale, with e.g.
¹ Actually, if P does take some inputs, EvaluationLink P <tv> is defined too, and tv corresponds to the
average of P(X) over all inputs X; this is explained in more depth in the PLN book.
EvaluationLink isMale Ben_Goertzel <1>
Schemata that return no outputs are treated as predicates, and handled using Evaluation-
Links. The truth value of such a predicate, as a default, is considered as True if execution is
successful, and False otherwise.
And, analogously to the Graph operator for Schemallodes, we have for PredicateNodes the
SatisfyingSet operator, defined so that the SatisfyingSet of a predicate is the set whose members
are the elements that satisfy the predicate. Formally, that is:
S = SatisfyingSet P

means

TruthValue (MemberLink X S)

equals

TruthValue (EvaluationLink P X)
This operator allows the system to carry out advanced logical operations like higher-order
inference and unification.
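For the crisp special case, SatisfyingSet behaves like ordinary set comprehension. The following Python sketch is illustrative only - in OpenCog, membership is truth-valued rather than crisp, so MemberLink strengths would replace the Boolean test:

    # Toy, crisp version of the SatisfyingSet operator (illustrative only).
    def satisfying_set(predicate, domain):
        # the set whose members are the elements that satisfy the predicate
        return {x for x in domain if predicate(x)}

    is_even = lambda n: n % 2 == 0
    print(satisfying_set(is_even, range(10)))   # {0, 2, 4, 6, 8}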
20.3.2 Denoting Schema and Predicate Variables
CogPrime sometimes uses variables to represent the expressions inside schemata and predicates,
and sometimes uses variable-free, combinatory-logic-based representations. It is important to
distinguish between the two sorts of variables that may exist in CogPrime, either inside compound
schemata or predicates, or else in the AtomSpace as VariableNodes:
• Variable Atoms, which may be quantified (bound to existential or universal quantifiers) or
unquantified
• Variables that are used solely as function-arguments or local variables inside the "Combo
tree" structures used inside some ProcedureNodes (PredicateNodes or Schemallodes) (to
be described below), but are not related to Variable Atoms
Examples of quantified variables represented by Variable Atoms are $X and $Y in:
ForAll $X <.0001>
   ExtensionalImplicationLink
      ExtensionalInheritanceLink $X human
      ThereExists $Y
         AND
            ExtensionalInheritanceLink $Y human
            EvaluationLink parent_of ($X, $Y)
An example of an unquantified Variable Atom is $X in
ExtensionalImplicationLink <.3>
   ExtensionalInheritanceLink $X human
   ThereExists $Y
      AND
         ExtensionalInheritanceLink $Y human
         EvaluationLink parent_of ($X, $Y)
This ImplicationLink says that 30% of humans are parents: a more useful statement than
the ForAll Link given above, which says that it is very very unlikely to be true that all humans
are parents.
We may also say, for instance,
SatisfyingSet( EvaluationLink eats (cat, $X) )
to refer to the set of X so that eats(cat, X).
On the other hand, suppose we have the implication
Implication
   Evaluation f $X
   Evaluation
      f
      ExOut reverse $X
where f is a PredicateNode embodying a mathematical operator acting on pairs of NumberN-
odes, and reverse is an operator that reverses a list. So, this implication says that the f predicate
is commutative. Now, suppose that f is grounded by the formula

f(a,b) = (a > b - 1)

embodied in a Combo Tree object (which is not commutative, but that is not the point), stored
in the ProcedureRepository and linked to the PredicateNode for f. These f-internal variables,
which are expressed here using the letters a and b, are not VariableNodes in the CogPrime
AtomTable. The notation we use for these within the textual Combo language, that goes with
the Combo Tree formalism, is to replace a and b in this example with #1 and #2, so the above
grounding would be denoted

f -> (#1 > #2 - 1)
In the current version, it is assumed that type restrictions are always crisp, not probabilistically truth-
valued. This assumption may be revisited in a later version of the system.
20.3.2.1 Links as Predicates
It is conceptually important to recognize that CogPrime link types may be interpreted as
predicates. For instance, when one says

InheritanceLink cat animal <.8>

indicating an Inheritance relation between cat and animal with a strength .8, effectively one is
declaring that one has a predicate giving an output of .8. Depending on the interpretation of
InheritanceLink as a predicate, one has either the predicate
InheritanceLink cat $X
acting on the input
animal
or the predicate
InheritanceLink $X animal
acting on the input
cat
or the predicate
InheritanceLink $X $Y
acting on the list input
(cat, animal)
This means that, if we wanted to, we could do away with all Link types except OrderedLink
and UnorderedLink, and represent all other Link types as PredicateNodes embodying appro-
priate predicate schema.
This is not the approach taken in the current codebase. However, the situation is somewhat
similar to that with CIM-Dynamics:
• In the future we will likely create a revision of CogPrime that regularly revises its own vocabu-
lary of Link types, in which case an explicit representation of link types as predicate schema
will be appropriate.
• In the shorter term, it can be useful to treat link types as virtual predicates, meaning that
one lets the system create SchemaNodes corresponding to them, and hence do some meta-
level reasoning about its own link types.
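The predicate interpretation of link types sketched above amounts to partial application. A minimal Python sketch, with a made-up toy truth-value lookup standing in for the AtomSpace, might read:

    # Illustrative sketch: a binary link type viewed as a predicate.
    from functools import partial

    def inheritance_link(x, y):
        # toy strength lookup; in reality this would query the AtomSpace
        return 0.8 if (x, y) == ("cat", "animal") else 0.0

    # the predicate "InheritanceLink $X animal", acting on the input cat:
    inherits_from_animal = partial(inheritance_link, y="animal")
    print(inherits_from_animal("cat"))   # 0.8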
20.3.3 Variable and Combinator Notation
One of the most important aspects of combinatory logic, from a CogPrime perspective, is that
it allows one to represent arbitrarily complex procedures and patterns without using variables
in any direct sense. In CogPrime, variables are optional, and the choice of whether or how to
use them may be made (by CogPrime itself) on a contextual basis.
This section deals with the representation of variable expressions in a variable-free way, in a
CogPrime context. The general theory underlying this is well-known, and is usually expressed
in terms of the elimination of variables from lambda calculus expressions (lambda lifting). Here
we will not present this theory but will restrict ourselves to presenting a simple, hopefully
illustrative example, and then discussing some conceptual implications.
20.3.3.1 Why Eliminating Variables is So Useful
Before launching into the specifics, a few words about the general utility of variable-free ex-
pression may be worthwhile.
Some expressions look simpler to the trained human eye with variables, and some look simpler
without them. However, the main reason why eliminating all variables from an expression is
sometimes very useful, is that there are automated program-manipulation techniques that work
much more nicely on programs (schemata, in CogPrime lingo) without any variables in them.
As will be discussed later (e.g. Chapter 33 on evolutionary learning, although the same
process is also useful for supporting probabilistic reasoning on procedures), in order to mine
patterns among multiple schema that all try to do the same (or related) things, we want to put
schema into a kind of "hierarchical normal form". The normal form we wish to use generalizes
Holman's Elegant Normal Form (which is discussed in Moshe Looks' PhD thesis) to program
trees rather than just Boolean trees.
But, putting computer programs into a useful, nicely-hierarchically-structured normal form
is a hard problem - it requires one to have a pretty nice and comprehensive set of program
transformations.
But the only general, robust, systematic program transformation methods that exist in the
computer science literature require one to remove the variables from one's programs, so that
one can use the theory of functional programming (which ties in with the theory of monads in
category theory, and a lot of beautiful related math).
In large part, we want to remove variables so we can use functional programming tools to
normalize programs into a standard and pretty hierarchical form, in order to mine patterns
among them effectively.
However, we don't always want to be rid of variables, because sometimes, from a logical
reasoning perspective, theorem-proving is easier with the variables in there. (Sometimes not.)
So, we want to have the option to use variables, or not.
20.3.3.2 An Example of Variable Elimination
Consider the PredicateNode
AND
InheritanceLink X cat
eats X mice
Here we have used a syntactically sugared representation involving the variable X. How can we
get rid of the X?
Recall the C combinator (from combinatory logic), defined by

C f x y = f y x

Using this tool,
InheritanceLink X cat
becomes
C InheritanceLink cat X
and
eats X mice
becomes
C eats mice X
so that overall we have
AND
C InheritanceLink cat
C eats mice
where the C combinators essentially give instructions as to where the virtual argument X should
go.
In this case the variable-free representation is basically just as simple as the variable-based
representation, so there is nothing to lose and a lot to gain by getting rid of the variables. This
won't always be the case - sometimes execution efficiency will be significantly enhanced by use
of variables.
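The C combinator itself is trivial to express in curried Python; the sketch below (illustrative only, with a made-up inheritance function) shows how C repositions the virtual argument:

    # C f x y = f y x, in curried Python.
    C = lambda f: lambda x: lambda y: f(y)(x)

    inheritance = lambda a: lambda b: "%s inherits from %s" % (a, b)
    # (C inheritance cat) X  behaves like  inheritance X cat:
    print(C(inheritance)("cat")("Fluffy"))   # Fluffy inherits from cat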
20.3.4 Inheritance Between Higher-Order Types
Next, this section deals with the somewhat subtle matter of Inheritance between higher-order
types. This is needed, for example, when one wants to cross over or mutate two complex
schemata, in an evolutionary learning context. One encounters questions like: When mutation
replaces a schema that takes integer input, can it replace it with one that takes general numerical
input? How about vice versa? These questions get more complex when the inputs and outputs
of schema may themselves be schema with complex higher-order types. However, they can be
dealt with elegantly using some basic mathematical rules.
Denote the type of a mapping from type T to type S, as T -> S. Use the shorthand inh to
mean inherits from. Then the basic rule we use is that
T1 -> S1 inh T2 -> S2
iff
T2 inh T1
S1 inh S2
In other words, we assume higher-order type inheritance is contravariant. The reason is
that, if R1 = T1 -> S1 is to be a special case of R2 = T2 -> S2, then one has to be able to
use the former everywhere one uses the latter. This means that any input R2 takes has to also
be taken by R1 (hence T2 inherits from T1). And it means that the outputs R1 gives must be
able to be accepted by any function that accepts outputs of R2 (hence S1 inherits from S2).
This type of issue comes up in programming language design fairly frequently, and there are
a number of research papers debating the pros and cons of contravariance versus covariance for
complex type inheritance. However, for the purpose of schema type inheritance in CogPrime,
the greater logical consistency of the contravariant approach holds sway.
For instance, in this approach, INT -> INT is not a subtype of NO -> INT (where NO
denotes NUMBER), because NO -> INT is the type that includes all functions which take a real
and return an int, and an INT -> INT does not take a real. Rather, the containment is the
other way around: every NO -> INT function is an example of an INT -> INT function. For
example, consider the NO -> INT function that takes every real number and rounds it up to the nearest
integer. Considered as an INT -> INT function, this is simply the identity function: it is the
function that takes an integer and rounds it up to the nearest integer.
Of course, tupling of types is different; it's covariant. If one has an ordered pair whose
elements are of different types, say (T1, T2), then we have

(T1, S1) inh (T2, S2)
iff
T1 inh T2
S1 inh S2
As a mnemonic formula, we may say
(general -> specific) inherits from (specific -> general)
(specific, specific) inherits from (general, general)
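A minimal Python sketch of the contravariant rule may help; here a pair (T, S) is our own toy encoding of the function type T -> S, and the basic-type table is illustrative:

    # Toy type-inheritance checker; an assumption-laden sketch, not OpenCog code.
    BASIC_INH = {"INT": {"NUMBER"}, "FLOAT": {"NUMBER"}, "CHAR": {"STRING"}}

    def inh(t1, t2):
        # does type t1 inherit from type t2?
        if t1 == t2:
            return True
        if isinstance(t1, tuple) and isinstance(t2, tuple):
            # (T1 -> S1) inh (T2 -> S2) iff T2 inh T1 and S1 inh S2
            return inh(t2[0], t1[0]) and inh(t1[1], t2[1])
        return t2 in BASIC_INH.get(t1, set())

    # every NUMBER -> INT function is an example of an INT -> INT function:
    print(inh(("NUMBER", "INT"), ("INT", "INT")))   # True
    print(inh(("INT", "INT"), ("NUMBER", "INT")))   # False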
In schema learning, we will also have use for abstract type constructions, such as

(T1, T2) where T1 inherits from T2
Notationally, we will refer to variable types as Xv1, Xv2, etc., and then denote the inheritance
relationships by using numerical indices, e.g. using

[1 inh 2]

to denote that

Xv1 inh Xv2
So for example,
(INT, VOID) inh (Xv1, Xv2)
is true, because there are no restrictions on the variable types, and we can just assign Xv1 =
INT, Xv2 = VOID.
On the other hand,
( INT, VOID ) inh ( Xv1, Xv2 ), [ 1 inh 2 ]

is false, because the restriction Xv1 inh Xv2 is imposed, but it's not true that INT inh VOID.
The following list gives some examples of type inheritance, using the elementary types INT,
FLOAT (FL), NUMBER (NO), CHAR and STRING (STR), with the elementary type inheri-
tance relationships
• INT inh NUMBER
• FLOAT inh NUMBER
• CHAR inh STRING
• ( NO -> FL ) inh ( INT -> FL )
• ( FL -> INT ) inh ( FL -> NO )
• ( ( INT -> FL ) -> ( FL -> INT ) ) inh ( ( NO -> FL ) -> ( FL -> NO ) )
20.3.5 Advanced Schema Manipulation
Now we describe some special schema for manipulating schema, which seem to be very useful
in certain contexts.
20.3.5.1 Listification
First, there are two ways to represent n-ary relations in CogPrime's Atom level knowledge
representation language: using lists as in
f_list (x1, ..., xn)

or using currying as in
f_curry x1 ... xn
To make conversion between list and curried forms easier, we have chosen to introduce special
schema (combinators) just for this purpose:
listify f = f_list, so that

f_list (x1, ..., xn) = f x1 ... xn
unlistify (listify f) = f
For instance

kick_curry Ben Ken

denotes

(kick_curry Ben) Ken

which means that kick is applied to the argument Ben to yield a predicate schema applied to
Ken. This is the curried style. The list style is

kick_list (Ben, Ken)

where kick is viewed as taking as an argument the list (Ben, Ken). The conversion between
the two is done by

listify kick_curry = kick_list
unlistify kick_list = kick_curry
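For the two-argument case, listify and unlistify can be written down directly. In the Python sketch below (illustrative only; the helper names follow the text), kick_curry and kick_list are interconverted:

    # Sketch of listify/unlistify for binary functions.
    def kick_curry(kicker):
        return lambda target: "%s kicks %s" % (kicker, target)

    def listify(f_curry):
        # f_list (x1, x2) = f_curry x1 x2
        return lambda args: f_curry(args[0])(args[1])

    def unlistify(f_list):
        return lambda x1: lambda x2: f_list((x1, x2))

    kick_list = listify(kick_curry)
    assert kick_list(("Ben", "Ken")) == kick_curry("Ben")("Ken")
    assert unlistify(kick_list)("Ben")("Ken") == kick_list(("Ben", "Ken"))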
As a more detailed example of unlistification, let us utilize a simple mathematical example,
the function (x - 1)^2. If we use the notations - and pow to denote SchemaNodes embodying
the corresponding operations, then this formula may be written in node-and-link
form as
ExOutLink
   pow
   ListLink
      ExOutLink
         -
         ListLink
            x
            1
      2
But to get rid of the nasty variable x, we need to first unlistify the functions pow and -, and
then apply the C and B combinators a couple times to move the variable x to the front. The
B combinator (see Combinatory Logic REF) is recalled below:

B f g h = f (g h)

This is accomplished as follows (using the standard convention of left-associativity for the
application operator, denoted by ExOutLink in the tree representation given in Section 20.3.0.6):
pow(-(x, 1), 2)
(unlistify pow) (-(x, 1)) 2
C (unlistify pow) 2 (-(x, 1))
C (unlistify pow) 2 ((unlistify -) x 1)
C (unlistify pow) 2 (C (unlistify -) 1 x)
B (C (unlistify pow) 2) (C (unlistify -) 1) x
yielding the final schema
(C (unlistify pow) 2) (C (unlistify -) 1)
By the way, a variable-free representation of this schema in CogPrime would look like
ExOutLink
   ExOutLink
      B
      ExOutLink
         ExOutLink
            C
            ExOutLink
               unlistify
               pow
         2
   ExOutLink
      ExOutLink
         C
         ExOutLink
            unlistify
            -
      1
The main thing to be observed is that the introduction of these extra schema lets us remove
the variable x. The size of the schema is increased slightly in this case, but only slightly - an
increase that is well justified by the elimination of the many difficulties that explicit variables
would bring to the system. Furthermore, there is a shorter rendition, which looks like
ExOutLink
   ExOutLink
      B
      ExOutLink
         ExOutLink
            C
            pow_curried
         2
   ExOutLink
      ExOutLink
         C
         -_curried
      1
This rendition uses alternate variants of the - and pow schema, labeled -_curried and
pow_curried, which do not act on lists but are curried in the manner of combinatory logic
and Haskell. It is 13 lines, whereas the variable-bearing version is 9 lines - a minor increase in
length that brings a lot of operational simplification.
20.3.5.2 Argument Permutation
In dealing with list relationships, there will sometimes be use for an argument-permutation
operator; let us call it P, defined as follows:

(P p f) (v1, ..., vn) = f (p (v1, ..., vn))

where p is a permutation on n letters. This deals with the case where we want to say, for
instance, that
Equivalence parent(x,y) child(y,x)
Instead of positing variable names x and y that span the two relations parent(x,y) and
child(y,x), what we can instead say in this example is
Equivalence parent (P (2,1) child)
For the case of two-argument functions, argument permutation is basically doing on the list
level what the C combinator does in the curried function domain. On the other hand, in the
case of n-argument functions with n>2, argument permutation doesn't correspond to any of
the standard combinators.
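A direct Python transcription of P is straightforward; in this illustrative sketch the permutation is given as 1-based positions, matching the (2,1) notation above:

    # The argument-permutation operator: (P p f)(v1,...,vn) = f(p(v1,...,vn)).
    def P(perm, f):
        return lambda args: f(tuple(args[i - 1] for i in perm))

    child = lambda args: "%s is a child of %s" % args
    parent = P((2, 1), child)            # parent(x,y) == child(y,x)
    print(parent(("Alice", "Bob")))      # Bob is a child of Alice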
Finally, let's conclude with a similar example in a more standard predicate logic notation,
involving both combinators and the permutation argument operator introduced above. We will
translate the variable-laden predicate
likes(y,x) AND likes(x,y)

into the equivalent combinatory logic tree. Let us first recall the combinator S, whose function
is to distribute an argument over two terms:

S f g x = (f x) (g x)
Assume that the two inputs are going to be given to us as a list. Now, the combinatory logic
representation of this is
S (B AND (B (P (2,1) likes))) likes
We now show how this would be evaluated to produce the correct expression:

S (B AND (B (P (2,1) likes))) likes (x,y)

S gets evaluated first, to produce

(B AND (B (P (2,1) likes)) (x,y)) (likes (x,y))

now the first B

AND ((B (P (2,1) likes)) (x,y)) (likes (x,y))

now the second one

AND ((P (2,1) likes) (x,y)) (likes (x,y))

now P

AND (likes (y,x)) (likes (x,y))
which is what we wanted.
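As a sanity check, the whole derivation can be transcribed into Python combinators. Because our likes takes its two arguments as a single tuple, one B suffices here where the text's curried derivation uses two; the sketch is illustrative only:

    # S, B and P as curried Python combinators.
    S = lambda f: lambda g: lambda x: f(x)(g(x))
    B = lambda f: lambda g: lambda x: f(g(x))
    P = lambda perm: lambda f: lambda args: f(tuple(args[i - 1] for i in perm))

    likes = lambda pair: "likes%s" % (pair,)
    AND = lambda a: lambda b: "AND(%s, %s)" % (a, b)

    expr = S(B(AND)(P((2, 1))(likes)))(likes)
    print(expr(("x", "y")))   # AND(likes(('y', 'x')), likes(('x', 'y')))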
Chapter 21
Representing Procedural Knowledge
21.1 Introduction
We now turn to CogPrime's representation and manipulation of procedural knowledge. In a
sense this is the most fundamental kind of knowledge - since intelligence is most directly about
action selection, and it is procedures which generate actions.
CogPrime involves multiple representations for procedures, including procedure maps and
(for sensorimotor procedures) neural nets or similar structures. Its most basic procedural knowl-
edge representation, however, is the program. The choice to use programs to represent proce-
dures was made after considerable reflection — they are not of course the only choice, as other
representations such as recurrent neural networks possess identical representational power, and
are preferable in some regards (e.g. resilience with respect to damage). Ultimately, however,
we chose programs due to their consilience with the software and hardware underlying Cog-
Prime (and every other current AI program). CogPrime is a program; current computers and
operating systems are optimized for executing and manipulating programs; and we humans
now have many tools for formally and informally analyzing and reasoning about programs. The
human brain probably doesn't represent most procedures as programs in any simple sense, but
CogPrime is not intended to be an emulation of the human brain. So, the representation of
procedures as programs is one major case where CogPrime deviates from the human cogni-
tive architecture in the interest of more effectively exploiting its own hardware and software
infrastructure.
CogPrime represents procedures as programs in an internal programming language called
"Combo." While Combo has a textual representation, described online at the OpenCog wiki,
this isn't one of its more important aspects (and may be redesigned slightly or wholly without
affecting system intelligence or architecture); the essence of Combo programs lies in their tree
representation not their text representation. One could fairly consider Combo as a dialect of
LISP, although it's not equivalent to any standard dialect, and it hasn't particularly been
developed with this in mind. In this chapter we discuss the key concepts underlying the Combo
approach to program representation, seeking to make clear at each step the motivations for
doing things in the manner proposed.
In terms of the overall CogPrime architecture diagram given in Chapter 6 of Part 1, this
chapter is about the box labeled "Procedure Repository." The latter, in OpenCog, is a special-
ized component connected to the AtomSpace, storing Combo tree representations of programs;
each program in the repository is linked to a SchemaNode in the AtomSpace, ensuring full
connectivity between procedural and declarative knowledge.
21.2 Representing Programs
What is a "program" anyway? What distinguishes a program front an arbitrary representation
of a procedure?
The essence of programmatic representations is that they are well-specified, compact, com-
binatorial, and hierarchical:
• Well-specified: unlike sentences in natural language, programs are unambiguous; two distinct
programs can be precisely equivalent.
• Compact: programs allow us to compress data on the basis of their regularities. Accordingly,
for the purposes of this chapter, we do not consider overly constrained representations such
as the well-known conjunctive and disjunctive normal forms for Boolean formulae to be
programmatic. Although they can express any Boolean function (data), they dramatically
limit the range of data that can be expressed compactly, compared to unrestricted Boolean
formulae.
• Combinatorial: programs access the results of running other programs (e.g. via function
application), as well as delete, duplicate, and rearrange these results (e.g., via variables or
combinators).
• Hierarchical: programs have intrinsic hierarchical organization, and may be decomposed
into subprograms.
Eric Baum has advanced a theory "under which one understands a problem when one has
mental programs that can solve it and many naturally occurring variations" [Bau04]. In this
perspective - which we find an agreeable way to think about procedural knowledge, though
perhaps an overly limited perspective on mind as a whole - one of the primary goals of ar-
tificial general intelligence is systems that can represent, learn, and reason about such pro-
grams [Bau06, Bau04]. Furthermore, integrative AGI systems such as CogPrime may contain
subsystems operating on programmatic representations. Would-be AGI systems with no direct
support for programmatic representation will clearly need to represent procedures and proce-
dural abstractions somehow. Alternatives such as recurrent neural networks have serious down-
sides, including opacity and inefficiency, but also have their advantages (e.g. recurrent neural
nets can be robust with regard to damage, and learnable via biologically plausible algorithms).
Note that the problem of how to represent programs for an AGI system dissolves in the
unrealistic case of unbounded computational resources. The solution is algorithmic information
theory [Cha08], extended recently to the case of sequential decision theory [Hut05a]. The latter
work defines the universal algorithmic agent AIXI, which in effect simulates all possible pro-
grams that are in agreement with the agent's set of observations. While AIXI is uncomputable,
the related agent AIXItl may be computed, and is superior to any other agent bounded by
time t and space l [Hut05a]. The choice of a representational language for programs¹ is of no
consequence, as it will merely introduce a bias that will disappear within a constant number of
time steps.²
¹ As well as a language for proofs in the case of AIXItl.
² The universal distribution converges quickly.
Our goal in this chapter is to provide practical techniques for approximating the ideal pro-
vided by algorithmic probability, based on what Pei Wang has termed the assumption of in-
sufficient knowledge and resources [Wan06], and assuming an AGI architecture that's at least
vaguely humanlike in nature, and operates largely in everyday human environments, but uses
programs to represent many procedures. Given these assumptions, how programs are repre-
sented is of paramount importance, as we shall see in the next two sections, where we give a
conceptual formulation of what we mean by tractable program representations, and introduce
tools for formalizing such representations. The fourth section delves into effective techniques
for representing programs. A key concept throughout is syntactic-semantic correlation, meaning
that programs which are similar on the syntactic level, within certain constraints, will tend to
also be similar in terms of their behavior (i.e. on the semantic level). Lastly, the fifth section
changes direction a bit and discusses the translation of programmatic structure into declarative
form for the purposes of logical inference.
In the future, we will experimentally validate that these normal forms and heuristic trans-
formations do in fact increase the syntactic-semantic correlation in program spaces, as has
been shown so far only in the Boolean case. We would also like to explore the extent to which
even stronger correlation, and additional tractability properties, can be observed when realistic
probabilistic constraints on "natural" environment and task spaces are imposed.
The importance of a good programmatic representation of procedural knowledge becomes
quite clear when one thinks about it in terms of the Mind-World Correspondence Principle
introduced in Chapter 10. That principle states, roughly, that transition paths between world-
states should map naturally onto transition paths between mind-states. This suggests that there
should be a natural, smooth mapping between real-world action series and the corresponding
series of internal states. Where internal states are driven by explicitly given programs, this
means that the transitions between internal program states should nicely mirror transitions
between the states of the real world as it interacts with the system controlled by the program.
The extent to which this is true will depend on the specifics of the programming language - and
it will be true to a much greater extent, on the whole, if the programming language displays
high syntactic-semantic correlation for behaviors that commonly occur when the program is
used to control the system in the real world. So, the various technical issues mentioned above
and considered below, regarding the qualities desired in a programmatic representation, are
merely the manifestation of the general Mind-World Correspondence Principle in the context of
procedural knowledge, under the assumption that procedures are represented as programs. The
material in this chapter may be viewed as an approach to ensuring the validity of the Mind-
World Correspondence principle for programmatically-represented procedural knowledge, for
CogPrime systems concerned with achieving humanly meaningful goals in everyday human
environments.
21.3 Representational Challenges
Despite the advantages outlined in the previous section, there are a number of challenges in
working with programmatic representations:
• Open-endedness - in contrast to some other knowledge representations current in machine
learning, programs vary in size and "shape", and there is no obvious problem-independent
upper bound on program size. This makes it difficult to represent programs as points in a
fixed-dimensional space, or to learn programs with algorithms that assume such a space.
• Over-representation - often, syntactically distinct programs will be semantically iden-
tical (i.e. represent the same underlying behavior or functional mapping). Lacking prior
knowledge, many algorithms will inefficiently sample semantically identical programs re-
peatedly [Loo07a, GIW01].
• Chaotic Execution - programs that are very similar, syntactically, may be very different,
semantically. This presents difficulties for many heuristic search algorithms, which require
syntactic and semantic distance to be correlated [Loo07b, TVCC05].
• High resource-variance - programs in the same space vary greatly in the space and time
they require to execute.
It's easy to see how the latter two issues may present a challenge for mind-world correspon-
dence! Chaotic execution makes it hard to predict whether a program will indeed manifest
state-sequences mapping nicely to corresponding world-sequences; and high resource-variance
makes it hard to predict whether, for a given program, this sort of mapping can be achieved
for relevant goals given available resources.
Based on these concerns, it is no surprise that search over program spaces quickly succumbs
to combinatorial explosion, and that heuristic search methods are sometimes no better than
random sampling [LP02]. However, alternative representations of procedures also have their
difficulties, and so far we feel the thornier aspects of programmatic representation are generally
an acceptable price to pay in light of the advantages.
For some special cases in CogPrime we have made a different choice - e.g. when we use
DeSTIN for sensory perception (see Chapter 28) we utilize a more specialized representation
comprising a hierarchical network of more specialized elements. DeSTIN doesn't have problems
with resource variance or chaotic execution, though it does suffer from over-representation. It
but may limit its applicability to more abstract cognition. In short we feel that, for general
representation of cognitive procedures, the benefits of programmatic representation outweigh
the costs; but for some special cases such as low-level perception and motor procedures, this
may not be true and one may do better to opt for a more specialized, more rigid but less
problematic representation.
It would be possible to modify CogPrime to use, say, recurrent neural nets for procedure
representation, rather than programs in an explicit language. However, this would rate as a
rather major change in the architecture, and would cause multiple problems in other aspects
of the system. For example, programs are reasonably straightforward to reason about using
PLN inference, whereas reasoning about the internals of recurrent neural nets is drastically
more problematic, though not impossible. The choice of a procedure representation approach
for CogPrime has been made considering not only procedural knowledge in itself, but the
interaction of procedural knowledge with other sorts of knowledge. This reflects the general
synergetic nature of the CogPrime design.
There are also various computation-theoretic issues regarding programs; however, we suspect
these are not particularly relevant to the task of creating human-level AGI, though they may
rear their heads when one gets into the domain of super-human, profoundly self-modifying AGI
systems. For instance, in the context of the difficulties caused by over-representation and
high resource-variance, one might observe that determinations of e.g. programmatic equivalence
for the former, and e.g. halting behavior for the latter, are uncomputable. But we feel that,
given the assumption of insufficient knowledge and resources, these concerns dissolve into the
larger issue of computational intractability and the need for efficient heuristics. Determining the
equivalence of two Boolean formulae over 500 variables by computing and comparing their truth
tables is trivial from a computability standpoint, but, in the words of Leonid Levin, "only math
nerds would call 2500 finite" ILev9-1]. Similarly, a program that never terminates is a special
case of a program that runs too slowly to be of interest to us.
One of the key ideas underlying our treatment of programmatic knowledge is that, in order
to tractably learn and reason about programs, an AI system must have prior knowledge of
programming language semantics. That is, in the approach we advocate, the mechanism whereby
programs are executed is assumed known a priori, and assumed to remain constant across many
problems. One may then craft AI methods that make specific use of the programming language
semantics, in various ways. Of course in the long run a sufficiently powerful AGI system could
modify these aspects of its procedural knowledge representation; but in that case, according
to our approach, it would also need to modify various aspects of its procedure learning and
reasoning code accordingly.
Specifically, we propose to exploit prior knowledge about program structure via enforcing
programs to be represented in normal forms that preserve their hierarchical structure, and to
be heuristically simplified based on reduction rules. Accordingly, one formally equivalent pro-
gramming language may be preferred over another by virtue of making these reductions and
transformations more explicit and concise to describe and to implement. The current OpenCog-
Prime system uses a simple LISP-like language called Combo (which takes both tree form and
textual form) to represent procedures, but this is not critical; the main point is using some
language or language variant that is "tractable" in the sense of providing a context in which
the semantically useful reductions and transformations we've identified are naturally expressible
and easily usable.
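As a small illustration of what such reduction rules look like in practice, here is a Python sketch using our own toy tree encoding (not Combo's actual reduction engine) that normalizes Boolean program trees by eliminating double negation and conjunction with true:

    # Toy reduction rules over Boolean program trees (illustrative only).
    # Trees are tuples like ("and", ("not", ("not", "x")), True).
    def reduce_tree(t):
        if not isinstance(t, tuple):
            return t
        op, args = t[0], [reduce_tree(a) for a in t[1:]]
        if op == "not" and isinstance(args[0], tuple) and args[0][0] == "not":
            return args[0][1]                          # not(not(x)) -> x
        if op == "and":
            args = [a for a in args if a is not True]  # and(x, true) -> x
            if False in args:
                return False                           # and(..., false) -> false
            if len(args) == 1:
                return args[0]
        return (op,) + tuple(args)

    print(reduce_tree(("and", ("not", ("not", "x")), True)))   # x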
21.4 What Makes a Representation Tractable?
Creating a comprehensive formalization of the notion of a tractable program representation
would constitute a significant achievement; and we will not answer that summons here. We will,
however, take a step in that direction by enunciating a set of positive principles for tractable
program representations, corresponding closely to the list of representational challenges above.
While the discussion in this section is essentially conceptual rather than formal, we will use a
bit of notation to ensure clarity of expression: S to denote a space of programmatic functions
of the same type (e.g. all pure LISP λ-expressions mapping from lists to numbers), and B to
denote a metric space of behaviors.
In the case of a deterministic, side-effect-free program, execution maps from programs in S to
points in B, which will have separate dimensions for the function's output across various inputs
of interest, as well as dimensions corresponding to the time and space costs of executing the
program. In the case of a program that interacts with an external environment, or is intrinsically
nondeterministic, execution will map from S to probability distributions over points in B, which
will contain additional dimensions for any side-effects of interest that programs in S might have.
Note the distinction between syntactic distance, measured as e.g. tree-edit distance between
programs in S, and semantic distance, measured between programs' corresponding points in, or
probability distributions over, B. We assume that semantic distance accurately quantifies our
preferences in terms of a weighting on the dimensions of B; i.e., if variation along some axis is
of great interest, our metric for semantic distance should reflect this.
Let P be a probability distribution over B that describes our knowledge of what sorts of
problems we expect to encounter, and let R(n) ⊆ S be all the programs in our representation with
(syntactic) size no greater than n. We will say that R(n) d-covers the pair (B, P) to extent
p if the probability that, for a random behavior b ∈ B chosen according to P, there is some
program in R whose behavior is within semantic distance d of b, is greater than or equal to
p. Then, some among the various properties of tractability that seem important based on the
above discussion are as follows:
• for fixed d, p quickly goes to 1 as n increases,
• for fixed p, d quickly goes to 0 as n increases,
• for fixed d and p, the minimal n needed for R(n) to d-cover (B, P) to extent p should be
as small as possible,
• ceteris paribus, syntactic and semantic distance (measured according to P) are highly cor-
related.
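To make the behavioral space B concrete in the simplest (Boolean) case, the following Python sketch - an illustration under our own conventions, not a formal definition - measures semantic distance as Hamming distance between truth tables:

    # Behavior of a two-input Boolean program: its truth table.
    import itertools

    def behavior(f):
        return tuple(f(x, y) for x, y in itertools.product([False, True], repeat=2))

    def semantic_distance(f, g):
        # Hamming distance between the two behaviors (points in B)
        return sum(a != b for a, b in zip(behavior(f), behavior(g)))

    AND = lambda x, y: x and y
    OR = lambda x, y: x or y
    print(semantic_distance(AND, OR))   # 2: they differ on (F,T) and (T,F)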
This is closely related to the Mind-World Correspondence Principle articulated in Chapter
10, and to the geometric formulation of cognitive synergy posited in Appendix ??. Syntactic
distance has to do with distance along paths in mind-space related to formal program structures,
and semantic distance has to do with distance along paths in mind-space and world-space
corresponding to the record of the program's actual behavior. If syntax-semantics correlation
failed, then there would be paths through mind-space (related to formal program structures)
that were poorly matched to their closest corresponding paths through the rest of mind-space
and world-space, hence causing a failure (or significant diminution) of cognitive synergy and
mind-world correspondence.
Since execution time and memory usage considerations may be incorporated into the defini-
tion of program behavior, minimizing chaotic execution and managing resource variance emerge
conceptually here as subcases of maximizing correlation between syntactic and semantic dis-
tance. Minimizing over-representation follows from the desire for small program size: roughly
speaking the less over-representation there is, the smaller average program size can be achieved.
In some cases one can achieve fairly strong results about tractability of representations
without any special assumptions about P: for example in prior work we have shown that
adoption of an appropriate hierarchical normal form can generically increase correlation between
syntactic and semantic distance in the space of Boolean functions [?, Loo06b]. In this case we
may say that we have a generically tractable representation. However, to achieve tractable
representation of more complex programs, some fairly strong assumptions about P will be
necessary. This should not be philosophically disturbing, since it's clear that human intelligence
has evolved in a manner strongly conditioned by certain classes of environments; and similarly,
what we need to do to create a viable program representation system for pragmatic AGI usage,
is to achieve tractability relative to the distribution P corresponding to the actual problems
the AGI is going to need to solve. Formalizing the distributions P of real-world interest is
a difficult problem, and one we will not address here (recall the related, informal discussions
of Chapter 9, where we considered the various important peculiarities of the human everyday
world). However, we hypothesize that the representations presented in the following section may
be tractable to a significant extent irrespective of P,³ and even more powerfully tractable with
³ Specifically, with only weak biases that prefer smaller and faster programs with hierarchical decompositions.
respect to this as-yet unformalized distribution. As weak evidence in favor of this hypothesis,
we note that many of the representations presented have proved useful so far in various narrow
problem-solving situations.
21.5 The Combo Language
The current version of OpenCogPrime uses a simple language called Combo, which is an example
of a language in which the transformations we consider important for AGI-focused program
representation are relatively simple and natural. Here we illustrate the Combo language by
example, referring the reader to the OpenCog wiki site for a formal presentation.
The main use of the Combo language in OpenCog is behind-the-scenes, i.e. using tree repre-
sentations of Combo programs; but there is also a human-readable syntax, and an interpreter
that allows humans to write Combo programs when needed. The main use of Combo, however,
is not for human-coded programs, but rather for programs that are learned via various AI
methods.
In Combo all expressions are in prefix form like LISP, but the left parenthesis is placed after
the operator instead of before, for example:
• +(4 5)
is a 0-ary expression that returns 4 + 5.
• and(#1 0<(#2))
is a binary expression of type bool x float -> bool that returns true if and only if the first
input is true and the second input positive. #n designates the n-th input.
• fact(1) := if(0<(#1) *(#1 fact(+(#1 -1))) 1)
is a recursive definition of factorial.
• and_seq(goto(stick) grab(stick) goto(owner) drop)
is a 0-ary expression with side effects; it evaluates a sequence of actions until completion or
failure of one of them. Each action is executed in the environment the agent is connected to
and returns action_success upon success or action_failure otherwise. The action sequence
returns action_success if it completes, or action_failure if it does not.
• if (near (owner self)
lick (owner)
and_seg (goto(owner) wag)
is a 0-ary expression with side effects; it means that if at the time of its evaluation the agent
referred as self (here a virtual pet) is near its owner then lick him/her, otherwise go to the
owner and wag the tail.
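To make the tree representation concrete, the following is a minimal Python sketch (purely illustrative; the actual Combo interpreter in OpenCog is not structured this way) of evaluating such prefix expression trees:

# Minimal sketch (illustrative, not OpenCog code) of evaluating
# Combo-style prefix expression trees such as and(#1 0<(#2)).
# An expression is a tuple (operator, child, child, ...); inputs
# are referenced as ("#", n) for the n-th input.

def evaluate(expr, inputs):
    if isinstance(expr, (bool, int, float)):
        return expr                      # constant leaf
    op, *args = expr
    if op == "#":                        # #n designates the n-th input
        return inputs[args[0] - 1]
    vals = lambda: [evaluate(a, inputs) for a in args]
    if op == "+":
        return sum(vals())
    if op == "*":
        out = 1
        for v in vals():
            out *= v
        return out
    if op == "0<":
        return evaluate(args[0], inputs) > 0
    if op == "and":
        return all(vals())
    if op == "if":
        c, t, e = args
        return evaluate(t if evaluate(c, inputs) else e, inputs)
    raise ValueError("unknown operator: " + op)

# and(#1 0<(#2)) evaluated on inputs (True, 4.2) -> True
print(evaluate(("and", ("#", 1), ("0<", ("#", 2))), (True, 4.2)))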
21.6 Normal Forms Postulated to Provide Tractable Representations
We now present a series of normal forms for programs, postulated to provide tractable repre-
sentations in the contexts relevant to human-level, roughly human-like general intelligence.
21.6.1 A Simple Type System
We use a simple type system to distinguish between the various normal forms introduced below.
This is necessary to convey the minimal information needed to correctly apply the basic func-
tions in our canonical forms. Various systems and applications may of course augment these
with additional type information, up to and including the satisfaction of arbitrary predicates
(e.g. a type for prime numbers). This can be overlaid on top of our minimalist system to con-
vey additional bias in selecting which transformations to apply, and introducing constraints as
necessary. For instance, a call to a function expecting a prime number, called with a potentially
composite argument, may be wrapped in a conditional testing the argument's primality. A sim-
ilar technique is used in the normal form for functions to deal with list arguments that may be
empty.
Normal forms are provided for Boolean and number primitive types, and the following
parametrized types:
• list types, list_T, where T is any type,
• tuple types, tuple_{T1,T2,...TN}, where all Ti are types, and N is a positive natural number,
• enum types, {s1, s2, ... sN}, where N is a positive number and all si are unique identifiers,
• function types T1, T2, ... TN → O, where O and all Ti are types,
• action result types.
A list of type list_T is an ordered sequence of any number of elements, all of which must have
type T. A tuple of type tuple_{T1,T2,...TN} is an ordered sequence of exactly N elements, where
every ith element is of type Ti.
An enum of type {s1, s2, ... sN} is some element si from the set.
Action result types concern side-effectful interaction with some world external to the system
(but perhaps simulated, of course), and will be described in detail in their subsection below.
Other types may certainly be added at a later date, but we believe that those listed above
provide sufficient expressive power to conveniently encompass a wide range of programs, and
serve as a compelling proof of concept.
The normal form for a type T is a set of elementary functions with codomain T, a set
of constants of type T, and a tree grammar. Internal nodes for expressions described by the
grammar are elementary functions, and leaves are either U_var or U_constant, where U is some
type (often U = T).
Sentences in a normal form grammar may be transformed into normal form expressions. The
set of expressions that may be generated is a function of a set of bound variables and a set
of external functions that must be provided (both bound variables and external functions are
typed). The transformation is as follows:
• T_constant leaves are replaced with constants of type T,
• T_var leaves are replaced with either bound variables matching type T, or expressions of the
form f(expr1, expr2, ... exprM), where f is an external function of type T1, T2, ... TM → T,
and each expr_i is a normal form expression of type Ti (given the available bound variables
and external functions).
21.6.2 Boolean Normal Form
The elementary functions are and, or, and not. The constants are {true, false}. The grammar
is:
bool_root → or_form | and_form | literal | bool_constant
literal → bool_var | not( bool_var )
or_form → or( {and_form | literal}{2,} )
and_form → and( {or_form | literal}{2,} )
The construct foo{x,} refers to x or more matches of foo (e.g. {x | y}{2,} is two or more
items in sequence where each item is either an x or a y).
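As an illustration of how such a grammar can be enforced mechanically, here is a Python sketch (illustrative only; not actual OpenCog code) that checks whether an expression tree conforms to the Boolean normal form above:

# Sketch of a conformance check for the Boolean normal form grammar.
# Expressions are tuples ("and", ...), ("or", ...), ("not", var),
# variable names as strings, or bool constants.

def is_literal(e):
    return isinstance(e, str) or (
        isinstance(e, tuple) and e[0] == "not"
        and len(e) == 2 and isinstance(e[1], str))

def is_or_form(e):
    return (isinstance(e, tuple) and e[0] == "or" and len(e) >= 3
            and all(is_and_form(c) or is_literal(c) for c in e[1:]))

def is_and_form(e):
    return (isinstance(e, tuple) and e[0] == "and" and len(e) >= 3
            and all(is_or_form(c) or is_literal(c) for c in e[1:]))

def is_bool_root(e):
    return (isinstance(e, bool) or is_literal(e)
            or is_or_form(e) or is_and_form(e))

print(is_bool_root(("or", "x", ("and", "y", ("not", "z")))))  # True
print(is_bool_root(("or", "x")))  # False: or requires 2+ arguments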
21.6.3 Number Normal Form
The elementary functions are * (times) and + (plus). The constants are some subset of the
rationals (e.g. those with IEEE single-precision floating-point representations). The grammar
is:
num_root → times_form | plus_form | num_constant | num_var
times_form → *( {num_constant | plus_form} plus_form{1,} ) | num_var
plus_form → +( {num_constant | times_form} times_form{1,} ) | num_var
21.6.4 List Normal Form
For list types list_T, the elementary functions are list (an n-ary list constructor) and append.
The only constant is the empty list (nil). The grammar is:
list_T_root → append_form | list_form | list_T_var | list_T_constant
append_form → append( {list_form | list_T_var}{2,} )
list_form → list( T_root{1,} )
21.6.5 Tuple Normal Form
For tuple types tuple_{T1,T2,...TN}, the only elementary function is the tuple constructor (tuple).
The constants are
T1_constant × T2_constant × ... × TN_constant
The normal form is either a constant, a var, or
tuple( T1_root T2_root ... TN_root )
21.6.6 Enum Normal Form
Enums are atomic tokens with no internal structure - accordingly, there are no elementary
functions. The constants for the enum {s1, s2, ... sN} are the si. The normal form is either a
constant or a variable.
21.6.7 Function Normal Form
For T1, T2, ... TN → O, the normal form is a lambda-expression of arity N whose body is of
type O. The list of variable names for the lambda-expression is not a "proper" argument - it
does not have a normal form of its own. Assuming that none of the Ti is a list type, the body
of the lambda-expression is simply in the normal form for type O (with the possibility of the
lambda-expression's arguments appearing with their appropriate types). If one or more Ti are
list types, then the body is a call to the split function with all arguments in normal form.
Split is a family of functions with type signatures
(T1, list_T1, T2, list_T2, ... Tk, list_Tk → O),
tuple_{list_T1,O}, tuple_{list_T2,O}, ... tuple_{list_Tk,O} → O.
To evaluate split(f, tuple(l1,o1), tuple(l2,o2), ... tuple(lk,ok)), the list arguments l1, l2, ... lk
are examined sequentially. If some li is found that is empty, then the result is the corresponding
value oi. If all li are nonempty, we deconstruct each of them into xi : xsi, where xi is the first
element of the list and xsi is the rest. The result is then f(x1, xs1, x2, xs2, ... xk, xsk). The split
function thus acts as an implicit case statement to deconstruct lists only if they are nonempty.
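The following Python sketch illustrates the split semantics just described (an informal model, not the actual implementation):

# Sketch of split: each (li, oi) pair supplies a list and the default
# value to return if that list is empty.

def split(f, *pairs):
    for l, o in pairs:
        if not l:            # some li is empty: return its default oi
            return o
    # all lists nonempty: deconstruct each into head xi and tail xsi
    args = []
    for l, _ in pairs:
        args.extend([l[0], l[1:]])
    return f(*args)

# e.g. a length-like use: f(x, xs) = 1 + len(xs)
print(split(lambda x, xs: 1 + len(xs), ([7, 8, 9], 0)))  # 3
print(split(lambda x, xs: 1 + len(xs), ([], 0)))         # 0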
21.6.8 Action Result Normal Form
An action result type act corresponds to the result of taking an action in some world. Every
action result type has a corresponding world type, world. Associated with action results and
worlds are two special sorts of functions.
• Perceptions - functions that take a world as their first argument and regular (non-world
and non-action-result) types as their remaining arguments, and return regular types. Unlike
other function types, the result of evaluating a perception call may be different at different
times, because the world will have different configurations at different times.
• Actions - functions that take a world as their first argument and regular types as their
remaining arguments, and return action results (of the type associated with the type of
their world argument). As with perceptions, the result of evaluating an action call may be
different at different times. Furthermore, actions may have side effects in the associated
world that they are called in. Thus, unlike any other sort of function, actions must be
evaluated, even if their return values are ignored.
Other sorts of functions acting on worlds (e.g. ones that take multiple worlds as arguments) are
disallowed.
Note that an action result expression cannot appear nested inside an expression of any other
type. Consequently, there is no way to convert e.g. an action result to a Boolean, although con-
version in the opposite direction is permitted. This is required because mathematical operations
in our language have classical mathematical semantics; x and y must equal y and x, which will
not generally be the case if x or y can have side-effects. Accordingly, there are special sequential
versions of the logical functions for use with action result expressions.
The elementary functions for action result types are andseq (sequential and, equivalent to
C's short-circuiting &&), orseq (sequential or, equivalent to C's short-circuiting ||), and fails
(negates success to failure and vice versa). The constants may vary from type to type but must
at least contain success and failure, indicating absolute success/failure in execution.4 The
normal form is as follows:
act_root → orseq_form | andseq_form | seqlit
seqlit → act | fails( act )
act → act_constant | act_var
orseq_form → orseq( {andseq_form | seqlit}{2,} )
andseq_form → andseq( {orseq_form | seqlit}{2,} )
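A minimal Python sketch of the sequential semantics, modeling actions as thunks returning success (True) or failure (False), follows; it is illustrative only:

# Like C's && and ||, evaluation short-circuits, which matters
# because actions have side effects.

def andseq(*actions):
    for act in actions:
        if not act():        # stop at the first failing action
            return False
    return True

def orseq(*actions):
    for act in actions:
        if act():            # stop at the first succeeding action
            return True
    return False

def fails(action):
    return lambda: not action()   # negate success and failure

# do(arg1, ..., argN) from the footnote: run all, ignoring failures
def do(*actions):
    return andseq(*[lambda a=a: orseq(a, lambda: True) for a in actions])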
21.7 Program Transformations
A program transformation is any type-preserving mapping from expressions to expressions.
Transformations may be guaranteed to preserve semantics. When doing program evolution there
is an intermediate category of fitness preserving transformations that may alter semantics, but
not fitness. In general, the only way that fitness preserving transformations will be uncovered is
by scoring programs that have had their semantics potentially transformed to determine their
fitness, which is what most fitness functions do. On the other hand, if the fitness function is
encompassed in the program itself, so that a candidate directly outputs its own fitness, then only
semantics-preserving transformations are needed.
21.7.1 Reductions
These are semantics preserving transformations that do not increase some size measure (typ-
ically number of symbols), and are idempotent. For example, and(x, x, y) → and(x, y) is a
reduction for Boolean expressions. A set of canonical reductions is defined for every type that
has a normal form. For numerical functions, the simplifier in a computer algebra system may
be used. The full list of reductions is omitted for brevity. An expression is reduced if it maps
to itself under all canonical reductions for its type, and all of its children are reduced.
Another important set of reductions are the compressive abstractions, which reduce or keep
constant the size of expressions by introducing new functions. Consider
list( *(+(a p q) r)
      *(+(b p q) r)
      *(+(c p q) r) )
4 A do(arg1, arg2, ... argN) statement (known as progn in Lisp), which evaluates its arguments sequen-
tially regardless of success or failure, is equivalent to andseq(orseq(arg1, success), orseq(arg2, success), ...
orseq(argN, success)).
which contains 19 symbols. Transforming this to
f(x) = *(+(x p q) r)
list( f(a) f(b) f(c) )
reduces the total number of symbols to 15. One can generalize this notion to consider com-
pressive abstractions across a set of programs. Compressive abstractions appear to be rather
expensive to uncover, although not prohibitively so; the computation may easily be parallelized,
and may rely heavily on subtree mining [TODO REF].
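The symbol counts above can be checked mechanically; here is an illustrative Python sketch (the tuple encoding is our own, not Combo's):

# An expression is a nested tuple; each operator, variable and
# constant counts as one symbol.

def num_symbols(expr):
    if isinstance(expr, tuple):
        return 1 + sum(num_symbols(c) for c in expr[1:])
    return 1

term = lambda v: ("*", ("+", v, "p", "q"), "r")
before = ("list", term("a"), term("b"), term("c"))
print(num_symbols(before))                    # 19

# after abstraction: f(x) = *(+(x p q) r), plus list(f(a) f(b) f(c))
f_def = (("f", "x"), term("x"))               # (header, body)
after = ("list", ("f", "a"), ("f", "b"), ("f", "c"))
print(num_symbols(f_def[0]) + num_symbols(f_def[1]) + num_symbols(after))  # 15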
21.7.1.1 A Simple Example of Reduction
We now give a simple example of how CogPrime's reduction engine can transform a program
into a semantically equivalent but shorter one.
Consider the following program and the chain of reduction:
1. We start with the expression
   if(P and_seq(if(P A B) B) and_seq(A B))
2. A reduction rule permits reducing the conditional if(P A B) to if(true A B). Indeed if
P is true, then the first branch is evaluated and P must still be true.
   if(P and_seq(if(true A B) B) and_seq(A B))
3. Then a rule can reduce if(true A B) to A.
   if(P and_seq(A B) and_seq(A B))
4. And finally another rule replaces the conditional by one of its branches, since they are
identical:
   and_seq(A B)
Note that the reduced program is not only smaller (3 symbols instead of 11) but a bit faster
too. Of course it is not generally true that smaller programs are faster but in the restricted
context of our experiments it has often been the case.
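To illustrate, here is a toy Python rewrite engine implementing just the three rules used in this example (the actual reduction engine contains many more rules and operates on Combo trees):

# Expressions are tuples such as ("if", cond, then, else).

def substitute(e, old, new):
    if e == old:
        return new
    if isinstance(e, tuple):
        return tuple(substitute(c, old, new) for c in e)
    return e

def reduce_once(e):
    if not isinstance(e, tuple):
        return e
    e = tuple(reduce_once(c) for c in e)          # reduce children first
    if e[0] == "if":
        _, cond, then, els = e
        # rule: inside the true branch, the condition is known true
        then = substitute(then, cond, True)
        if cond is True:
            return then                            # if(true A B) -> A
        if then == els:
            return then                            # identical branches
        return ("if", cond, then, els)
    return e

def reduce_full(e):                                # iterate to fixpoint
    nxt = reduce_once(e)
    return e if nxt == e else reduce_full(nxt)

expr = ("if", "P", ("and_seq", ("if", "P", "A", "B"), "B"),
               ("and_seq", "A", "B"))
print(reduce_full(expr))   # ('and_seq', 'A', 'B')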
21.7.2 Neutral Transformations
Semantics preserving transformations that are not reductions are not useful on their own - they
can only have value when followed by transformations from some other class. They are thus
more speculative than reductions, and more costly to consider. We will refer to these as neutral
transformations [Ols95].
• Abstraction - given an expression E containing non-overlapping subexpressions E1, E2,
... EN, let E' be E with all Ei replaced by the unbound variables v1, v2, ... vN. Define the function
f(v1, v2, ... vN) = E', and replace E with f(E1, E2, ... EN). Abstraction is distinct from
compressive abstraction because only a single call to the new function f is introduced.5
• Inverse abstraction - replace a call to a user-defined function with the body of the
function, with arguments instantiated (note that this can also be used to partially invert a
compressive abstraction).
5 In compressive abstraction there must be at least two calls in order to avoid increasing the number of symbols.
• Distribution - let E be a call to some function f, and let E' be an expression of E's ith
argument that is a call to some function g, such that f is distributive over g's arguments,
or a subset thereof. We shall refer to the actual arguments to g in these positions in E' as
x1, x2,... xn. Now, let D(F) be the function that is obtained by evaluating E with its ith
argument (the one containing E') replaced with the expression F. Distribution is replacing
E with E', and then replacing each xj (1 ≤ j ≤ n) with D(xj). For example, consider
+(x *(y if(cond a b)))
Since both + and * are distributive over the result branches of if, there are two possible
distribution transformations, giving the expressions
if(cond +(x *(y a)) +(x *(y b)))
+(x if(cond *(y a) *(y b)))
• Inverse distribution (factorization) - the opposite of distribution. This is nearly a
reduction; the exceptions are expressions such as f(g(x)), where f and g are mutually
distributive.
• Arity broadening - given a function f, modify it to take an additional argument of some
type. All calls to f must be correspondingly broadened to pass it an additional argument
of the appropriate type.
• List broadening6 - given a function f with some ith argument x of type T, modify f to
instead take an argument y of type list_T, which gets split into x : xs. All calls to f with
ith argument x' must be replaced by corresponding calls with ith argument list(x').
• Conditional insertion - an expression x is replaced by if(true,x,y), where y is some
expression of the same type as x.
As a technical note, action result expressions (which may cause side-effects) complicate neu-
tral transformations. Specifically, abstractions and compressive abstractions must take their
arguments lazily (i.e. not evaluate them before the function call itself is evaluated), in order
to be neutral. Furthermore, distribution and inverse distribution may only be applied when f
has no side-effects that will vary (e.g. be duplicated or halved) in the new expression, or affect
the nested computation (e.g. change the result of a condition within a conditional). Another
way to think about this issue is to consider the action result type as a lazy domain-specific
language embedded within a pure functional language (where evaluation order is unspecified).
Spector has performed an empirical study of the tradeoffs in lazy vs. eager function abstraction
for program evolution [Spe96].
The number of neutral transformations applicable to any given program grows quickly with
program size.7 Furthermore, synthesis of complex programs and abstractions does not seem to
be possible without them. Thus, a key hypothesis of any approach to AGI requiring significant
program synthesis, without assuming the currently infeasible computational capacities required
to brute-force the problem, is that the inductive bias to select promising neutral transforma-
tions can be learned and/or programmed. Referring back to the initial discussion of what con-
stitutes a tractable representation, we speculate that perhaps, whereas well-chosen reductions
are valuable for generically increasing program representation tractability, well-chosen neutral
transformations will be valuable for increasing program representation tractability relative to
distributions P to which the transformations have some (possibly subtle) relationship.
6 Analogous tuple-broadening transformations may be defined as well, but are omitted for brevity.
7 Exact calculations are given by Olsson [Ols95].
21.7.5 Non-Neutral Transformations
Non-neutral transformations are the general class defined by removal, replacement, and inser-
tion of subexpressions, acting on expressions in normal form, and preserving the normal form
property. Clearly these transformations are sufficient to convert any normal form expression
into any other. What is desired is a subclass of the non-neutral transformations that is combi-
natorially complete, where each individual transformation is nonetheless a semantically small
step.
The full set of transformations for Boolean expressions is given in [Loo06]. For numerical
expressions, the transcendental functions sin, log, and exp are used to construct transformations.
These obviate the need for division (a/b = e^(log(a)-log(b))), and subtraction (a - b = a + -1 * b).
For lists, transformations are based on insertion of new leaves (e.g. to append function calls),
and "deepening" of the normal form by insertion of subclauses (see [Loo06] for details). For
tuples, we take the union of the transformations of all the subtypes. For other mixed-type
expressions the union of the non-neutral transformations for all types must be considered as
well. For enum types the only transformation is replacing one symbol with another. For function
types, the transformations are based on function composition. For action result types, actions
are inserted/removed/altered, akin to the treatment of Boolean literals for the Boolean type.
We propose an additional class of non-neutral transformations based on the marvelous fold
function:
fold(f, v, l) = if(empty(l), v, f(first(l), fold(f, v, rest(l))))
With fold we can express a wide variety of iterative constructs, with guaranteed termination
and a bias towards low computational complexity. In fact, fold allows us to represent exactly
the primitive recursive functions [Pin94].
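For concreteness, here is fold transcribed into Python, together with a few iterative constructs expressed through it (illustrative; Combo's fold operates on Combo lists):

# Because the recursion consumes the list structurally, termination
# is guaranteed.

def fold(f, v, l):
    if not l:                     # empty(l) -> base value v
        return v
    return f(l[0], fold(f, v, l[1:]))

# Iterative constructs expressed via fold:
length  = lambda l: fold(lambda x, acc: 1 + acc, 0, l)
total   = lambda l: fold(lambda x, acc: x + acc, 0, l)
mapping = lambda g, l: fold(lambda x, acc: [g(x)] + acc, [], l)

print(length([3, 1, 4]))                    # 3
print(total([3, 1, 4]))                     # 8
print(mapping(lambda x: x * x, [3, 1, 4]))  # [9, 1, 16]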
Even considering only this reduced space of possible transformations, in many cases there
are still too many possible programs "nearby" some target to effectively consider all of them.
For example many probabilistic model-building algorithms, such as learning the structure of a
Bayesian network from data, can require time cubic in the number of variables (in this context
each independent non-neutral transformation can correspond to a variable). Especially as the
size of the programs we wish to learn grows, and as the number of typologically matching
functions increases, there will be simply too many variables to consider each one intensively, let
alone apply a quadratic-time algorithm.
To alleviate this scaling difficulty, we propose three techniques.
The first is to consider each potential variable (i.e. independent non-neutral transformation)
to heuristically determine its usefulness in expressing constructive semantic variation. For exam-
ple, a Boolean transformation that collapses the overall expression into a tautology is assumed
to be useless.8
The second is heuristic coupling rules that allow us to calculate, for a pair of transformations,
the expected utility of applying them in conjunction.
Finally, while fold is powerful, it may need to be augmented by other methods in order to
provide tractable representation of complex programs that would normally be written using nu-
merous variables with diverse scopes. One approach that we have explored involves application
of ISN1191's ideas about director strings as combinators. In Sinot's approach, special program
8 This is heuristic because such a transformation might be useful together with other transformations.
tree nodes are labeled with director strings, and special algebraic operators interrelate these
strings. One then achieves the representational efficiency of local variables with diverse scopes,
without needing to do any actual variable management. Reductions and other (non-)neutral
transformation rules related to broadening and reducing variable scope may then be defined
using the director string algebra.
21.8 Interfacing Between Procedural and Declarative Knowledge
Finally, another critical aspect of procedural knowledge is its interfacing with declarative knowl-
edge. We now discuss the referencing of declarative knowledge within procedures, and the ref-
erencing of the details of procedural knowledge within CogPrime's declarative knowledge store.
21.8.1 Programs Manipulating Atoms
Above we have used Combo syntax implicitly, referring to Appendix ?? for the formal defi-
nitions. Now we introduce one additional, critical element of Combo syntax: the capability to
explicitly reference declarative knowledge within procedures.
For this purpose Combo must contain the following types:
Atom, Node, Link, TruthValue, AtomType, AtomTable
Atom is the union of Node and Link.
So a type Node within a Combo program refers to a Node in CogPrime's AtomTable. The
mechanisms used to evaluate these entities during program evaluation are discussed in Chapter
25.
For example, suppose one wishes to write a Combo program that creates Atoms embodying
the predicate-argument relationship eats(cat, fish), represented
Evaluation eats(cat, fish)
aka
Evaluation
    eats
    List
        cat
        fish
To do this, one could say for instance,
new-link (EvaluationLink
new-node(PredicateNode "eats")
new-link (ListLink
new-node(ConceptNode "cat")
new-node(ConceptNode "fish"))
(new-stv .99 .99))
21.9 Declarative Representation of Procedures
Next, we consider the representation of program tree internals using declarative data structures.
This is important if we want OCP to inferentially understand what goes on inside programs.
In itself, it is more of a "bookkeeping" issue than a deep conceptual issue, however.
First, note that each of the entities that can live at an internal node of a program, can also
live in its own Atom. For example, a number in a program tree corresponds to a NumberNode;
an argument in a Combo program already corresponds to some Atom; and an operator in a
program can be wrapped up in a SchemaNode all its own, and considered as a one-leaf program
tree.
Thus, one can build a kind of virtual, distributed program tree by linking a number of
ProcedureNodes (i.e. PredicateNodes or SchemaNodes) together. All one needs in order to
achieve this is an analogue of the @ symbol (as defined in Section 20.3 of Chapter 20) for
relating ProcedureNodes. This is provided by the ExecutionLink type, where
(ExecutionLink f g)
essentially means the same as
f g
in curried notation, or
    @
   / \
  f   g
The same generalized evaluation rules used inside program trees may be thought of in terms
of ExecutionLinks; formally, they are crisp ExtensionalImplicationLinks among ExecutionLinks.
Note that we are here using ExecutionLink as a curried function; that is, we are looking at
(ExecutionLink f g) as a function that takes an argument x, where the truth value of
(ExecutionLink f g) x
represents the probability that executing f, on input g, will give output x.
One may then construct combinator expressions linking multiple ExecutionLinks together;
these are the analogues of program trees.
For example, using ExecutionLinks, one equivalent of y = x + x^2 is:
Hypothetical
    SequentialAND
        ExecutionLink
            pow
            List v1 2
            v2
        ExecutionLink
            +
            List v1 v2
            v3
Here the v1, v2, v3 are variables which may be internally represented via combinators. This
AND is sequential in case the evaluation order inside the program interpreter makes a difference.
As a practical matter, it seems there is no purpose to explicitly storing program trees in
conjunction-of-ExecutionLinks form. The information in the ExecutionLink conjunct is already
there in the program tree. However, the PLN reasoning system, when reasoning on program
trees, may carry out this kind of expansion internally as part of its analytical process.
Section II
The Cognitive Cycle
Chapter 22
Emotion, Motivation, Attention and Control
Co-authored with Zhenhua Cai
22.1 Introduction
This chapter begins the heart of the book: the part that explains how the CogPrime design
aims to implement roughly human-like general intelligence, at the human level and ultimately
beyond. First, here in Section II we explain how CogPrime can be used to implement a simplistic
animal-like agent without much learning: an agent that perceives, acts and remembers, and
chooses actions that it thinks will achieve its goals; but doesn't do any sophisticated learning or
reasoning or pattern recognition to help it better perceive, act, remember or figure out how to
achieve its goals. We're not claiming CogPrime is the best way to implement such an animal-like
agent, though we suggest it's not a bad way and depending on the complexity and nature of
the desired behaviors, it could be the best way. We have simply chosen to split off the parts of
CogPrime needed for animal-like behavior and present them first, prior to presenting the various
"knowledge creation" (learning, reasoning and pattern recognition) methods that constitute the
more innovative and interesting part of the design.
In Stan Franklin's terms, what we explain here in Section II is how a basic cognitive cycle
may be achieved within CogPrime. In that sense, the portion of CogPrime explained in this
Section is somewhat similar to the parts of Stan's LIDA architecture that have currently been
worked out in detail. However, while LIDA has not yet been extended in detail (in
theory or implementation) to handle advanced learning, cognition and language, those aspects
of CogPrime have been developed and in fact constitute the largest portion of this book.
Looking back to the integrative diagram from Chapter 5, the cognitive cycle is mainly about
integrating vaguely LIDA-like structures and mechanisms with heavily Psi-like structures and
mechanisms - but doing so in a way that naturally links in with perception and action mecha-
nisms "below," and more abstract and advanced learning mechanisms "above."
In terms of the general theory of general intelligence, the basic CogPrime cognitive cycle can
be seen to have a foundational importance in biasing the CogPrime system toward the problem
of controlling an agent in an environment requiring a variety of real-time and near-real-time
responses based on a variety of kinds of knowledge. Due to its basis in human and animal
cognition, the CogPrime cognitive cycle likely incorporates many useful biases in ways that
are not immediately obvious, but that would become apparent if comparing intelligent agents
controlled by such a cycle versus intelligent agents controlled via other means.
The cognitive cycle also provides a framework in which other cognitive processes, relating
to various aspects of the goals and environments relevant to human-level general intelligence,
may conveniently dynamically interoperate. The "Mind OS" aspect of the CogPrime archi-
tecture provides general mechanisms in which various cognitive processes may interoperate on
a common knowledge store; the cognitive cycle goes further and provides a specific dynamical
pattern in which multiple cognitive processes may intersect. Its effective operation places strong
demands on the cognitive synergy between the various cognitive processes involved, but also
provides a framework that encourages this cognitive synergy to develop and persist.
Finally, it should be stressed that the cognitive cycle is not all-powerful nor wholly pervasive
in CogPrime's dynamics. It's critical for the real-time interaction of a CogPrime-controlled agent
with a virtual or physical world; but there may be many processes within CogPrime that most
naturally operate outside such a cycle. For instance, humans will habitually do deep intellectual
thinking (even something so abstract as mathematical theorem proving) within a cognitive cycle
somewhat similar to the one they use for practical interaction with the external world. But,
there's no reason that CogPrime systems need to be constrained in this way. Deviating from a
cognitive cycle based dynamic may cause a CogPrime system to deviate further from human-
likeness in its intelligence, but may also help it to perform better than humans on some tasks,
e.g. tasks like scientific data analysis or mathematical theorem proving that benefit from styles
of information processing that humans aren't particularly good at.
22.2 A Quick Look at Action Selection
We will begin our exposition of CogPrime's cognitive cycle with a quick look at action selection.
As Stan Franklin likes to point out, the essence of an intelligent agent is that it does things; it
takes actions. The particular mechanisms of action selection in CogPrime are a bit involved and
will be given in Chapter 24; in this chapter we will give the basic idea of the action selection
mechanism and then explain how a variant of the Psi model (described in Chapter 4 of Part 1
above) is used to handle motivation (emotions, drives, goals, etc.) in CogPrime, including the
guidance of action selection.
The crux of CogPrime's action selection mechanism is as follows (a minimal illustrative sketch follows the list below):
• the action selector chooses procedures that seem likely to help achieve important goals in
the current context
- Example: If the goal is to create a block structure that will surprise Bob, and there is
plenty of time, one procedure worth choosing might be a memory search procedure for
remembering situations involving Bob and physical structures. Alternately, if there isn't
much time, one procedure worth choosing might be a procedure for building the base
of a large structure - as this will give something to use as part of whatever structure
is eventually created. Another procedure worth choosing might be one that greedily
assembles structures from blocks without any particular design in mind.
• to support the action selector, the system builds implications of the form Context & Procedure
→ Goal, where Context is a predicate evaluated based on the agent's situation
— Example: If Bob has asked the agent to do something, and it knows that Bob is very
insistent on being obeyed, then implications such as
• "Bob instructed to do X" and "do X" -, "please Bob" < .9, .9 >
will be utilized
— Example: If the agent wants to make a tower taller, then implications such as
• "T is a blocks structure" and "place block atop T"
→ "make T taller" < .9, .9 >
will be utilized
• the truth values of these implications are evaluated based on experience and inference
- Example: The above implication involving Bob could be evaluated based on experience,
by assessing it against remembered episodes involving Bob giving instructions
— Example: The same implication could be evaluated based on inference, using analogy
to experiences with instructions from other individuals similar to Bob; or using things
Bob has explicitly said, combined with knowledge that Bob's self-descriptions tend to
be reasonably accurate
• Importance values are propagated between goals using economic attention allocation (and,
inference is used to learn subgoals from existing goals)
- Example: If Bob has told the agent to do X, and the agent has then derived (from the
goal of pleasing Bob) the goal of doing X, then the "please Bob" goal will direct some
of its currency to the "do X" goal (which the latter goal can then pass to its subgoals,
or spend on executing procedures)
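A minimal sketch of this selection loop follows; all names, numbers and the currency-transfer rule are hypothetical simplifications of the mechanisms detailed in Chapter 24:

# Pick the procedure whose Context & Procedure -> Goal implication,
# weighted by goal importance, scores highest, and pass some of the
# goal's currency down to it.

implications = [
    # (context predicate, procedure, goal, strength, confidence)
    (lambda s: s["bob_instructed"], "do_X",        "please_Bob", 0.9, 0.9),
    (lambda s: s["has_blocks"],     "build_tower", "novelty",    0.6, 0.7),
]
goal_sti = {"please_Bob": 40.0, "novelty": 10.0}

def select_action(state):
    best, best_score = None, 0.0
    for ctx, proc, goal, strength, conf in implications:
        if ctx(state):
            score = strength * conf * goal_sti[goal]
            if score > best_score:
                best, best_score = (proc, goal), score
    if best:
        proc, goal = best
        transfer = 0.1 * goal_sti[goal]   # goal funds its chosen procedure
        goal_sti[goal] -= transfer
        return proc, transfer
    return None, 0.0

print(select_action({"bob_instructed": True, "has_blocks": True}))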
These various processes are carried out in a manner orchestrated by Dörner's Psi model as
refined by Joscha Bach (as reviewed in Chapter 4 above), which supplies (among other features)
• a specific theory regarding what "demands" should be used to spawn the top-level goals
• a set of (four) interrelated system parameters governing overall system state in a useful
manner reminiscent of human and animal psychology
• a systematic theory of how various emotions (wholly or partially) emerge from more fun-
damental underlying phenomena
22.3 Psi in CogPrime
The basic concepts of the Psi approach to motivation, as reviewed in Chapter 4 of Part 1 above,
are incorporated in CogPrime as follows (note that the following list includes many concepts
that will be elaborated in more detail in later chapters):
• Demands are GroundedPredicateNodes (GPNs), i.e. Nodes that have their truth value com-
puted at each time by some internal C++ code or some Combo procedure in the Procedur-
eRepository
- Examples: Alertness, perceived novelty, internal novelty, reward from teachers, social
stimulus
- Humans and other animals have familiar demands such as hunger, thirst and excretion;
to create an AGI closely emulating a human or (say) a dog one may wish to simulate
these in one's AGI system as well
• Urges are also GPNs, with their truth values defined in terms of the truth values of the Nodes
for corresponding Demands. However in CogPrime we have chosen the term "Ubergoal"
instead of Urge, as this is more evocative of the role that these entities play in the system's
dynamics (they are the top-level goals).
• Each system comes with a fixed set of Ubergoals (and only very advanced CogPrime systems
will be able to modify their Ubergoals)
— Example: Stay alert and alive now and in the future; experience and learn new things
now and in the future; get reward from the teachers now and in the future; enjoy rich
social interactions with other minds now and in the future
- A more advanced CogPrime system could have abstract (but experientially grounded)
ethical principles among its Ubergoals, e.g. an Ubergoal to promote joy, an Ubergoal
to promote growth and an Ubergoal to promote choice, in accordance with the ethics
described in [Goe10]
• The ShortTermImportance of an Ubergoal indicates the urgency of the goal, so if the De-
mand corresponding to an Ubergoal is within its target range, then the Ubergoal will have
zero STI. But all Ubergoals can be given maximal LTI to guarantee they don't get deleted
(see the sketch following this list).
- Example: If the system is in an environment continually providing an adequate level of
novelty (according to its Ubergoal), then the Ubergoal corresponding to external novelty
will have low STI but high LTI. The system won't expend resources seeking novelty. But
then, if the environment becomes more monotonous, the urgency of the external novelty
goal will increase, and its STI will increase correspondingly, and resources will begin
getting allocated toward improving the novelty of the stimuli received by the agent.
• Pleasure is a GPN, and its internal truth value computing program compares the satisfaction
of Ubergoals to their expected satisfaction
- Of course, there are various mathematical functions (e.g. p'th power averages1 for dif-
ferent p) that one can use to average the satisfaction of multiple Ubergoals; and choices
here, i.e. different specific ways of calculating Pleasure, could lead to systems with
different "personalities"
• Goals are Nodes or Links that are on the system's list of goals (the GoalPool). Ubergoals
are automatically Goals, but there will also be many other Goals
- Example: The Ubergoal of getting reward from teachers might spawn subgoals like
"getting reward from Bob" (if Bob is a teacher), or "making teachers smile" or "create
surprising new structures" (if the latter often garners teacher reward). The subgoal of
"create surprising new structures" might, in the context of a new person entering the
agent's environment with a bag of toys, lead to the creation of a subgoal of asking for
a new toy of the sort that could be used to help create new structures. Etc.
• Psi's memory is CogPrime's AtomTable, with associated structures like the Procedur-
eRepository (explained in Chapter 19), the SpaceServer and TimeServer (explained in
Chapter 26), etc.
— Examples: The knowledge of what blocks look like and the knowledge that tall struc-
tures often fall down, go in the AtomTable; specific procedures for picking up blocks of
different shapes go in the ProcedureRepository; the layout of a room or a pile of blocks
at a specific point in time go in the SpaceServer; the series of events involved in the
building-up of a tower are temporally indexed in the TimeServer.
1 The p'th power average is defined as ( Σ x^p )^(1/p)
- In Psi and MicroPsi, these same phenomena are stored in memory in a rather different
way, yet the basic Psi motivational dynamics are independent of these representational
choices
• Psi's "motive selection" process is carried out in CogPrime by economic attention allocation,
which allocates ShortTermImportance to Goal nodes
- Example: The flow of importance from "Get reward from teachers" to "get reward from
Bob" to "make an interesting structure with blocks" is an instance of what Psi calls
"motive selection". No action is being taken yet, but choices are being made regarding
what specific goals are going to be used to guide action selection.
• Psi's action selection plays the same role as CogPrime's action selection, with the clari-
fication that in CogPrime this is a matter of selecting which procedures (i.e. schema) to
run, rather than which individual actions to execute. However, this notion exists in Psi as
well, which accounts for "automatized behaviors" that are similar to CogPrime schemata;
the only (minor) difference here is that in CogPrime automatized behaviors are the default
case.
— Example: If the goal "make an interesting structure with blocks" has a high STI, then it
may be used to motivate choice of a procedure to execute, e.g. a procedure that finds an
interesting picture or object seen before and approximates it with blocks, or a procedure
that randomly constructs something and then filters it based on interestingness. Once a
blocks-structure-building procedure is chosen, this procedure may invoke the execution
of sub-procedures such as those involved with picking up and positioning particular
blocks.
• Psi's planning is carried out via various learning processes in CogPrime, including PLN plus
procedure learning methods like MOSES or hillclimbing
— Example: If the agent has decided to build a blocks structure emulating a pyramid
(which it saw in a picture), and it knows how to manipulate and position individual
blocks, then it must figure out a procedure for carrying out individual-block actions that
will result in production of the pyramid. In this case, a very inexperienced agent might
use MOSES or hillclimbing and "guidedly-randomly" fiddle with different construction
procedures until it hit on something workable. A slightly more experienced agent would
use reasoning based on prior structures it had built, to figure out a rational plan (like:
"start with the base, then iteratively pile on layers, each one slightly smaller than the
previous.")
• The modulators are system parameters which may be represented by PredicateNodes, and
which must be incorporated appropriately in the dynamics of various MindAgents, e.g.
- activation affects action selection. For instance this may be effected by a process that,
each cycle, causes a certain amount of STICurrency to pass to schema satisfying certain
properties (those involving physical action, or terminating rapidly). The amount of
currency passed in this way would be proportional to the activation
- resolution level affects perception schema and MindAgents, causing them to expend less
effort in processing perceptual data
- certainty affects inference and pattern mining and concept creation processes, causing
them to place less emphasis on certainty in guiding their activities, i.e. to be more
accepting of uncertain conclusions. To give a single illustrative example: when backward
chaining inference is being used to find values for variables, a "fitness target" of the
form strength × confidence is sometimes used; this may be replaced with strength^p ×
confidence^(2-p), where the activation parameter affects the exponent p, so when p tends to 0
confidence is more important, when p tends to 2 strength is more important and when
p tends to 1 strength and confidence are equally important.
- selection threshold may be used to effect a process that, each cycle, causes a certain
amount of STICurrency (proportional to the selection threshold) to pass to the Goal
Atoms that were wealthiest at the previous cycle.
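The following sketch illustrates the Demand/Ubergoal urgency rule from the list above; the target-range representation and gain parameter are hypothetical simplifications of the economic attention allocation dynamics:

# An Ubergoal's STI grows with how far its Demand has strayed from
# the target range, while its LTI stays maximal so it is never deleted.

MAX_LTI = 1.0

def ubergoal_sti(demand_level, target_lo, target_hi, gain=1.0):
    if target_lo <= demand_level <= target_hi:
        return 0.0                         # demand satisfied: no urgency
    gap = (target_lo - demand_level if demand_level < target_lo
           else demand_level - target_hi)
    return gain * gap

# Environment provides plenty of novelty: zero urgency, but max LTI
print(ubergoal_sti(0.7, 0.5, 0.9), MAX_LTI)   # 0.0 1.0
# Environment becomes monotonous: urgency rises
print(ubergoal_sti(0.2, 0.5, 0.9), MAX_LTI)   # 0.3 1.0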
Based on this run-down, Psi and CogPrime may seem very similar, but that's because we have
focused here only on the motivation and emotion aspect. Psi uses a very different knowledge
representation than CogPrime; and in the Psi architecture diagram, nearly all of CogPrime is
pushed into the role of "background processes that operate in the memory box." According to the
theoretical framework underlying CogPrime, the multiple synergetic processes operating in the
memory box are actually the crux of general intelligence. But getting the motivation/emotion
framework right is also very important, and Psi seems to do an admirable job of that.
22.4 Implementing Emotion Rules atop Psi's Emotional Dynamics
Human motivations are largely determined by human emotions, which are the result of human-
ity's evolutionary heritage and embodiment, which are quite different than the heritage and
embodiment of current AI systems. So, if we want to create AGI systems that lack humanlike
bodies, and didn't evolve to adapt to the same environments as humans did, yet still have
vaguely human-like emotional and motivational structures, the latter will need to be explicitly
engineered or taught in some way.
For instance, if one wants to make a CogPrime agent display anger, something beyond Psi's
model of emotion needs to be coded into the agent to enable this. After all, the rule that when
angry the agent has some propensity to harm other beings, is not implicit in Psi and needs to be
programmed in. However, making use of Psi's emotion model, anger could be characterized as
an emotion consisting of high arousal, low resolution, strong motive dominance, few background
checks, strong goal-orientedness (as the Psi model suggests) and a propensity to cause harm to
agents or objects. This is much simpler than specifying a large set of detailed rules characterizing
angry behavior.
The "anger" example brings up the point that desirability of giving AGI systems closely
humanlike emotional and motivational systems is questionable. After all we humans cause
ourselves a lot of problems with these aspects of our mind/brains, and we sometimes put
our more ethical and intellectual sides at war with our emotional and motivational systems.
Looking into the future, an AGI with greater power than humans yet a humanlike motivational
and emotional system, could be a very dangerous thing.
On the other hand, if an AGI's motivational and emotional system is too different from
human nature, we might have trouble understanding it, and it understanding us. This problem
shouldn't be overblown - it seems possible that an AGI with a more orderly and rational
motivational system than the human one might be able to understand us intellectually very
well, and that we might be able to understand it well using our analytical tools. However,
if we want to have mutual empathy with an AGI system, then its motivational and emotional
framework had better have at least some reasonable overlap with our own. The value of empathy
for ethical behavior was stressed extensively in Chapter 12 of Part 1.
This is an area where experimentation is going to be key. Our initial plan is to supply
CogPrime with rough emulations of some but not all human emotions. We see no need to take
explicit pains to simulate emotions like anger, jealousy and hatred. On the other hand, joy,
curiosity, sadness, wonder, fear and a variety of other human emotions seem both natural in
the context of a robotically or virtually embodied CogPrime system, and valuable in terms of
allowing mutual human/CogPrime empathy.
22.4.1 Grounding the Logical Structure of Emotions in the Psi Model
To make this point in a systematic way, we point out that Ortony et al's [OCC90] "cognitive
theory of emotions" can be grounded in CogPrime's version of Psi in a natural way. This theory
captures a wide variety of human and animal emotions in a systematic logical framework, so that
grounding their framework in CogPrime Psi goes a long way toward explaining how CogPrime
Psi accounts for a broad spectrum of human emotions.
The essential idea of the cognitive theory of emotions can be seen in Figure 22.1. What we
see there is that common emotions can be defined in terms of a series of choices:
• Is it positive or negative?
• Is it a response to an agent, an event or an object?
• Is it focused on consequences for oneself, or for another?
- If on another, is it good or bad for the other?
- If on oneself, is it related to some event whose outcome is uncertain?
• if it's related to an uncertain outcome, did the expectation regarding the outcome
get fulfilled or not?
Figure 22.1 shows how each set of answers to these questions leads to a different emotion. For
instance: what is a negative emotion, responding to events, focusing on another, and undesirable
to the other? Pity.
In the list of questions, we see that two of them - positive vs. negative, and expectation ful-
fillment vs. otherwise - are foundational in the Psi model. The other questions are evaluations
that an intelligent agent would naturally make, but aren't bound up with Psi's emotion/moti-
vation infrastructure in such a deep way. Thus, the cognitive theory of emotion emerges as a
combination of some basic Psi factors with some more abstract cognitive properties (good vs.
bad for another; agents vs. events vs. objects).
22.5 Goals and Contexts
Now we dig deeper into the details of motivation in CogPrime. Just as we have both explicit
(local) and implicit (global) memory in CogPrime, we also have both explicit and implicit
goals. An explicit goal is formulated as a Goal Atom, and then MindAgents specifically orient
the system's activity toward achievement of that goal. An implicit goal is something that the
Fig. 22.1: Ontology of Emotions from [OCC90]
system works toward, but in a more loosely organized way, and without necessarily explicitly
representing the knowledge that it is working toward that goal.
Here we will focus mainly on explicit motivation, beginning with a description of Goal Atoms,
and the Contexts in which Goals are worked toward via executing Procedures. Figure 22.2 gives
a rough depiction of the relationship between goals, procedures and context, in a simple example
relevant to an OpenCogPrime-controlled virtual agent in a game world.
22.5.1 Goal Atoms
A Goal Atom represents a target system state and is true to the extent that the system satisfies
the conditions it represents. A Context Atom represents an observed state of the world/mind,
and is true to the extent that the state it defines is observed. Taken together, these two Atom
types provide the infrastructure CogPrime needs to orient its actions in specific contexts toward
specific goals. Not all of CogPrime's activity is guided by these Atoms; much of it is non-goal-
directed and spontaneous, or ambient as we sometimes call it. But it is important that some
[Figure content: goal nodes such as "Give Bob a hug" and "Ask Bob 'What are you doing?'" linked to the context "Bob is nearby building a tower".]
Fig. 22.2: Context, Procedures and Goals. Examples of the basic "goal/context/procedure"
triad in a simple game-agent situation.
of the system's activity - and in some cases, a substantial portion - is controlled explicitly via
goals.
Specifically, a Goal Atom is simply an Atom (usually a PredicateNode, sometimes a Link, and
potentially another type of Atom) that has been selected by the GoalRefinement MindAgent as
one that represents a state of the atom space which the system finds important to achieve. The
extent to which an Atom is considered a Goal Atom at a particular point in time is determined
by how much of a certain kind of financial instrument called an RFS (Request For Service) it
possesses (as will be explained in Chapter 24).
A CogPrime instance must begin with some initial Ubergoals (aka top level supergoals), but
may then refine these goals in various ways using inference. Immature, "childlike" CogPrime
systems cannot modify their Ubergoals nor add nor delete Ubergoals. Advanced CogPrime
systems may be allowed to modify, add or delete Ubergoals, but this is a critical and subtle
aspect of system dynamics that must be treated with great care.
22.6 Context Atoms
Next, a Context is simply an Atom that is used as the source of a ContextLink, for instance
Context
    quantum_computing
    Inheritance Ben amateur
or
Context
    game_of_fetch
    PredictiveAttraction
        Evaluation give(ball, teacher)
        Satisfaction
The former simply says that Ben is an amateur in the context of quantum computing. The latter
says that in the context of the game of fetch, giving the ball to the teacher implies satisfaction.
A more complex instance pertinent to our running example would be
Context
    Evaluation
        Recently
        List
            Minute
            Evaluation
                Ask
                List
                    Bob
                    ThereExists $X
                        And
                            Evaluation
                                Build
                                List self $X
                            Evaluation
                                surprise
                                List $X Bob
    AverageQuantifier $Y
        PredictiveAttraction
            And
                Evaluation
                    Build
                    List self $Y
                Evaluation
                    surprise
                    List $Y Jim
            Satisfaction
which says that, if the context is that Bob has recently asked for something surprising to be
built, then one strategy for getting satisfaction is to build something that seems likely to surprise
Jim.
An implementation-level note: in the current OpenCogPrime implementation of CogPrime,
ContextLinks are implicit rather than explicit entities. An Atom can contain a ComplexTruth-
Value which in turn contains a number of VersionHandles. Each VersionHandle associates a
Context or a Hypothetical with a TruthValue. This accomplishes the same thing as a formal
ContextLink, but without the creation of a ContextLink object. However, we continue to use
ContextLinks in this book and other documents about CogPrime; and it's quite possible that
future CogPrime implementations might handle them differently.
22.7 Ubergoal Dynamics
In the early phases of a CogPrime system's cognitive development, the goal system dynamics
will be quite simple. The Ubergoals are supplied by human programmers, and the system's
adaptive cognition is used to derive subgoals. Attentional currency allocated to the Ubergoals
is then passed along to the subgoals, as judged appropriate.
As the system becomes more advanced, however, more interesting phenomena may arise
regarding Ubergoals: implicit and explicit Ubergoal creation.
22.7.1 Implicit Ubergoal Pool Modification
First of all, implicit Ubergoal creation or destruction may occur. Implicit Ubergoal destruction
may occur when there are multiple Ubergoals in the system, and some prove easier to achieve
than others. The system may then decide not to bother achieving the more difficult Ubergoals.
Appropriate parameter settings may mitigate this phenomenon, of course.
Implicit Ubergoal creation may occur if some Goal Node G arises that inherits as a subgoal
from multiple Ubergoals. This Goal G may then come to act implicitly as an Ubergoal, in that
it may get more attentional currency than any of the Ubergoals.
Also, implicit Ubergoal creation may occur via forgetting. Suppose that G becomes a goal via
inferred inheritance from one or more Ubergoals. Then, suppose the system forgets why this inheritance
exists, and that in fact the reason becomes obsolete, but the system doesn't realize that and
keeps the inheritance there. Then, G is an implicit Ubergoal in a strong sense: it gobbles up
a lot of attentional currency, potentially more than any of the actual Ubergoals, but actually
doesn't help achieve the Ubergoals, even though the system thinks it does. This kind of dynamic
is obviously very bad and should be avoided - and can be avoided with appropriate tuning of
system parameters (so that the system pays a lot of attention to making sure that its subgoaling-
related inferences are correct and are updated in a timely way).
22.7.2 Explicit Ubergoal Pool Modification
An advanced CogPrime system may be given the ability to explicitly modify its Ubergoal pool.
This is a very interesting but very subtle type of dynamic, which is not currently well understood
and which potentially could lead to dramatically unpredictable behaviors.
However, modification, creation and deletion of goals is a key aspect of human psychology,
and the granting of this capability to mature CogPrime systems must be seriously considered.
In the case that Ubergoal pool modification is allowed, one useful heuristic may be to make
implicit Ubergoals into explicit Ubergoals. For instance: if an Atom is found to consistently
receive a lot of RFSs, and has a long time-scale associated with it, then the system should
consider making it an Ubergoal. But this heuristic is certainly not sufficient, and any advanced
CogPrime system that is going to modify its own Ubergoals should definitely be tuned to put
a lot of thought into the process!
The science of Ubergoal pool dynamics basically does not exist at the moment, and one
would like to have some nice mathematical models of the process prior to experimenting with it
in any intelligent capable CogPrime system. Although Schmidhuber's Gödel machine [Sch06]
has the theoretical capability to modify its ubergoal (note that CogPrime is, in some way, a
Gödel machine), there is currently no mathematics allowing us to assess the time and space
complexity of such a process in a realistic context, given a certain safety confidence target.
22.8 Goal Formation
Goal formation in CogPrime is done via PLN inference. In general, what PLN does for goal
formation is to look for predicates that can be proved to probabilistically imply the existing
goals. These new predicates will then tend to receive RFS currency, according to the logic
of RFS's to be outlined in Chapter 24, which (according to goal-driven attention allocation
dynamics) will make the system more likely to enact procedures that lead to their satisfaction.
As an example of the goal formation process, consider the case where ExternalNovelty is an
Ubergoal. The agent may then learn that whenever Bob gives it a picture to look at, its quest
for external novelty is satisfied to a significant degree. That is, it learns
Attraction
    Evaluation give(Bob, me, picture)
    ExternalNovelty
where Attraction A B measures how much A, versus not-A, implies B (as explained in Chapter 34).
This information allows the agent (the Goal Formation MindAgent) to nominate the atom:
EvaluationLink give (Bob, me, picture)
as a goal (a subgoal of the original Ubergoal). This is an example of goal refinement, which is
one among many ways that PLN can create new goals from existing ones.
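As a rough illustration, the sketch below nominates subgoals by thresholding attraction; the formula Attraction(A, B) = P(B|A) - P(B|not A) is used here as a hypothetical stand-in for the Chapter 34 definition, and all names and numbers are illustrative:

def attraction(p_b_given_a, p_b_given_not_a):
    return p_b_given_a - p_b_given_not_a

candidates = {
    # predicate: (P(goal | predicate), P(goal | not predicate))
    "give(Bob, me, picture)": (0.8, 0.2),
    "near(self, wall)":       (0.3, 0.3),
}

def nominate_subgoals(candidates, threshold=0.2):
    # predicates strongly attracted to the Ubergoal become subgoals
    return [pred for pred, (pa, pna) in candidates.items()
            if attraction(pa, pna) > threshold]

print(nominate_subgoals(candidates))   # ['give(Bob, me, picture)']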
22.9 Goal Fulfillment and Predicate Schematization
When there is a Goal Atom G important in the system (with a lot of RFS), the GoalFulfillment
MindAgent seeks SchemaNodes S that it has reason to believe, if enacted, will cause G to
become true (satisfied). It then adds these to the ActiveSchemaPool, an object to be discussed
below. The dynamics by which the GoalFulfillment process works will be discussed in Chapter
24 below.
For example, if a Context Node Chas a high truth value at that time (because it is currently
satisfied), and is involved in a relation:
Attraction
C
PredictiveAttraction S G
(for some SchemaNode S and Goal Node G) then this SchemaNode S is likely to be selected by the GoalFulfillment process for execution. This is the fully formalized version of the Context & Schema → Goal notion discussed frequently above. The process may also allow the importance of various schemata S to bias its choices of which schemata to execute.
For instance, following up previous examples, we might have
Attraction
Evaluation
near
List
self
Bob
PredictiveAttraction
Evaluation
ask
List
Bob
"Show me a picture"
ExternalNovelty
Of course this is a very simplistic relationship but it's similar to a behavior a young child might
display. A more advanced agent would utilize a more abstract relationship that distinguishes
various situations in which Bob is nearby, and also involves expressing a concept rather than a
particular sentence.
The formation of these schema-context-goal triads may occur according to generic inference
mechanisms. However, a specially-focused PredicateSchematization MindAgent is very useful
here as a mechanism of inference control, increasing the number of such relations that will exist
in the system.
22.10 Context Formation
New contexts are formed by a combination of processes:
• The MapEncapsulation MindAgent, which creates ContextNodes embodying repeated patterns in the perceived world. This process encompasses:
- Maps creating Context Nodes involving Atoms that have high STI at the same time
• Example: A large number of Atoms related to towers could be joined into a single
map, which would then be a ConceptNode pointing to "tower-related ideas, proce-
dures and experiences"
- Maps creating Context Nodes that are involved in a temporal activation pattern that
recurs at multiple points in the system's experience.
• Example: There may be a common set of processes involving creating a building
out of blocks: first build the base, then the walls, then the roof. This could be
encapsulated as a temporal map embodying the overall nature of the process. In
this case, the map contains information of the nature: first do things related to this,
then do things related to this, then do things related to this...
• A set of concept creation MindAgents (see Chapter 38), which fuse and split ContextNodes to create new ones:
— The concept of a building and the concept of a person can be merged to create the
concept of a BuildingMan
- The concept of a truck built with Legos can be subdivided into trucks you can actually
carry Lego blocks with, versus trucks that are "just for show" and can't really be loaded
with objects and then carry them around
22.11 Execution Management
The GoalFulfillment MindAgent chooses schemata that are found likely to achieve current goals,
but it doesn't actually execute these schemata. What it does is to take these schemata and place
them in a container called the ActiveSchemaPool.
The ActiveSchemaPool contains a set of schemata that have been determined to be reason-
ably likely, if enacted, to significantly help with achieving the current goal-set. I.e., everything
in the active schema pool should be a schema S for which it has been concluded that
Attraction
C
PredictiveAttraction
S G
- where C is a currently applicable context and G is one of the goals in the current goal pool - has a high truth value compared to what could be obtained from other known schemata, or from other schemata that could reasonably be expected to be found via reasoning.
The decision of which schemata in the ActiveSchemaPool to enact is made by an object
called the ExecutionManager, which is invoked each time the SchemaActivation MindAgent is
executed. The ExecutionManager is used to select which schemata to execute, based on doing
reasoning and consulting memory regarding which active schemata can usefully be executed
simultaneously without causing destructive interference (and hopefully causing constructive
interference). This process will also sometimes (indirectly) cause new schemata to be created
and/or other schemata from the AtomTable to be made active. This process is described more
fully in Chapter 24 on action selection.
For instance, if the agent is involved in building a blocks structure intended to surprise or
please Bob, then it might simultaneously carry out some blocks-manipulation schema, and also
a schema involving looking at Bob to garner his approval. If it can do the blocks manipulation
without constantly looking at the blocks, this should be unproblematic for the agent.
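A toy sketch of the kind of selection the ExecutionManager performs, assuming a hypothetical utility estimate and a pairwise interference predicate (the real process consults memory and does reasoning rather than using a fixed table):

def select_schemata(pool, utility, interferes):
    # Greedily pick schemata from the ActiveSchemaPool, highest estimated utility
    # first, skipping any schema that destructively interferes with one already chosen
    selected = []
    for s in sorted(pool, key=utility, reverse=True):
        if all(not interferes(s, t) for t in selected):
            selected.append(s)
    return selected

pool = ["stack_blocks", "look_at_bob", "look_at_blocks"]
utility = {"stack_blocks": 0.9, "look_at_bob": 0.6, "look_at_blocks": 0.5}.get
interferes = lambda a, b: {a, b} == {"look_at_bob", "look_at_blocks"}  # gaze conflict
print(select_schemata(pool, utility, interferes))  # ['stack_blocks', 'look_at_bob']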
22.12 Goals and Time
The CogPrime system maintains an explicit list of "Ubergoals", which as will be explained
in Chapter 24, receive attentional currency which they may then allocate to their subgoals
according to a particular mechanism.
However, there is one subtle factor involved in the definition of the Ubergoals: time. The
truth value of a Ubergoal is typically defined as the average level of satisfaction of some De-
mand over some period of time - but the time scale of this averaging can be very important.
In many cases, it may be worthwhile to have separate Ubergoals corresponding to the same
Demand but doing their truth-value time-averaging over different time scales. For instance,
corresponding to Demands such as Novelty or Health, we may posit both long-term and short-
term versions, leading to Ubergoals such as CurrentNovelty, LongTermNovelty, CurrentHealth,
LongTermHealth, etc. Of course, one could also wrap multiple Ubergoals corresponding to a single Demand into a single Ubergoal combining estimates over multiple time scales; this is not a critical issue, and the only point of splitting Demands into multiple Ubergoals is that it can make things slightly simpler for other cognitive processes.
For instance, if the agent has a goal of pleasing Bob, and it knows Bob likes to be presented
with surprising structures and ideas, then the agent has some tricky choices to make. Among
other choices it must balance between focusing on
• creating things and then showing them to Bob
• studying basic knowledge and improving its skills.
Perhaps studying basic knowledge and skills will give it a foundation to surprise Bob much
more dramatically in the mid-term future ... but in the short run will not allow it to surprise
Bob much at all, because Bob already knows all the basic material. This is essentially a variant
of the general "exploration versus exploitation" dichotomy, which lacks any easy solution. Young
children are typically poor at carrying out this kind of balancing act, and tend to focus overly
much on near-term satisfaction. There are also significant cultural differences in the heuristics
with which adult humans face these issues; e.g. in some contexts Oriental cultures tend to focus
more on mid to long term satisfaction whereas Western cultures are more short term oriented.
Chapter 23
Attention Allocation
Co-authored with Joel Pitt, Matt Iklé and Rui Liu
23.1 Introduction
The critical factor shaping real-world general intelligence is resource constraint. Without this issue, we could just use simplistic program-space-search algorithms like AIXItl instead of complicated systems like the human brain or CogPrime. Resource constraint is managed implicitly within various components of CogPrime, for instance in the population size used in evolutionary learning algorithms, and the depth of forward or backward chaining inference trees in PLN. But there is also a component of CogPrime that manages resources in a global and cognitive-process-independent manner: the attention allocation component.
The general principles the attention allocation process should follow are easy enough to see:
History should be used as a guide, and an intelligence should make probabilistic judgments
based on its experience, guessing which resource-allocation decisions are likely to maximize
its goal-achievement. The problem is that this is a difficult learning and inference problem,
and to carry it out with excellent accuracy would require a limited-resources intelligent system
to spend nearly all its resources deciding what to pay attention to and nearly none of them
actually paying attention to anything else. Clearly this would be a very poor allocation of an AI
system's attention! So simple heuristics are called for, to be supplemented by more advanced
and expensive procedures on those occasions where time is available and correct decisions are
particularly crucial.
Attention allocation plays, to a large extent, a "meta" role in enabling mind-world corre-
spondence. Without effective attention allocation, the other cognitive processes can't do their
jobs of helping an intelligent agent to achieve its goals in an environment, because they won't
be able to pay attention to the most important parts of the environment, and won't get com-
putational resources at the times when they need it. Of course this need could be addressed in
multiple different ways. For example, in a system with multiple complex cognitive processes,
one could have attention allocation handled separately within each cognitive process, and then
a simple "top layer" of attention allocation managing the resources allocated to each cognitive
process. On the other hand, one could also do attention allocation via a single dynamic, perva-
sive both within and between individual cognitive processes. The CogPrime design gravitates
more toward the latter approach, though also with some specific mechanisms within various
MindAgents; and efforts have been made to have these specific mechanisms modulated by the
generic attention allocation structures and dynamics wherever possible.
In this chapter we will dig into the specifics of how these attention allocation issues are
addressed in the CogPrime design. In short, they are addressed via a set of mechanisms and
equations for dynamically adjusting importance values attached to Atoms and MindAgents. Dif-
ferent importance values exist pertinent to different time scales, most critically the short-term
(STI) and long-term (LTI) importances. The use of two separate time-scales here reflects fun-
damental aspects of human-like general intelligence and real-world computational constraints.
The dynamics of STI is oriented partly toward the need for real-time responsiveness, and partly toward the more thoroughgoing need for cognitive processing at speeds vaguely resembling the speed of
"real time" social interaction. The dynamics of LTI is based on the fact that some data tends to
be useful over long periods of time, years or decades in the case of human life, but the practical
capability to store large amounts of data in a rapidly accessible way is limited. One could
imagine environments in which very-long-term multiple-type memory was less critical than it is
in typical human-friendly environments; and one could envision AGI systems carrying out tasks
in which real-time responsiveness was unnecessary (though even then some attention focusing
would certainly be necessary). For AGI systems like these, an attention allocation system based
on STI and LTI with CogPrime-like equations would likely be inappropriate. But for an AGI
system intended to control a vaguely human-like agent in an environment vaguely resembling
everyday human environments, the focus on STI and LTI values, and the dynamics proposed
for these values in CogPrime, appear to make sense.
Two basic innovations are involved in the mechanisms attached to these STI and LTI impor-
tance values:
• treating attention allocation as a data mining problem: the system records information
about what it's done in the past and what goals it's achieved in the past, and then recog-
nizes patterns in this history and uses them to guide its future actions via probabilistically
adjusting the (often context-specific) importance values associated with internal terms, ac-
tors and relationships, and adjusting the "effort estimates" associated with Tasks
• using an artificial-economics approach to update the importance values (attached to Atoms,
MindAgents, and other actors in the CogPrime system) that regulate system attention.
(And, more speculatively, using an information geometry based approach to execute the
optimization involved in the artificial economics approach efficiently and accurately.)
The integration of these two aspects is crucial. The former aspect provides fundamental data
about what's of value to the system, and the latter aspect allows this fundamental data to
be leveraged to make sophisticated and integrative judgments rapidly. The need for the latter,
rapid-updating aspect exists partly because of the need for real-time responsiveness, imposed by
the need to control a body in a rapidly dynamic world, and the prominence in the architecture
of an animal-like cognitive cycle. The need for the former, data-mining aspect (or something
functionally equivalent) exists because, in the context of the tasks involved in human-level
general intelligence, the assignment of credit problem is hard - the relations between various
entities in the mind and the mind's goals are complex, and identifying and deploying these
relationships is a difficult learning problem requiring application of sophisticated intelligence.
Both of these aspects of attention allocation dynamics may be used in computationally
lightweight or computationally sophisticated manners:
• For routine use in real-time activity
- "data mining" consists of forming HebbianLinks (involved in the associative memory
and inference control, see Section 23.5), where the weight of the link from Atom A to
Atom B is based on the probability of shared utility of A and B
- economic attention allocation consists of spreading ShortTermImportance and LongTermImportance "artificial currency" values (both grounded in the universal underlying "juju" currency value defined further below) between Atoms according to specific equations that somewhat resemble neural net activation equations but respect the conservation of currency
• For use in cases where large amounts of computational resources are at stake based on lo-
calized decisions, hence allocation of substantial resources to specific instances of attention-
allocation is warranted
- "data mining" may be more sophisticated, including use of PLN, MOSES and pattern
mining to recognize patterns regarding what probably deserves more attention in what
contexts
- economic attention allocation may involve more sophisticated economic calculations
involving the expected future values of various "expenditures" of resources
The particular sort of "data mining" going on here is definitely not exactly what the human
brain does, but we believe this is a case where slavish adherence to neuroscience would be badly
suboptimal (even if the relevant neuroscience were well known, which is not the case). Doing
attention allocation entirely in a distributed, formal-neural-net-like way is, we believe, extremely
and unnecessarily inefficient, and given realistic resource constraints it leads to the rather poor
attention allocation that we experience every day in our ordinary waking state of consciousness.
Several aspects of attention allocation can be fruitfully done in a distributed, neural-net-like
way, but not having a logically centralized repository of system-history information (regardless
of whether it's physically distributed or not) seems intrinsically problematic in terms of effective
attention allocation. And we argue that, even for those aspects of attention allocation that are
best addressed in terms of distributed, vaguely neural-net-like dynamics, an artificial-economics
approach has significant advantages over a more strictly neural-net-like approach, due to the
greater ease of integration with other cognitive mechanisms such as forgetting and data mining.
23.2 Semantics of Short and Long Term Importance
We now specify the two types of importance value (short and long term) that play a key role
in CogPrime dynamics. Conceptually, ShortTermImportance (STI) is defined as
STI(A) = P(A will be useful in the near future)
whereas LongTermImportance (LTI) is defined as
LTI(A) = P(A will be useful eventually, in the foreseeable future)
Given a time-scale T, in general we can define an importance value relative to T as
$I_T(A)$ = P(A will be useful during the next T seconds)
In the ECAN module in CogPrime, we deal only with STI and LTI rather than any other
importance values, and the dynamics of STI and LTI are dealt with by treating them as two
separate "artificial currency" values, which however are interconvertible via being mutually
grounded in a common currency called "juju."
For instance, if the agent is intensively concerned with trying to build interesting blocks
structures, then knowledge about interpreting biology research paper abstracts is likely to be
of very little current importance. So its biological knowledge will get low STI, but - assuming
the agent expects to use biology again - it should maintain reasonably high LTI so it can
remain in memory for future use. And if in its brainstorming about what blocks structures
to build, the system decides to use some biological diagrams as inspiration, STI can always
spread to some of the biology-related Atoms, increasing their relevance and getting them more
attention. While the attention allocation system contains mechanisms to convert STI to LTI,
it also has parameter settings biasing it to spend its juju on both kinds of importance - i.e.
it contains an innate bias to both focus its attention judiciously, and manage its long-term
memory conscientiously.
Because in CogPrime most computations involving STI and LTI are required to be very rapid
(as they're done for many Atoms in the memory very frequently), in most cases when dealing
with these quantities, it will be appropriate to sacrifice accuracy for efficiency. On the other
hand, it's useful to occasionally be able to carry out expensive, highly accurate computations
involving importance.
An example where doing expensive computations about attention allocation might pay off,
would be the decision whether to use biology-related or engineering-related metaphors in cre-
ating blocks structures to please a certain person. In this case it could be worth doing a few
steps of inference to figure out whether there's a greater intensional similarity between that
person's interests and biology or engineering; and then using the results to adjust the STI levels
of whichever of the two comes out most similar. This would not be a particularly expensive
inference to carry out, but it's still much more effort than what can be expended on Atoms in
the memory most of the time. Most attention allocation in CogPrime involves simple neural-net
type spreading dynamics rather than explicit reasoning.
Figure 23.1 illustrates the key role of LTI in the forgetting process. Figure 23.2 illustrates
the key role of STI in maintaining a "moving bubble of attention", which we call the system's
AttentionalFocus.
23.2.1 The Precise Semantics of STI and LTI
Now we precisiate the above definitions of STI and LTI.
First, we introduce the notion of reward. Reward is something that Goals give to Atoms. In
principle a Goal might give an Atom reward in various different forms, though in the design
given here, reward will be given in units of a currency called juju. The process by which Goals
assign reward to Atoms is part of the "assignment of credit" process (and we will later discuss
the various time-scales on which assignment of credit may occur and their relationship to the
time-scale parameter within LTI).
Next, we define

$J(A, t_1, t_2, r)$ = expected amount of reward A will receive between $t_1$ and $t_2$ time-steps in the future, if its STI has percentile rank $r$ among all Atoms in the AtomTable
Fig. 23.1: LongTermImportance and Forgetting. (The original figure diagrams the decision flow: an Atom either stays in the AtomSpace or is removed from it; a removed Atom is either saved to a backup store or permanently deleted.)
The percentile rank r of an Atom is the rank of that Atom in a list of Atoms ordered by decreasing STI, divided by the total number of Atoms. The reason for using a percentile rank instead of the STI itself is that at any given time only a limited number of Atoms can be given attention, so all Atoms below a certain percentile rank, depending on the amount of available resources, will simply be ignored.
This is a fine-grained measure of how worthwhile it is expected to be to increase A's STI, in
terms of getting A rewarded by Goals.
For practical purposes it is useful to collapse $J(A, t_1, t_2, r)$ to a single number:

$$J(A, t_1, t_2) = \frac{\sum_r J(A, t_1, t_2, r)\, w_r}{\sum_r w_r}$$

where $w_r$ weights the different percentile ranks (and should be chosen to be monotone increasing in $r$). This is a single-number measure of the responsiveness of an Atom's utility to its STI level. So for instance if A has a lot of STI and it turns out to be rewarded then $J(A, t_1, t_2)$ will be high. On the other hand if A has little STI then whether it gets rewarded or not will not influence $J(A, t_1, t_2)$ much.
To simplify notation, it's also useful to define a single-time-point version

$$J(A, t) = J(A, t, t)$$
23.2.1.1 Formalizing STI
Using these definitions, one simple way to make the STI definition precise is:

$$STI_{thresh}(A, t) = P\big(J(A,\, t,\, t + t_{short}) > s_{threshold}\big)$$

where $s_{threshold}$ demarcates the "attentional focus boundary". Which is a way of saying that we don't want to give STI to Atoms that would not get rewarded if they were given attention.
Fig. 23.2: Formation of the AttentionalFocus. The dynamics of STI is configured to encourage the emergence of richly cross-connected networks of Atoms with high STI (above a threshold called the AttentionalFocusBoundary), passing STI among each other as long as this is useful and forming new HebbianLinks among each other. The collection of these Atoms is called the AttentionalFocus.
Or one could make the STI definition precise in a fuzzier way, and define

$$STI_{fuzzy}(A, t) = \sum_{s=0}^{\infty} J(A, t+s)\, c^{-s}$$

for some appropriate parameter $c > 1$ (or something similar with a decay function less severe than exponential).
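As a small numerical illustration of the fuzzy definition, one can truncate the infinite sum at a finite horizon; J below is a toy stand-in for the expected-reward function defined above:

def sti_fuzzy(J, A, t, c=1.1, horizon=200):
    # Approximates STI_fuzzy(A,t) = sum over s >= 0 of J(A, t+s) * c**(-s), with c > 1
    return sum(J(A, t + s) * c ** (-s) for s in range(horizon))

# Toy reward profile: A is expected to be rewarded around times 5..7
J = lambda A, t: 1.0 if 5 <= t <= 7 else 0.0
print(round(sti_fuzzy(J, "A", t=0), 4))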
In either case, the goal of the ECAN subsystem, regarding STI, is to assign each Atom A
an STI value that corresponds as closely as possible to the theoretical STI values defined by
whichever one of the above equations is selected (or some other similar equation).
23.2.2 STI, STIFund, and Juju
But how can one estimate these probabilities in practice? In some cases they may be estimated
via explicit inference. But often they must be estimated by heuristics.
The estimative approach taken in the current CogPrime design is an artificial economy, in which each Atom maintains a certain fund of artificial currency. In the current proposal this currency
is called juju and is the same currency used to value LTI. Let us call the amount of juju owned
by Atom A the STIFund of A. Then, one way to formalize the goal of the artificial economy is
to state that: if one ranks all Atoms by the wealth of their STIFund, and separately ranks all Atoms by their theoretical STI value, the rankings should be as close as possible to the same.
One may also formalize the goal in terms of value correlation instead of rank correlation, of
course.
Proving conditions under which the STIFund values will actually correlate well with the theoretical STI values is an open math problem. Heuristically, one may map STIFund values into theoretical STI values by a mapping such as

$$A.STI = \alpha + \beta\, \frac{A.STIFund - STIFund_{min}}{STIFund_{max} - STIFund_{min}}$$

where $STIFund_{min} = \min_X X.STIFund$ and $STIFund_{max} = \max_X X.STIFund$. However, we don't currently have rigorous grounding for any particular functional form for such a mapping; the above is just a heuristic approximation.
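In code, this heuristic mapping is just a rank-preserving affine rescaling; the sketch below assumes alpha and beta are externally tuned parameters:

def sti_from_fund(funds, alpha=0.0, beta=1.0):
    # Map each Atom's STIFund wealth to an estimated STI in [alpha, alpha + beta]
    lo, hi = min(funds.values()), max(funds.values())
    span = (hi - lo) or 1.0   # avoid division by zero when all funds are equal
    return {atom: alpha + beta * (f - lo) / span for atom, f in funds.items()}

print(sti_from_fund({"cat": 12.0, "dog": 3.0, "tree": 7.5}))
# {'cat': 1.0, 'dog': 0.0, 'tree': 0.5}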
The artificial economy approach leads to a variety of supporting heuristics. For instance, one
such heuristic is: if A has been used at time t, then it will probably be useful at time t + s for small s. Based on this heuristic, whenever a MindAgent uses an Atom A, it may wish to increase
A's STIFund (so as to hopefully increase correlation of A's STIFund with its theoretical STI).
It does so by transferring some of its juju to A's STIFund.
23.2.3 Formalizing LTI
Similarly to STI, with LTI we will define theoretical LTI values, and posit an LTIFund associated with each Atom, which seeks to create values correlated with the theoretical LTI.
For LTI, the theoretical issues are subtler. There is a variety of different ways to precisiate
the above loose conceptual definition of LTI. For instance, one can (and we will below) create
formalizations of both:
1. $LTI_{cont}(A)$ = (some time-weighting or normalization of) the expected value of A's total usefulness over the long-term future
2. $LTI_{burst}(A)$ = the probability that A ever becomes highly useful at some point in the long-term future
(here "cont" stands for "continuous"). Each of these may be formalized, in similar but noniden-
tical ways.
These two forms of LTI may be viewed as extremes along a continuum; one could posit a host of intermediary LTI values between them. For instance, one could define

$LTI_p(A)$ = the p'th power average¹ of the expectation of the utility of A over brief time intervals, measured over the long-term future

Then we would have

$$LTI_{burst} = LTI_{\infty}, \qquad LTI_{cont} = LTI_{1}$$

and could vary p to vary the sharpness of the LTI computation. This might be useful in some contexts, but our guess is that it's overkill in practice and that looking at $LTI_{burst}$ and $LTI_{cont}$ is enough (or more than enough; the current OCP code uses only one LTI value and that has not been problematic so far).
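The following sketch illustrates how the p'th power average (defined in the footnote at the end of this subsection) interpolates between the two forms: p = 1 behaves like $LTI_{cont}$, while large p approaches the maximum and behaves like $LTI_{burst}$. The utility values are arbitrary toy numbers:

def power_average(xs, p):
    # p'th power average: the p'th root of the mean of x**p
    return (sum(x ** p for x in xs) / len(xs)) ** (1.0 / p)

utilities = [0.0, 0.1, 0.1, 0.9]   # utility of A over successive brief intervals
for p in (1, 2, 10, 100):
    print(p, round(power_average(utilities, p), 3))
# p = 1 gives the plain mean (0.275, LTI_cont-like);
# p = 100 approaches the max (about 0.888, LTI_burst-like)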
23.2.4 Applications of $LTI_{burst}$ versus $LTI_{cont}$
It seems that the two forms of LTI discussed above might be of interest in different contexts,
depending on the different ways that Atoms may be used so as to achieve reward.
If an Atom is expected to get rewarded for the results of its being selected by MindAgents
that carry out diffuse, background thinking (and hence often select low-STI Atoms from the
AtomTable), then it may be best associated with $LTI_{cont}$.
On the other hand, if an Atom is expected to get rewarded for the results of its being selected by MindAgents that are focused on intensive foreground thinking (and hence generally only select Atoms with very high STI), it may be best associated with $LTI_{burst}$.
In principle, Atoms could be associated with particular $LTI_p$ based on the particulars of the selection mechanisms of the MindAgents expected to lead to their reward. But the issue with this is that it would result in Atoms carrying around an excessive abundance of different $LTI_p$ values for various p, resulting in memory bloat; and it would also require complicated analyses
of MindAgent dynamics. If we do need more than one LTI value, one would hope that two will
be enough, for memory conservation reasons.
And of course, if an Atom has only one LTI value associated with it, this can reasonably
be taken to stand in for the other one: either of $LTI_{burst}$ or $LTI_{cont}$ may, in the absence of information to the contrary, be taken as an estimate of the other.
23.2.4.1 LTI with Various Time Lags
The issue of the p value in the average in the definition of LTI is somewhat similar to (though
orthogonal to) the point that there are many different interpretations of LTI, achieved via
¹ The p'th power average of a quantity $x$ is defined as $\left(\overline{x^p}\right)^{1/p}$, i.e. the p'th root of the mean of $x^p$.
considering various time-lags. Our guess is that a small set of time-lags will be sufficient. Perhaps one wants an exponentially increasing series of time-lags: i.e. to calculate LTI over k cycles where k is drawn from $\{r, 2r, 4r, 8r, \ldots\}$.
The time-lag in LTI seems related to the time-lag in the system's goals. If a Goal object is
disseminating juju, and the Goal has an intrinsic time scale of t, then it may be interested in
LTI on time-scale t. So when a MA (MindAgent) is acting in pursuit of that goal, it should
spend a bunch of its juju on LTI on time-scale t.
Complex goals may be interested in multiple time-scales (for instance, a goal might place
greater value on things that occur in the next hour, but still have nonzero interest in things that
occur in a week), and hence may have different levels of interest in LTI on multiple time-scales.
23.2.4.2 Formalizing Burst LTI
Regarding burst LTI, two approaches to formalization seem to be the threshold version

$LTI_{burst.thresh}(A)$ = P(A will receive a total of at least $s_{threshold}$ amount of normalized stimulus during some time interval of length $t_{short}$ in the next $t_{long}$ time steps)

and the fuzzy version,

$$LTI_{burst.fuzzy}(A, t) = \sum_{s=0}^{\infty} J(A,\, t+s,\, t+s+t_{short})\, f(s, t_{long})$$

where $f(t, t_{long}) : \mathbb{R}^+ \times \mathbb{R}^+ \to \mathbb{R}^+$ is a nonincreasing function that remains roughly constant in $t$ up till a point $t_{long}$ steps in the future, and then begins slowly decaying.
23.2.4.3 Formalizing Continuous LTI
The threshold version of continuous LTI is quite simply:

$$LTI_{cont.thresh}(A, t, t_{long}) = STI_{thresh}(A, t, t_{long})$$

That is, smooth threshold LTI is just like smooth threshold STI, but the time-scale involved is longer.

On the other hand, the fuzzy version of smooth LTI is:

$$LTI_{cont.fuzzy}(A, t) = \sum_{s=0}^{\infty} J(A, t+s)\, f(s, t_{long})$$

using the same decay function $f$ that was introduced above in the context of burst LTI.
23.3 Defining Burst LTI in Terms of STI
It is straightforward to define burst LTI in terms of STI, rather than directly in terms of juju.
We have
$$LTI_{burst.thresh}(A, t) = P\big(\exists\, s \le t_{long} :\ STI_{thresh}(A, t+s)\big)$$

Or, using the fuzzy definitions, we obtain instead the approximate equation

$$LTI_{burst.fuzzy}(A, t) \approx \sum_{s=0}^{\infty} a(s)\, STI_{fuzzy}(A, t+s)\, f(s, t_{long})$$

where

$$a(s) = \frac{1 - c}{1 - c^{s+1}}$$

or the more complex exact equation:

$$LTI_{burst.fuzzy}(A, t) = \sum_{s=0}^{\infty} STI_{fuzzy}(A, t+s) \left( f(s, t_{long}) - \sum_{r=1}^{\infty} c^r f(s - r, t_{long}) \right)$$
23.4 Valuing LTI and STI in terms of a Single Currency
We now further discuss the approach of defining LTIFund and STIFund in terms of a single currency: juju (which, as noted, corresponds in the current ECAN design to normalized stimulus).
In essence, we can think of STIFund and LTIFund as different forms of financial instrument, which are both grounded in juju. Each Atom has two financial instruments attached to it: "STIFund of Atom A" and "LTIFund of Atom A" (or more if multiple versions of LTI are used).
These financial instruments have the peculiarity that, although many agents can put juju into
any one of them, no record is kept of who put juju in which one. Rather, the MA's are acting
so as to satisfy the system's Goals, and are adjusting the STIFund and LTIFund values in a
heuristic manner that is expected to approximately maximize the total utility propagated from
Goals to Atoms.
Finally, each of these financial instruments has a value that gets updated by a specific update
equation.
To understand the logic of this situation better, consider the point of view of a Goal with a
certain amount of resources (juju, to be used as reward), and a certain time-scale on which its
satisfaction is to be measured. Suppose that the goal has a certain amount of juju to expend
on getting itself satisfied.
This Goal clearly should allocate some of its juju toward getting processor time allocated
toward the right Atoms to serve its ends in the near future; and some of its juju toward ensuring
that, in future, the memory will contain the Atoms it will want to see processor time allocated
to. Thus, it should allocate some of its juju toward boosting the STIFund of Atoms that it
thinks will (if chosen by appropriate MindAgents) serve its needs in the near future, and some
of its juju toward boosting the LTIFund of Atoms that it thinks will serve its need in the future
(if they remain in RAM). Thus, when a Goal invokes a MindAgent (giving the MindAgent the
juju it needs to access Atoms and carry out its work), it should tell this MindAgent to put some
of its juju into LTIFunds and some into STIFunds.
If a MindAgent receives a certain amount of juju each cycle, independently of what the
system Goals are explicitly telling it, then this should be viewed as reflecting an implicit goal
of "ambient cognition", and the balance of STI and LTI associated with this implicit goal must
be a system parameter.
In general, the trade-off between STI and LTI boils down to the weighting between near and
far future that is intrinsic to a particular Goal. Simplistically: if a Goal values getting processor
allocated to the right stuff immediately 25 times more than getting processor allocated to the
right stuff 20K cycles in the future, then it should be willing to spend 25 times more of its juju on STI than on $LTI_{20K\,cycles}$. (This simplistic picture is complicated a little by the relationship between different time-scales. For instance, boosting $LTI_{10K\,cycles}(A)$ will have an indirect effect of increasing the odds that A will still be in memory 20K cycles in the future.)
However, this isn't the whole story, because multiple Goals are setting the importance values
of the same set of Atoms. If M1 pumps all its juju into STI for certain Atoms, then M2 may decide it's not worthwhile for it to bother competing with M1 in the STI domain, and to spend its juju on LTI instead.
Note that the current system doesn't allow a MA to change its mind about LTI allocations.
One can envision a system where a MindAgent could, in January, pay juju to have Atom A kept around for a year, but then change its mind in June, six months later, and ask for some of
the money back. But this would require an expensive accounting procedure, keeping track of
how much of each Atom's LTI had been purchased by which MindAgent; so it seems a poor
approach.
A more interesting alternative would be to allow MA's to retain adjustable "reserve funds" of juju. This would mean that a MindAgent would never see a purpose in setting $LTI_{one\,year}(A)$ instead of repeatedly setting $LTI_{one\,cycle}(A)$, unless a substantial transaction cost were incurred with each transaction of adjusting an Atom's LTI. Introducing a transaction cost plus an adjustable per-MindAgent juju reserve fund, and LTIs on multiple time scales, would give the LTI framework considerable flexibility. (To prevent MA's from hoarding their juju, one could place a tax rate on reserve juju.)
The conversion rate between STI and LTI becomes an interesting matter; though it seems
not a critical one, since in the practical dynamics of the system it's juju that is used to produce
STI and LTI. In the current design there is no apparent reason to spread STI of one Atom to
LTI of another Atom, or convert the STI of an Atom into LTI of that same Atom, etc. - but
such an application might come up. (For the rest of this paragraph, let's just consider LTI with one time scale, for simplicity.) Each Goal will have its own preferred conversion rate between STI and LTI, based on its own balancing of different time scales. But, each Goal will also have a limited amount of juju, hence one can only trade a certain amount of STI for LTI, if one
is trading with a specific goal G. One could envision a centralized STI-for-LTI market where
different MA's would trade with each other, but this seems overcomplicated, at least at the
present stage.
As a simpler software design point, this all suggests a value in associating each Goal with a parameter telling how much of its juju it wants to spend on STI versus LTI. Or, more subtly,
how much of its juju it wants to spend on LTI on various time-scales. On the other hand, in a
simple ECAN implementation this balance may be assumed constant across all Goals.
23.5 Economic Attention Networks
Economic Attention Networks (ECANs) are dynamical systems based on the propagation of
STI and LTI values. They are similar in many respects to Hopfield nets, but are based on a
different conceptual foundation involving the propagation of amounts of (conserved) currency
rather than neural-net activation. Further, ECANs are specifically designed for integration with
a diverse body of cognitive processes as embodied in integrative AI designs such as CogPrime.
A key aspect of the CogPrime design is the imposition of ECAN structure on the CogPrime
AtomSpace.
Specifically, ECANs have been designed to serve two main purposes within CogPrime: to
serve as an associative memory for the network, and to facilitate effective allocation of the
attention of other cognitive processes to appropriate knowledge items.
An ECAN is simply a graph, consisting of un-typed nodes and links, and also "Hebbian" links that may have types such as HebbianLink, InverseHebbianLink, or SymmetricHebbianLink.
Each node and link in an ECAN is weighted with two currency values, called STI (short-
term importance) and LTI (long-term importance); and each Hebbian link is weighted with a
probabilistic truth value.
The equations of an ECAN explain how the STI, LTI and Hebbian link weights values
get updated over time. As alluded to above, the metaphor underlying these equations is the
interpretation of STI and LTI values as (separate) artificial currencies. The fact that STI and
LTI are currencies means that, except in unusual instances where the ECAN controller decides to
introduce inflation or deflation and explicitly manipulate the amount of currency in circulation,
the total amounts of STI and LTI in the system are conserved. This fact makes the dynamics
of an ECAN dramatically different than that of an attractor neural network.
In addition to STI and LTI as defined above, the ECAN equations also contain the notion of an Attentional Focus (AF), consisting of those Atoms in the ECAN with the highest STI values (represented by the $s_{threshold}$ value in the above equations). These Atoms play a privileged role in the system and, as such, are treated using an alternate set of equations.
23.5.1 Semantics of Hebbian Links
Conceptually, the probability value of a HebbianLink from A to B is the odds that if A is in
the AF, so is B; and correspondingly, the InverseHebbianLink from A to B is weighted with the
odds that if A is in the AF, then B is not. An ECAN will often be coupled with a "Forgetting"
process that removes low-LTI Atoms from memory according to certain heuristics. A critical
aspect of the ECAN equations is that Atoms periodically spread their STI and LTI to other
Atoms that connect to them via Hebbian and InverseHebbianLinks; this is the ECAN analogue
of activation spreading in neural networks.
Multiple varieties of HebbianLink may be constructed, for instance
• Asymmetric HebbianLinks, whose semantics are as mentioned above: the truth value of HebbianLink A B denotes the probability that if A is in the AF, so is B
• Symmetric HebbianLinks, whose semantics are that: the truth value of SymmetricHebbian-
Link A B denotes the probability that if one of A or B is in the AF, both are
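For illustration, these conditional probabilities can be estimated by counting co-occurrences in a log of AttentionalFocus snapshots. The following Python sketch uses hypothetical names and is not the actual ECAN updating code:

def hebbian_probs(af_history, A, B):
    # af_history: list of sets, each the contents of the AF at one moment.
    # Returns (P(B in AF | A in AF), P(both in AF | either in AF)).
    with_A = [af for af in af_history if A in af]
    either = [af for af in af_history if A in af or B in af]
    asym = sum(B in af for af in with_A) / len(with_A) if with_A else 0.0
    sym = sum(A in af and B in af for af in either) / len(either) if either else 0.0
    return asym, sym

history = [{"cat", "dog"}, {"cat"}, {"dog", "tree"}, {"cat", "dog", "tree"}]
print(hebbian_probs(history, "cat", "dog"))  # (0.666..., 0.5)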
It is also worth noting that one can combine ContextLinks with HebbianLinks and express
contextual association such that in context C, there is a strong HebbianLink between A and B.
23.5.2 Explicit and Implicit Hebbian Relations
In addition to explicit HebbianLinks, it can be useful to treat other links implicitly as Heb-
bianLinks. For instance, if ConceptNodes A and B are found to connote similar concepts, and
a SimilarityLink is formed between them, then this gives reason to believe that maybe a Sym-
metricHebbianLink between A and B should exist as well. One could incorporate this insight
in CogPrime in at least three ways:
• creating HebbianLinks paralleling other links (such as SimilarityLinks)
• adding "Hebbian weights" to other links (such as SimilarityLinks)
• implicitly interpreting other links (such as SimilarityLinks) as HebbianLinks
Further, these strategies may potentially be used together.
There are some obvious semantic relationships to be used in interpreting other link types implicitly as HebbianLinks: for instance, Similarity maps into SymmetricHebbian, and Inheritance A B maps into Hebbian A B. One may express these as inference rules, e.g.

SimilarityLink A B <tv_1>
|-
SymmetricHebbianLink A B <tv_2>

where $tv_2.s = tv_1.s$. Clearly, $tv_2.c < tv_1.c$; but the precise magnitude of $tv_2.c$ must be determined by a heuristic formula. One option is to set $tv_2.c = \alpha\, tv_1.c$, where the constant $\alpha$ is set empirically via data mining the System Activity Tables to be described below.
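As a one-line rendering of this rule on (strength, confidence) pairs, with a purely hypothetical value for the empirically tuned constant alpha:

def similarity_to_hebbian(tv, alpha=0.3):
    # (strength, confidence) of a SimilarityLink -> SymmetricHebbianLink truth value
    s, c = tv
    return (s, alpha * c)

print(similarity_to_hebbian((0.8, 0.9)))  # (0.8, 0.27)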
23.6 Dynamics of STI and LTI Propagation
We now get more specific about how some of these ideas are implemented in the currently
implemented ECAN subsystem of CogPrime. We'll discuss mostly STI here because in the
current design LTI works basically the same way.
MindAgents send out stimulus to Atoms whenever they use them (or else, sometimes, just for the purpose of increasing the Atom's STI); and before these stimulus values are used to update the STI levels of the receiving Atoms, they are normalized: each stimulus is divided by the total amount of stimulus sent out by the MindAgent in that cycle, and multiplied by the total amount of STI currency that the MindAgent decided to spend in that cycle. The normalized stimulus is what has above been called juju. This normalization preserves fairness among MA's, and conservation of currency.
(The reason "stimuli" exist, separately from STI, is that stimulus-sending needs to be very
computationally cheap, as in general it's done frequently by each MA each cycle, and we don't
want each action a MA takes to invoke some costly importance-updating computation.)
Then, Atoms exchange STI according to certain equations (related to HebbianLinks and
other links), and have their STI values updated according to certain equations (which involve,
among other operations, transferring STI to the "central bank").
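A sketch of the normalization step in Python (hypothetical function name; the real computation lives inside the ECAN machinery):

def normalize_stimulus(raw_stimulus, sti_to_spend):
    # raw_stimulus: {atom: stimulus sent this cycle by one MindAgent}.
    # Each Atom's share of the total stimulus is scaled by the STI currency
    # the MindAgent chose to spend, yielding the juju credited to that Atom.
    total = sum(raw_stimulus.values())
    if total == 0:
        return {atom: 0.0 for atom in raw_stimulus}
    return {atom: (x / total) * sti_to_spend for atom, x in raw_stimulus.items()}

print(normalize_stimulus({"cat": 3.0, "dog": 1.0}, sti_to_spend=8.0))
# {'cat': 6.0, 'dog': 2.0} -- fairness and conservation of currency preserved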
23.6.1 ECAN Update Equations
The CogServer is understood to maintain a kind of central bank of STI and LTI funds. When
a non-ECAN MindAgent finds an Atom valuable, it sends that Atom a certain amount of
Stimulus, which results in that Atom's STI and LTI values being increased (via equations to be
presented below, that transfer STI and LTI funds from the CogServer to the Atoms in question).
Then, the ECAN ImportanceUpdating MindAgent carries out multiple operations, including some that transfer STI and LTI funds from some Atoms back to the CogServer.
There are multiple ways to embody this process equationally; here we briefly describe two
variants.
23.6.1.1 Definition and Analysis of Variant 1
We now define a specific set of equations in accordance with the ECAN conceptual framework described above. We define $s = [s_1, \ldots, s_n]$ to be the vector of STI values, and $C = [c_{ij}]_{n \times n}$ to be the connection matrix of Hebbian probability values, where it is assumed that the existence of a HebbianLink and of an InverseHebbianLink between A and B are mutually exclusive possibilities. We also define $C_{LTI} = [g_{ij}]_{n \times n}$ to be the matrix of LTI values for each of the corresponding links.

We assume an updating scheme in which, periodically, a number of Atoms are allocated Stimulus amounts, which causes the corresponding STI values to change according to the equations

$$\forall i : s_i = s_i - rent_i + wages_i$$

where rent and wages are given by

$$rent_i = \begin{cases} \langle Rent \rangle \cdot \max\left(0, \dfrac{\log(20 s_i / recentMaxSTI)}{2}\right), & \text{if } s_i > \epsilon \\ 0, & \text{if } s_i \le \epsilon \end{cases}$$

and

$$wages_i = \begin{cases} \dfrac{\langle Wage \rangle \langle Stimulus \rangle}{\sum_{j=1}^{n} p_j}, & \text{if } p_i = 1 \\[2mm] \dfrac{\langle Wage \rangle \langle Stimulus \rangle}{n - \sum_{j=1}^{n} p_j}, & \text{if } p_i = 0 \end{cases}$$

where $P = [p_1, \ldots, p_n]$, with $p_i \in \{0, 1\}$, is the cue pattern for the pattern that is to be retrieved. All quantities enclosed in angle brackets are system parameters, and LTI updating is accomplished using a completely analogous set of equations.
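For concreteness, the rent-and-wage update can be transcribed into Python; the angle-bracketed system parameters of the text become plain arguments, and the numbers below are arbitrary illustrations:

import math

def rent_wage_update(s, p, Rent, Wage, Stimulus, recent_max_sti, eps=1e-4):
    # s: list of STI values; p: 0/1 cue pattern of the same length
    n_cue = sum(p)
    out = []
    for s_i, p_i in zip(s, p):
        if s_i > eps:
            rent = Rent * max(0.0, math.log(20.0 * s_i / recent_max_sti) / 2.0)
        else:
            rent = 0.0   # no rent is charged below the epsilon cutoff
        if p_i == 1:
            wage = Wage * Stimulus / n_cue
        else:
            wage = Wage * Stimulus / (len(p) - n_cue)
        out.append(s_i - rent + wage)
    return out

print(rent_wage_update([0.5, 0.05, 0.0], [1, 0, 0],
                       Rent=0.1, Wage=0.1, Stimulus=1.0, recent_max_sti=1.0))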
The changing STI values then cause updating of the connection matrix, according to the "conjunction" equations. First define

$$norm_i = \begin{cases} s_i / recentMaxSTI, & \text{if } s_i \ge 0 \\ s_i / recentMinSTI, & \text{if } s_i < 0 \end{cases}$$

Next define

$$conj = Conjunction(s_i, s_j) = norm_i \times norm_j$$

and

$$c'_{ij} = \langle ConjDecay \rangle\, conj + (1 - conj)\, c_{ij}.$$

Finally, update the matrix elements by setting

$$c_{ij} = \begin{cases} c'_{ij}, & \text{if } c_{ij} > 0 \\ -c'_{ij}, & \text{if } c_{ij} < 0 \end{cases}$$
We are currently also experimenting with updating the connection matrix in accordance with the equations suggested by Storkey (1997, 1998, 1999).
A key property of these equations is that both wages paid to, and rent paid by, each node
are positively correlated to their STI values. That is, the more important nodes are paid more
for their services, but they also pay more in rent.
A fixed percentage of the links with the lowest LTI values is then forgotten (which corresponds
equationally to setting the LTI to 0).
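A sketch of the conjunction update for a single matrix entry; the final sign-preserving step for InverseHebbianLink entries (negative $c_{ij}$) is an interpretive assumption:

def conjunction_update(c_ij, s_i, s_j, recent_max, recent_min, conj_decay=0.1):
    def norm(s):
        # normalize positive STIs by recentMaxSTI, negative ones by recentMinSTI
        return s / recent_max if s >= 0 else s / recent_min
    conj = norm(s_i) * norm(s_j)
    c_new = conj_decay * conj + (1.0 - conj) * abs(c_ij)
    return c_new if c_ij >= 0 else -c_new   # keep InverseHebbian entries negative

print(round(conjunction_update(0.4, 0.8, 0.6, recent_max=1.0, recent_min=-1.0), 4))
# 0.256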
Separately from the above, the process of Hebbian probability updating is carried out via a
diffusion process in which some nodes "trade" STI utilizing a diffusion matrix D, a version of
the connection matrix C normalized so that D is a left stochastic matrix. D acts on a similarly
scaled vector v, normalized so that v is equivalent to a probability vector of STI values.
The decision about which nodes diffuse in each diffusion cycle is carried out via a decision function. We currently are working with two types of decision functions: a standard threshold function, by which nodes diffuse if and only if they are in the AF; and a stochastic decision function in which nodes diffuse with probability

$$\frac{\tanh\!\big(shape\,(s_i - FocusBoundary)\big) + 1}{2},$$

where shape and FocusBoundary are parameters.
The details of the diffusion process are as follows. First, construct the diffusion matrix from the entries in the connection matrix: if $c_{ij} \ge 0$, then $d_{ij} = c_{ij}$; else, set $d_{ji} = -c_{ij}$.

Next, we normalize the columns of $D$ to make $D$ a left stochastic matrix. In so doing, we ensure that each node spreads no more than a $\langle MaxSpread \rangle$ proportion of its STI, by setting:

if $\sum_{i=1,\, i \ne j}^{n} d_{ij} > \langle MaxSpread \rangle$ :

$$d_{ij} = d_{ij} \times \frac{\langle MaxSpread \rangle}{\sum_{i=1,\, i \ne j}^{n} d_{ij}}, \quad \text{for } i \ne j$$

$$d_{jj} = 1 - \langle MaxSpread \rangle$$
else:

$$d_{jj} = 1 - \sum_{i=1,\, i \ne j}^{n} d_{ij}$$

Now we obtain a scaled STI vector $v$ by setting

$$minSTI = \min_{i \in [1,n]} s_i, \qquad maxSTI = \max_{i \in [1,n]} s_i,$$

$$v_i = \frac{s_i - minSTI}{maxSTI - minSTI}.$$

The diffusion matrix is then used to update the node STIs,

$$v' = D v,$$

and the STI values are rescaled to the interval $[minSTI, maxSTI]$.
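A compact numpy sketch of one diffusion step; for simplicity it assumes all connection weights are nonnegative (omitting the $d_{ji} = -c_{ij}$ transposition rule for inverse links), so it is an illustration rather than the full construction:

import numpy as np

def diffuse(s, C, max_spread=0.5):
    n = len(s)
    D = np.array(C, dtype=float)
    np.fill_diagonal(D, 0.0)
    for j in range(n):                       # normalize each column
        col = D[:, j].sum()
        if col > max_spread:                 # cap each node's total outflow
            D[:, j] *= max_spread / col
            D[j, j] = 1.0 - max_spread
        else:
            D[j, j] = 1.0 - col
    lo, hi = min(s), max(s)
    v = (np.array(s) - lo) / ((hi - lo) or 1.0)   # scale STI to [0, 1]
    v2 = D @ v                                    # one diffusion step: v' = Dv
    return (lo + v2 * (hi - lo)).tolist()         # rescale to [minSTI, maxSTI]

print(diffuse([1.0, 0.2, 0.0], [[0, 0.3, 0], [0.3, 0, 0.2], [0, 0.2, 0]]))
# [0.76, 0.4, 0.04] -- total scaled STI is conserved by the stochastic matrix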
In both the rent and wage stage and in the diffusion stage, the total STI and LTI funds
of the system each separately form a conserved quantity: in the case of diffusion, the vector
v is simply the total STI times a probability vector. To maintain overall system funds within
homeostatic bounds, a mid-cycle tax and rent-adjustment can be triggered if necessary; the equations currently used for this are

1. $\langle Rent \rangle = \dfrac{\text{recent stimulus awarded before update} \times \langle Wage \rangle}{\text{recent size of the AF}}$;
2. $tax = \dfrac{x}{n}$, where $x$ is the distance from the current AtomSpace bounds to the center of the homeostatic range for AtomSpace funds;
3. $\forall i : s_i = s_i - tax$
23.6.1.2 Investigation of Convergence Properties of Variant 1
Now we investigate some of the properties that the above ECAN equations display when we
use an ECAN defined by them as an associative memory network in the manner of a Hopfield
network.
We consider a situation where the ECAN is supplied with memories via a "training" phase in which one imprints it with a series of binary patterns of the form $P = [p_1, \ldots, p_n]$, with $p_i \in \{0, 1\}$. Noisy versions of these patterns are then used as cue patterns during the retrieval process.
We obviously desire that the ECAN retrieve the stored pattern corresponding to a given cue
pattern. In order to achieve this goal, the ECAN must converge to the correct fixed point.
Theorem 23.1. For a given value of $\epsilon$ in the STI rent calculation, there is a subset of hyperbolic decision functions for which the ECAN dynamics converge to an attracting fixed point.
Proof. Rent is zero whenever $\epsilon \le s_i \le \frac{recentMaxSTI}{20}$, so we consider this case first. The updating process for the rent and wage stage can then be written as $f(s) = s + \text{constant}$. The next stage is governed by the hyperbolic decision function
$$g(s) = \frac{\tanh\!\big(shape\,(s - FocusBoundary)\big) + 1}{2}$$

The entire updating sequence is obtained by the composition $(g \circ f)(s)$, whose derivative is then

$$(g \circ f)'(s) = \frac{1}{2}\,\mathrm{sech}^2(f(s)) \cdot shape \cdot (1),$$

which has magnitude less than 1 whenever $-2 < shape < 2$. We next consider the case $s_i > \frac{recentMaxSTI}{20}$. The function $f$ now takes the form

$$f(s) = s - \frac{\log(20 s / recentMaxSTI)}{2} + \text{constant},$$

and we have

$$(g \circ f)'(s) = \frac{1}{2}\,\mathrm{sech}^2(f(s)) \cdot shape \cdot \left(1 - \frac{1}{2s}\right),$$

which has magnitude less than 1 whenever $|shape| < \left|\frac{4s}{2s - 1}\right|$. Choosing the shape parameter to satisfy $0 < shape < \min\left(2, \left|\frac{4s}{2s - 1}\right|\right)$ then guarantees that $|(g \circ f)'| < 1$. Finally, $g \circ f$ maps the closed interval $[recentMinSTI, recentMaxSTI]$ into itself, so applying the Contraction Mapping Theorem completes the proof.
23.6.1.3 Definition and Analysis of Variant 2
The ECAN variant described above has performed completely acceptably in our experiments
so far; however we have also experimented with an alternate variant, with different convergence
properties. In Variant 2, the dynamics of the ECAN are specifically designed so that a certain
conceptually intuitive function serves as a Lyapunov function of the dynamics.
At a given time $t$, for a given Atom indexed $i$, we define two quantities: $OUT_i(t)$ = the total amount that Atom $i$ pays in rent, tax and diffusion during the time-$t$ iteration of ECAN; $IN_i(t)$ = the total amount that Atom $i$ receives in diffusion, stimulus and welfare during the time-$t$ iteration of ECAN. Note that welfare is a new concept to be introduced below. We then define $DIFF_i(t) = IN_i(t) - OUT_i(t)$; and define $AVDIFF(t)$ as the average of $DIFF_i(t)$ over all $i$ in the ECAN.
The design goal of Variant 2 of the ECAN equations is to ensure that, if the parameters are
tweaked appropriately, AVDIFF can serve as a (deterministic or stochastic, depending on the
details) Lyapunov function for ECAN dynamics. This implies that with appropriate parameters
the ECAN dynamics will converge toward a state where AVDIFF=0, meaning that no Atom
is making any profit or incurring any loss. It must be noted that this kind of convergence is
not always desirable, and sometimes one might want the parameters set otherwise. But if one
wants the STI components of an ECAN to converge to some specific values, as for instance in
a classic associative memory application, Variant 2 can guarantee this easily.
In Variant 2, each ECAN cycle begins with rent collection and welfare distribution, which
occurs via collecting rent via the Variant 1 equation, and then performing the following two
steps:
• Step A: calculate X, defined as the positive part of the total amount by which AVDIFF
has been increased via the overall rent collection process.
• Step B: redistribute X to needy Atoms as follows: For each Atom z, calculate the positive part of OUT − IN, defined as deficit(z). Distribute $X + \epsilon$ wealth among all Atoms z, giving each Atom a percentage of X that is proportional to deficit(z), but never so much as to cause OUT < IN for any Atom (the welfare being given counts toward IN). Here $\epsilon > 0$ ensures AVDIFF decrease; $\epsilon = 0$ may be appropriate if convergence is not required in a certain situation.
Step B is the welfare step, which guarantees that rent collection will decrease AVDIFF. Step
A calculates the amount by which the rich have been made poorer, and uses this to make the
poor richer. In the case that the sum of deficit(z) over all nodes z is less than X, a mid-cycle
rent adjustment may be triggered, calculated so that step B will decrease AVDIFF. (I.e. we cut rent on the rich, if the poor don't need their money to stay out of deficit.)
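A sketch of the welfare step in Python (hypothetical names; IN and OUT are the per-Atom totals defined above):

def welfare_step(IN, OUT, X, eps=1e-6):
    # Returns {atom: welfare granted}, distributing roughly X + eps proportionally
    # to each Atom's deficit, without pushing any Atom past break-even (OUT = IN)
    deficit = {a: max(0.0, OUT[a] - IN[a]) for a in IN}
    total = sum(deficit.values())
    if total == 0.0:
        return {a: 0.0 for a in IN}
    budget = X + eps
    return {a: min(deficit[a], budget * deficit[a] / total) for a in IN}

print(welfare_step({"a": 1.0, "b": 0.2}, {"a": 0.5, "b": 1.0}, X=0.4))
# 'a' is in surplus and gets nothing; 'b' gets about 0.4 of its 0.8 deficit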
Similarly, in each Variant 2 ECAN cycle, there is a wage-paying process, which involves the wage-paying equation from Variant 1 followed by two steps. Step A: calculate Y, defined as the positive part of the total amount by which AVDIFF has been increased via the overall wage payment process. Step B: exert taxation based on the surplus Y as follows: For each Atom z, calculate the positive part of IN − OUT, defined as surplus(z). Collect $Y + \epsilon_1$ wealth from all Atoms z, collecting from each node a percentage of Y that is proportional to surplus(z), but never so much as to cause IN < OUT for any node (the new STI being collected counts toward OUT).
In case the total of surplus(z) over all nodes z is less than Y, one may trigger a mid-cycle
wage adjustment, calculated so that step B will decrease AVDIFF. I.e. we cut wages since there
is not enough surplus to support it.
Finally, in the Variant 2 ECAN cycle, diffusion is done a little differently, via iterating the
following process: If AVDIFF has increased during the diffusion round so far, then choose
a random node whose diffusion would decrease AVDIFF, and let it diffuse; if AVDIFF has
decreased during the diffusion round so far, then choose a random node whose diffusion would
increase AVDIFF, and let it diffuse. In carrying out these steps, we avoid letting the same node
diffuse twice in the same round. This algorithm does not let all Atoms diffuse in each cycle, but
it stochastically lets a lot of diffusion happen in a way that maintains AVDIFF constant. The
iteration may be modified to bias toward an average decrease in AVDIFF.
The random element in the diffusion step, together with the logic of the rent/welfare and
wage/tax steps, combines to yield the result that for Variant 2 of ECAN dynamics, AVDIFF is
a stochastic Lyapunov function. The details of the proof of this will be omitted but the outline
of the argument should be clear from the construction of Variant 2. And note that by setting the
$\epsilon$ and $\epsilon_1$ parameters to 0, the convergence requirement can be eliminated, allowing the network
to evolve more spontaneously as may be appropriate in some contexts; these parameters allow
one to explicitly adjust the convergence rate.
One may also derive results pertaining to the meaningfulness of the attractors, in various
special cases. For instance, if we have a memory consisting of a set M of m nodes, and we
imprint the memory on the ECAN by stimulating m nodes during an interval of time, then we
want to be able to show that the condition where precisely those m nodes are in the AF is a
fixed-point attractor. However, this is not difficult, because one must only show that if these m
nodes and none others are in the AF, this condition will persist.
23.6.2 ECAN as Associative Memory
We have carried out experiments gauging the performance of Variant 1 of ECAN as an associative memory, using the implementation of ECAN within CogPrime, and using both the conventional and Storkey Hebbian updating formulas.
As with a Hopfield net memory, the memory capacity (defined as the number of memories
that can be retrieved from the network with high accuracy) depends on the sparsity of the
network, with denser networks leading to greater capacity. In the ECAN case the capacity also
depends on a variety of parameters of the ECAN equations, and the precise unraveling of these
dependencies is a subject of current research. However, one interesting dependency has already
been uncovered in our preliminary experimentation, which has to do with the size of the AF
versus the size of the memories being stored.
Define the size of a memory (a pattern being imprinted) as the number of nodes that are
stimulated during imprinting of that memory. In a classical Hopfield net experiment, the mean
size of a memory is usually around, say, .2-.5 of the number of neurons. In typical CogPrime associative memory situations, we believe the mean size of a memory will be one or two orders of magnitude smaller than that, so that each memory occupies only a relatively small portion
of the overall network.
What we have found is that the memory capacity of an ECAN is generally comparable to
that of a Hopfield net with the same number of nodes and links, if and only if the ECAN
parameters are tuned so that the memories being imprinted can fit into the AF. That is, the
AF threshold or (in the hyperbolic case) shape parameter must be tuned so that the size of
the memories is not so large that the active nodes in a memory cannot stably fit into the AF.
This tuning may be done adaptively by testing the impact of different threshold/shape values
on various memories of the appropriate size; or potentially a theoretical relationship between
these quantities could be derived, but this has not been done yet. This is a reasonably satisfying
result given the cognitive foundation of ECAN: in loose terms what it means is that ECAN
works best for remembering things that fit into its focus of attention during the imprinting
process.
23.7 Glocal Economic Attention Networks
In order to transform ordinary ECANs into glocal ECANs, one may proceed in essentially the
same manner as with glocal Hopfield nets as discussed in Chapter 13 of Part 1. In the language
normally used to describe CogPrime, this would be termed a "map encapsulation" heuristic.
As with glocal Hopfield nets, one may proceed most simply via creating a fixed pool of nodes
intended to provide locally-representative keys for the maps formed as attractors of the network.
Links may then be formed to these key nodes, with weights and STI and LTI values adapted
by the usual ECAN algorithms.
23.7.1 Experimental Explorations
To compare the performance of glocal ECANs with glocal Hopfield networks in a simple context,
we ran experiments using ECAN in the manner of a Hopfield network. That is, a number of nodes take on the role of the neurons to which the patterns to be stored are presented. These patterns are imprinted by setting the nodes corresponding to active bits of the pattern to have their STI within the AF, whereas nodes corresponding to inactive bits are set below the AF threshold. Link weight updating then occurs, using one of several update rules; in this case the update rule of [SV99] was used. Attention was spread using a diffusion approach, by representing the weights of Hebbian links between pattern nodes within a left stochastic Markov matrix, and multiplying it by the vector of normalised STI values to give a vector representing the new distribution of STI.
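Concretely, this diffusion amounts to one matrix-vector multiplication per step. A minimal numpy sketch (the matrix and STI values here are random placeholders, not learned Hebbian weights):

    import numpy as np

    # illustrative Hebbian link weights among 5 pattern nodes
    W = np.random.rand(5, 5)
    W = W / W.sum(axis=0, keepdims=True)   # make the matrix left (column) stochastic

    sti = np.random.rand(5)
    sti = sti / sti.sum()                  # normalised STI vector

    new_sti = W @ sti                      # one diffusion step along Hebbian links
    assert np.isclose(new_sti.sum(), 1.0)  # a stochastic matrix conserves total STI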
To explore the effects of key nodes on ECAN Hopfield networks, in [Goe08b] we used the palimpsest testing scenario of [SV99], where all the local neighbours of the imprinted pattern, within a single bit change, are tested. Each neighbouring pattern is used as input to try to retrieve the original pattern. If all the retrieved patterns are the same as the original (within a given tolerance) then the pattern is deemed successfully retrieved, and recall of the previous pattern is attempted via its neighbours. The number of patterns this can be repeated for successfully is called the palimpsest storage of the network.
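A sketch of this testing loop, with the network and pattern interfaces left abstract (imprint, retrieve, one_bit_neighbours and distance are assumed stand-ins, not actual OpenCog APIs):

    def palimpsest_storage(network, patterns, tolerance=0):
        """[SV99]-style palimpsest test, sketched: imprint a sequence of
        patterns, then count how many consecutive past patterns remain
        retrievable via all of their one-bit neighbours."""
        imprinted = []
        for pattern in patterns:
            network.imprint(pattern)
            imprinted.append(pattern)
        count = 0
        for old in reversed(imprinted):
            # every neighbour within one bit-flip must cue successful retrieval
            if all(network.retrieve(nb).distance(old) <= tolerance
                   for nb in old.one_bit_neighbours()):
                count += 1
            else:
                break
        return count        # the network's palimpsest storage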
As an example, consider one simple experiment that was run with recollection of 10 x 10 pixel patterns (so, 100 nodes, each corresponding to a pixel in the grid), a Hebbian link density of 30%, and with 1% of links being forgotten before each pattern is imprinted. The results demonstrated that, when the mean palimpsest storage is calculated for each of 0, 1, 5 and 10 key nodes, we find that the storage is 22.6, 22.4, 24.9, and 26.0 patterns respectively, indicating that key nodes do improve memory recall on average.
23.8 Long-Term Importance and Forgetting
Now we turn to the forgetting process (carried out by the Forgetting MindAgent), which is
driven by LTI dynamics, but has its own properties as well.
Overall, the goal of the "forgetting" process is to maximize the total utility of the Atoms in
the AtomSpace throughout the future. The most basic heuristic toward this end is to remove
the Atoms with the lowest LTI, but this isn't the whole story. Clearly, the decision to remove
an Atom from RAM should depend on factors beyond just the LTI of the Atom. For example,
one should also take into account the expected difficulty in reconstituting the given Atom from
other Atoms. Suppose the system has the relations:
"dogs are animals"
"animals are cute"
"dogs are cute"
and the strength of the third relation is not dissimilar from what would be obtained by deduction
and revision from the first two relations and others in the system. Then, even if the system
judges it will be very useful to know dogs are cute in the future, it may reasonably choose to
remove dogs are cute from memory, anyway, because it knows it can be so easily reconstituted,
by a few inference steps for instance. Thus, as well as removing the lowest-LTI Atoms, the Forgetting MindAgent should also remove Atoms meeting certain other criteria, such as the combination of (see the sketch below):
• low STI
• easy reconstitutability in terms of other Atoms whose LTI is not less than its own
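A minimal sketch of this combined criterion; everything beyond the basic loop (the threshold parameters and the reconstitution_cost estimator) is a hypothetical knob rather than an actual CogPrime API:

    def select_atoms_to_forget(atomspace, lti_floor, sti_floor,
                               reconstitution_cost, cheap_threshold):
        """Sketch of the Forgetting MindAgent's selection heuristic.
        reconstitution_cost(atom) should estimate how hard the Atom would be
        to rebuild by inference from Atoms whose LTI is not less than its own."""
        victims = []
        for atom in atomspace:
            if atom.lti < lti_floor:
                victims.append(atom)      # remove the lowest-LTI Atoms outright
            elif (atom.sti < sti_floor and
                  reconstitution_cost(atom) < cheap_threshold):
                victims.append(atom)      # low STI and cheaply reconstitutable
        return victims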
23.9 Attention Allocation via Data Mining on the System Activity
Table
In this section we'll discuss an object called the System Activity Table, which contains a number
of subtables recording various activities carried out by the various objects in the CogPrime
system. These tables may be used for sophisticated attention allocation processes, according
to an approach in which importance values and HebbianLink weight values are calculated via
direct data mining of a centralized knowledge store (the System Activity Table). This approach
provides highly accurate attention allocation but at the cost of significant computational effort.
The System Activity Table is actually a set of tables, with multiple components. The precise
definition of the tables will surely be adapted based on experience as the work with CogPrime
progresses; what is described here is a reasonable first approximation.
First, there is a MindAgent Activity Table, which includes, for each MindAgent in the system,
a table such as Table 23.1 (in which the time-points recorded are the last T system cycles, and
the Atom-lists recorded are lists of Handles for Atoms).
System Cycle | Effort Spent | Memory Used | Atom Combo 1 Utilized | Atom Combo 2 Utilized | ...
Now          | 3.3          | 4000        | Atom21, Atom44        | Atom44, Atom47, Atom345 | ...
Now-1        | 0.4          | 6079        | Atom123, Atom33       | Atom345 | ...
...          | ...          | ...         | ...                   | ... | ...

Table 23.1: Example MindAgent Table
The MindAgent's activity table records, for that MindAgent and for each system cycle, which
Atom-sets were acted on by that MindAgent at that point in time.
Similarly, a table of this nature must be maintained for each Task-type, e.g. InferenceTask,
MOSESCategorizationTask, etc. The Task tables are used to estimate Effort values for various
Tasks, which are used in the procedure execution process. If it can be estimated how much
spatial and temporal resources a Task is likely to use, via comparison to a record of previous
similar tasks (in the Task table), then a MindAgent can decide whether it is appropriate to
carry out this Task (versus some other one, or versus some simpler process not requiring a Task)
at a given point in time, a process to be discussed in a later chapter.
In addition to the MindAgent and Task-type tables, it is convenient if tables are maintained corresponding to various goals in the system (as shown in Table 23.2), including the Ubergoals but also potentially derived goals of high importance.
For each goal, at minimum, the degree of achievement of the goal at a given time must be
recorded. Optionally, at each point in time, the degree of achievement of a goal relative to some
System Cycle | Total Achievement | Achievement for Atom44 | Achievement for set {Atom44, Atom233} | ...
Now          | .8                | .4                     | .5   | ...
Now-1        | .9                | .5                     | .55  | ...
...          | ...               | ...                    | ...  | ...

Table 23.2: Example Goal Table
particular Atoms may be recorded. Typically the list of Atom-specific goal-achievements will
be short and will be different for different goals and different time points. Some goals may be
applied to specific Atoms or Atom sets, others may only be applied more generally.
The basic idea is that attention allocation and credit assignment may be effectively carried out via data mining on these tables.
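For instance, one simple mining pass might estimate HebbianLink weights from how often two Atoms are utilized in the same system cycle. A minimal sketch (the row format and handle types are assumptions, not the actual table API):

    from collections import Counter
    from itertools import combinations

    def mine_hebbian_weights(activity_rows):
        """Estimate HebbianLink weights as co-utilization frequencies from
        activity table rows, each given as a set of utilized Atom handles."""
        pair_counts, atom_counts = Counter(), Counter()
        for atoms in activity_rows:
            atom_counts.update(atoms)
            pair_counts.update(combinations(sorted(atoms), 2))
        # weight(A, B) ~ P(B utilized in a cycle | A utilized in that cycle)
        return {(a, b): pair_counts[(a, b)] / atom_counts[a]
                for (a, b) in pair_counts}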
23.10 Schema Credit Assignment
And, how do we apply a similar approach to clarifying the semantics of schema credit assign-
ment?
From the above-described System Activity Tables, one can derive information of the form
Achieve(G,E,T)
"Goal G was achieved to extent E at time T"
which may be grounded as, for example:
Similarity
    E
    ExOut
        GetTruthValue
        Evaluation
            atTime
            HypLink G
and more refined versions such as
Achieve(G,E,T,A,P)
"Goal G was achieved to extent E using
Atoms A (with parameters P) at time T"
Enact(S,I,$T_1$,O,$T_2$)
"Schema S was enacted on inputs I
at time $T_1$, producing outputs O
at time $T_2$"
The problem of schema credit assignment is then, in essence: Given a goal G and a distribution of times V, figure out what schema to enact in order to cause G's achievement at some time in the future, where the desirability of times is weighted by V.
The basic method used is the learning of predicates of the form

ImplicationLink
    F(C, P_1, ..., P_n)
    G
where
• the P_i are Enact() statements in which the T_1 and T_2 are variable, and the S, I and O may be concrete or variable
• C is a predicate representing a context
• G is an Achieve() statement, whose arguments may be concrete or abstract
• F is a Boolean function
Typically, the variable expressions in the T_1 and T_2 positions will be of the form T + offset, where offset is a constant value and T is a time value representing the time of inception of the whole compound schema. T may then be defined as T_0 - offset_0, where offset_0 is a constant value and T_0 is a variable denoting the time of achievement of the goal.
In CogPrime, these predicates may be learned by a combination of statistical pattern mining,
PLN inference and MOSES or hill-climbing procedure learning.
The choice of what action to take at a given point in time is then a probabilistic decision. Based on the time-distribution V given, the system will know a certain number of expressions C = F(C, P_1, ..., P_n) of the type described above. Each of these will be involved in an ImplicationLink with a certain estimated strength. It may select the "compound schema" C with the highest strength.
One might think to introduce other criteria here, e.g. to choose the schema with the highest
strength but the lowest cost of execution. However, it seems better to include all pertinent
criteria in the goal, so that if one wants to consider cost of execution, one assumes the existence
of a goal that incorporates cost of execution (which may be measured in multiple ways, of
course) as part of its internal evaluation function.
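For concreteness, here is a minimal sketch of this probabilistic selection step, written as roulette-wheel choice weighted by ImplicationLink strength (the candidate list format is illustrative; the real system would draw it from the learned ImplicationLinks):

    import random

    def select_compound_schema(candidates):
        """Choose among known compound schemata C = F(C, P_1, ..., P_n), each
        paired with the estimated strength of its ImplicationLink to the goal.
        A strength-proportional pick rather than a plain argmax, reflecting
        the text's description of the choice as a probabilistic decision."""
        total = sum(strength for _, strength in candidates)
        r = random.uniform(0, total)
        acc = 0.0
        for schema, strength in candidates:
            acc += strength
            if r <= acc:
                return schema
        return candidates[-1][0]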
Another issue that arises is whether to execute multiple C simultaneously. In many cases this won't be possible because two different C's will contradict each other. It seems simplest to assume that C's that can be fused together into a single plan of action are presented to the schema execution process as a single fused C. In other words, the fusion is done during the schema learning process rather than the execution process.
A question emerges regarding how this process deals with false causality, e.g. with a schema
that, due to the existence of a common cause, often happens to occur immediately prior to the
occurrence of a given goal. For instance, roosters crowing often occurs prior to the sun rising.
This matter is discussed in more depth in the PLN book and The Hidden Pattern; but in brief, the answer is: In the current approach, if the system believes that roosters crowing often causes the sun to rise, then if it wants to cause the sun to rise, it may well cause a rooster to crow. Once this fails, the system will no longer hold the false belief, and afterwards will choose a different course
of action. Furthermore, if it holds background knowledge indicating that roosters crowing is
not likely to cause the sun to rise, then this background knowledge will be invoked by inference
to discount the strength of the ImplicationLink pointing from rooster-crowing to sun-rising, so
that the link will never be strong enough to guide schema execution in the first place.
The problem of credit assignment thus becomes a problem of creating appropriate heuristics
to guide inference of ImplicationLinks of the form described above. Assignment of credit is then
implicit in the calculation of truth values for these links. The difficulty is that the predicates F
involved may be large and complex.
23.11 Interaction between ECANs and other CogPrime Components
We have described above a number of interactions between attention allocation and other
aspects of CogPrime; in this section we gather a few comments on these interactions, and some
additional ones.
23.11.1 Use of PLN and Procedure Learning to Help ECAN
MOSES or hillclimbing may be used to help mine the SystemActivityTable for patterns of usefulness, and create HebbianLinks reflecting these patterns.
PLN inference may be carried out on HebbianLinks by treating (HebbianLink A B) as a virtual predicate evaluation relationship, i.e. as

EvaluationLink HebbianPredicate (A, B)
PLN inference on HebbianLinks may then be used to update node importance values, because
node importance values are essentially node probabilities corresponding to HebbianLinks. And
similarly, MindAgent-relative node importance values are node probabilities corresponding to
MindAgent-relative HebbianLinks.
Note that conceptually, the nature of this application of PLN is different from most other uses of PLN in CogPrime. Here, the purpose of PLN is not to draw conclusions about the outside world, but rather about what the system should focus its resources on, in what context. PLN, used in this way, effectively constitutes a nonlinear-dynamical iteration governing the flow of attention through the CogPrime system.
Finally, inference on HebbianLinks leads to the emergence of maps, via the recognition of
clusters in the graph of HebbianLinks.
23.11.2 Use of ECAN to Help Other Cognitive Processes
First of all, associative-memory functionality is directly important in CogPrime because it is
used to drive concept creation. The CogPrime heuristic called "map formation" creates new
Nodes corresponding to prominent attractors in the ECAN, a step that (according to our
preliminary results) not only increases the memory capacity of the network beyond what can
be achieved with a pure ECAN but also enables attractors to be explicitly manipulated by PLN
inference.
Equally important to associative memory is the capability of ECANs to facilitate effective
allocation of the attention of other cognitive processes to appropriate knowledge items (Atoms).
For example, one key role of ECANs in CogPrime is to guide the forward and backward chaining
processes of PLN (Probabilistic Logic Network) inference. At each step, the PLN inference
chainer is faced with a great number of inference steps (branches) from which to choose; and
a choice is made using a statistical "bandit problem" mechanism that selects each possible
inference step with a probability proportional to its expected "desirability." In this context,
there is considerable appeal in the heuristic of weighting inference steps using probabilities
proportional to the STI values of the Atoms they contain. One thus arrives at a combined
PLN/ECAN dynamic as follows:
1. An inference step is carried out, involving a choice among multiple possible inference steps, which is made using STI-based weightings (and made among Atoms that LTI weightings have deemed valuable enough to remain in RAM)
2. The Atoms involved in the inference step are rewarded with STI and LTI proportionally to the utility of the inference step (how much it increases the confidence of Atoms in the system's memory)
3. The ECAN operates, and multiple Atoms' importance values are updated
4. Return to Step 1 if the inference isn't finished (see the sketch below)
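A minimal sketch of this combined loop, with chainer and ecan as hypothetical interfaces rather than actual OpenCog APIs:

    def pln_ecan_loop(chainer, ecan, max_steps=100):
        """Sketch of the combined PLN/ECAN dynamic enumerated above."""
        for _ in range(max_steps):
            # 1. bandit-style choice among branches, weighted by the STI of
            #    the Atoms each candidate inference step involves
            step = chainer.sample_step(
                weight=lambda s: sum(a.sti for a in s.atoms))
            utility = chainer.execute(step)   # confidence gained by the step
            # 2. reward the participating Atoms proportionally to that utility
            for atom in step.atoms:
                ecan.stimulate(atom, utility)
            # 3. let economic attention allocation update importance values
            ecan.run_cycle()
            # 4. stop when the chainer reports the inference finished
            if chainer.finished():
                break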
An analogous interplay may occur between ECANs and MOSES.
It seems intuitively clear that the same attractor-convergence properties highlighted in the above analysis of associative-memory behavior will also be highly valuable for the application of ECANs to attention allocation. If a collection of Atoms is often collectively useful for some cognitive process (such as PLN), then the associative-memory-type behavior of ECANs means that once a handful of the Atoms in the collection are found useful in a certain inference process, the other Atoms in the collection will get their STI significantly boosted, and will be likely to get chosen in subsequent portions of that same inference process. This is exactly the sort of dynamics one would like to see occur. Systematic experimentation with these interactions between ECAN and other CogPrime processes is one of our research priorities going forward.
23.12 MindAgent Importance and Scheduling
So far we have discussed economic transactions between Atoms and Atoms, and between Atoms
and Units. MindAgents have played an indirect role, via spreading stimulation to Atoms which
causes them to get paid wages by the Unit. Now it is time to discuss the explicit role of
MindAgents in economic transactions. This has to do with the integration of economic attention
allocation with the Scheduler that schedules the core MindAgents involved in the basic cognitive
cycle.
This integration may be done in many ways, but one simple approach (sketched below) is:
1. When a MindAgent utilizes an Atom, this results in sending stimulus to that Atom. (Note
that we don't want to make MindAgents pay for using Atoms individually; that would
penalize MA's that use more Atoms, which doesn't really make much sense.)
2. MindAgents then get currency from the Lobe (as defined in Chapter 19) periodically, and get extra currency based on usefulness for goal achievement as determined by the credit assignment process. The Scheduler then gives more processor time to MindAgents with more STI.
3. However, any MindAgent with LTI above a certain minimum threshold will get some min-
imum amount of processor time (i.e. get scheduled at least once each N cycles).
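A minimal sketch of such a scheduling rule (names and parameters are illustrative):

    def schedule(mind_agents, total_slices, lti_floor, guarantee=1):
        """Sketch of STI-proportional MindAgent scheduling with an LTI-based
        minimum-service guarantee."""
        slices = {}
        # guaranteed minimum for any agent whose LTI clears the threshold
        for ma in mind_agents:
            slices[ma] = guarantee if ma.lti >= lti_floor else 0
        remaining = total_slices - sum(slices.values())
        total_sti = sum(max(ma.sti, 0) for ma in mind_agents) or 1
        # remaining processor time divided in proportion to STI
        for ma in mind_agents:
            slices[ma] += int(remaining * max(ma.sti, 0) / total_sti)
        return slices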
As a final note: In a multi-Lobe Unit, the Unit may use the different LTI values of MA's in
different Lobes to control the distribution of MA's among Lobes: e.g. a very important (LTI)
MA might get cloned across multiple Lobes.
23.13 Information Geometry for Attention Allocation
Appendix ?? outlines some very broad ideas regarding the potential utilization of information geometry and related ideas for modeling cognition. In this section, we present some more concrete and detailed experiments inspired by the same line of thinking. We model CogPrime's
Economic Attention Networks (ECAN) component using information geometric language, and
then use this model to propose a novel information geometric method of updating ECAN net-
works (based on an extension of Amari's ANGL algorithm). Tests on small networks suggest
that information-geometric methods have the potential to vastly improve ECAN's capability to
shift attention from current preoccupations to desired preoccupations. However, there is a high
computational cost associated with the simplest implementations of these methods, which has
prevented us from carrying out large-scale experiments so far. We are exploring the possibility
of circumventing these issues via using sparse matrix algorithms on GPUs.
23.13.1 Brief Review of Information Geometry
"Information geometry" is a branch of applied mathematics concerned with the application of
differential geometry to spaces of probability distributions. In IG1111 we have suggested some
extensions to traditional information geometry aimed at allowing it to better model general
intelligence. However for the concrete technical work in this Chapter, the traditional formulation
of information geometry will suffice.
One of the core mathematical constructs underlying information geometry, is the Fisher
Information, a statistical quantity which has a a variety of applications ranging far beyond
statistical data analysis, including physics [Fri98J, psychology and Al LAW)]. Put simply, Fl
is a formal way of measuring the amount of information that an observable random variable
X carries about an unknown parameter 8 upon which the probability of X depends. Fl forms
the basis of the Fisher-Rao metric, which has been proved the only Riemannian metric on
the space of probability distributions satisfying certain natural properties regarding invariance
with respect to coordinate transformations. Typically 8 in the Fl is considered to be a real
multidimensional vector; however, IDab00I has presented a Fl variant that imposes basically
no restrictions on the form of O. Here the multidimensional Fl will suffice, but the more gen-
eral version is needed if one wishes to apply Fl to AGI more broadly, e.g. to declarative and
procedural as well as attentional knowledge.
In the set-up underlying the definition of the ordinary finite-dimensional Fisher information, the probability function for X, which is also the likelihood function for θ ∈ R^n, is a function f(X;θ); it is the probability mass (or probability density) of the random variable X conditional on the value of θ. The partial derivative with respect to θ_i of the log of the likelihood function is called the score with respect to θ_i. Under certain regularity conditions, it can be shown that the first moment of the score is 0. The second moment is the Fisher information:

$$\mathcal{I}(\theta)_i = E\left[\left.\left(\frac{\partial}{\partial\theta_i}\,\ln f(X;\theta)\right)^2\,\right|\,\theta\right]$$

where, for any given value of θ_i, the expression E[...|θ] denotes the conditional expectation over values for X with respect to the probability function f(X;θ) given θ. Note that 0 ≤ I(θ)_i < ∞.
Also note that, in the usual case where the expectation of the score is zero, the Fisher information is also the variance of the score.
One can also look at the whole Fisher information matrix

$$\mathcal{I}(\theta)_{i,j} = E\left[\frac{\partial \ln f(X;\theta)}{\partial\theta_i}\,\frac{\partial \ln f(X;\theta)}{\partial\theta_j}\right]$$

which may be interpreted as a metric $g_{ij}$, which provably is the only "intrinsic" metric on probability distribution space. In this notation we have $\mathcal{I}(\theta)_i = \mathcal{I}(\theta)_{i,i}$.
Dabak [Dab99] has shown that the geodesic between two parameter vectors θ and θ' is given by the exponentially weighted curve

$$(\gamma(t))(x) = \frac{f(x;\theta)^{1-t}\, f(x;\theta')^{t}}{\int f(y;\theta)^{1-t}\, f(y;\theta')^{t}\, dy}$$

under the weak condition that the log-likelihood ratios with respect to f(X;θ) and f(X;θ') are finite. Also, along this sort of curve, the sum of the Kullback-Leibler distances between θ and θ', known as the J-divergence, equals the integral of the Fisher information along the geodesic connecting θ and θ'. This suggests that if one is attempting to learn a certain parameter vector based on data, and one has a certain other parameter vector as an initial value, it may make sense to use algorithms that try to follow the Fisher-Rao geodesic between the initial condition and the desired conclusion. This is what Amari [Ama85, AN00] calls "natural gradient" based learning, a conceptually powerful approach which subtly accounts for dependencies between the components of θ.
23.13.2 Information-Geometric Learning for Recurrent Networks:
Extending the ANGL Algorithm
Now we move on to discuss the practicalities of information-geometric learning within CogPrime's ECAN component. As noted above, Amari [Ama85, AN00] introduced the natural gradient as a generalization of the direction of steepest descent on the space of loss functions of the parameter space. Issues with the original implementation include the requirement of calculating both the Fisher information matrix and its inverse. To resolve these and other practical considerations, Amari [Ama98] proposed an adaptive version of the algorithm, the Adaptive Natural Gradient Learning (ANGL) algorithm. Park, Amari, and Fukumizu [PAF00] extended ANGL to a variety of stochastic models including stochastic neural networks, multi-dimensional regression, and classification problems.
In particular, they showed that, assuming a particular form of stochastic feedforward neural network and under a specific set of assumptions concerning the form of the probability distributions involved, a version of the Fisher information matrix can be written as

$$G(\theta) = E_x\left[E_\xi\left[\left(\frac{r'(\xi)}{r(\xi)}\right)^2\right](\nabla H)(\nabla H)^T\right].$$
Although Park et al. considered only feedforward neural networks, their result also holds for more general neural networks, including the ECAN network. What is important is the decomposition of the probability distribution as

$$p(y|x;\theta) = \prod_{i=1}^{L} r_i\left(y_i - H_i(x;\theta)\right)$$
where

$$y = H(x;\theta) + \xi, \quad y = (y_1,\ldots,y_L)^T, \quad H = (H_1,\ldots,H_L)^T, \quad \xi = (\xi_1,\ldots,\xi_L)^T,$$

and ξ is added noise. If we assume further that each ξ_i has the same form as a Gaussian distribution with zero mean and standard deviation σ, then the Fisher information matrix simplifies further to
$$G(\theta) = \frac{1}{\sigma^2}\, E_x\left[(\nabla H)(\nabla H)^T\right].$$
The adaptive estimate for $\hat{G}_t^{-1}$ is given by

$$\hat{G}_{t+1}^{-1} = (1+\epsilon_t)\hat{G}_t^{-1} - \epsilon_t\left(\hat{G}_t^{-1}\nabla H\right)\left(\hat{G}_t^{-1}\nabla H\right)^T,$$
and the loss function for our model takes the form

$$\ell(x,y;\theta) = -\sum_{i=1}^{L} \log r\left(y_i - H_i(x;\theta)\right).$$
The learning algorithm for our connection matrix weights B is then given by

$$B_{t+1} = B_t - \eta_t\, \hat{G}_t^{-1}\, \nabla \ell(x_t, y_t; \theta_t).$$
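A direct numerical transcription of these two update equations, as a sketch (shapes and names are illustrative: G_inv is n x n, grad_H and grad_loss are length-n vectors, B is the flattened n-vector of connection weights):

    import numpy as np

    def angl_step(G_inv, grad_H, grad_loss, B, eps, eta):
        """One adaptive natural gradient (ANGL) step."""
        v = G_inv @ grad_H
        # adaptive estimate of the inverse Fisher information matrix
        G_inv = (1 + eps) * G_inv - eps * np.outer(v, v)
        # natural-gradient update of the connection matrix weights
        B = B - eta * (G_inv @ grad_loss)
        return G_inv, B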
23.13.3 Information Geometry for Economic Attention Allocation: A
Detailed Example
Fig. 23.3: Results from Experiment 1. [Graph 1: sum of squares of errors versus number of nodes, comparing ECAN alone with ECAN+ANGL]
We now present the results of a series of small-scale, exploratory experiments comparing the original ECAN process running alone with the ECAN process coupled with ANGL. We are interested in determining which of these two lines of processing results in focusing attention more accurately.
The experiment started with base patterns of various sizes to be determined by the two
algorithms. In the training stage, noise was added, generating a number of instances of noisy
base patterns. The learning goal is to identify the underlying base patterns from the noisy
patterns as this will identify how well the different algorithms can focus attention on relevant
versus irrelevant nodes.
Next, the ECAN process was run, resulting in the determination of the connection matrix
C. In order to apply the ANGL algorithm, we need the gradient, VH, of the ECAN training
process, with respect to the input x. While calculating the connection matrix C, we used Monte
Carlo simulation to simultaneously calculate an approximation to VH.
Fig. 23.4: Results from Experiment 2. [Graph 2: sum of squares of errors versus training noise (standard deviation), for 16 nodes, comparing ECAN alone with ECAN+ANGL]
After ECAN training was completed, we bifurcated the experiment. In one branch, we ran fuzzed cue patterns through the retrieval process. In the other, we first applied the ANGL algorithm, optimizing the weights in the connection matrix, prior to running the retrieval process on the same fuzzed cue patterns. At a constant value of σ = 0.8 we ran several samples through each branch with pattern sizes of 4 x 4, 7 x 7, 10 x 10, 15 x 15, and 20 x 20. The results are shown in Figure 23.3. We also ran several experiments comparing the sum of squares of the errors to the input training noise as measured by the value of σ; see Figures 23.4 and ??.
These results suggest two major advantages of the ECAN+ANGL combination compared to ECAN alone. Not only was the performance of the combination better in every trial, save for one involving a small number of nodes and little noise, but the combination clearly scales significantly better both as the number of nodes increases, and as the training noise increases.
Chapter 24
Economic Goal and Action Selection
24.1 Introduction
A significant portion of CogPrime's dynamics is explicitly goal-driven - that is, based on trying (inasmuch as possible within the available resources) to figure out which actions will best help
the system achieve its goals, given the current context. A key aspect of this explicit activity is
guided by the process of "goal and action selection" - prioritizing goals, and then prioritizing
actions based on these goals. We have already outlined the high-level process of action selection,
in Chapter 22 above. Now we dig into the specifics of the process, showing how action selection is
dynamically entwined with goal prioritization, and how both processes are guided by economic
attention allocation as described in Chapter 23.
While the basic structure of CogPrime's action selection aspect is fairly similar to MicroPsi
(due to the common foundation in Dorner's Psi model), the dynamics are less similar. MicroPsi's
dynamics are a little closer to being a formal neural net model, whereas ECAN's economic
foundation tends to push it in different directions. The CogPrime goal and action selection design involves some simple simulated financial mechanisms, building on the economic metaphor of ECAN, that are different from, and more complex than, anything in MicroPsi.
The main actors (apart from the usual ones like the AtomTable, economic attention alloca-
tion, etc.) in the tale to be told here are as follows:
• Structures:
- UbergoalPool
- ActiveSchemaPool
• MindAgents:
- GoalBasedSchemaSelection
- GoalBasedSchemaLearning
- GoalAttentionAllocation
- FeasibilityUpdating
- SchemaActivation
The Ubergoal Pool contains the Atoms that the system considers as top-level goals. These
goals must be treated specially by attention allocation: they must be given funding by the Unit
so that they can use it to pay for getting themselves achieved. The weighting among different
top-level goals is achieved via giving them different amounts of currency. STICurrency is the key
kind here, but of course ubergoals must also get some LTICurrency so they won't be forgotten.
(Inadvertently deleting your top-level supergoals from memory is generally considered to be a
bad thing ... it's in a sense a sort of suicide...)
24.2 Transfer of STI "Requests for Services" Between Goals
Transfer of "attentional funds" from goals to subgoals, and schema modules to other schema
modules in the same schema, take place via a mechanism of promises of funding (or 'requests
for service,' to be called 'RFS's' from here on). This mechanism relies upon and interacts
with ordinary economic attention allocation but also has special properties. Note that we will
sometimes say that an Atom `tissues" an RFS or "transfers" currency while what we really mean
is that some MindAgent working on that Atom issues an RFS or transfers currency.
The logic of these RFS's is as follows. If agent A issues an RFS of value x to agent B, then
1. When B judges it appropriate, B may redeem the note and ask A to transfer currency of
value x to B.
2. A may withdraw the note from B at any time.
(There is also a little more complexity here, in that we will shortly introduce the notion of RFS's whose value is defined by a set of constraints. But this complexity does not contradict the two above points.) The total value of the RFS's possessed by an Atom may be referred to as its "promise."
A rough schematic depiction of this RFS process is given in Figure 24.1.
Now we explain how RFS's may be passed between goals. Given two predicates A and B,
if A is being considered as a goal, then B may be considered as a subgoal of A (and A the
supergoal of B) if there exists a Link of the form
PredictiveImplication
B A
I.e., achieving B may help to achieve A. Of course, the strength of this link and the temporal
characteristics of this link are important in terms of quantifying how strongly and how usefully
B is a subgoal of A.
Supergoals (not only top-level ones, aka ubergoals) allocate RFS's to subgoals as follows. Supergoal A may issue an RFS to subgoal B if it is judged that achievement (i.e., predicate satisfaction) of B implies achievement of A. This may proceed recursively: subgoals may allocate RFS's to subsubgoals according to the same justification.
Unlike actual currency, RFS's are not conserved. However, the actual payment of real currency upon redemption of RFS's obeys the conservation of real currency. This means that agents need to be responsible in issuing and withdrawing RFS's. In practice this may be ensured by having agents follow a couple of simple rules in this regard.
1. If B and C are two alternatives for achieving A, and A has x units of currency, then A may
promise both B and C x units of currency. Whoever asks for a redemption of the promise
first, will get the money, and then the promise will be rescinded from the other one.
2. On the other hand, if the achievement of A requires both B and C to be achieved, then B
and C may be granted RFS's that are defined by constraints. If A has x units of currency,
then B and C receive an RFS tagged with the constraint (B+C<=x). This means that in
[Figure: a central bank (the AtomSpace) issues currency to user-goals (e.g. "maintain appropriate social interaction", "maintain appropriate energy", "seek novelty"), which in turn issue RFS's to sub-goals (e.g. "ask someone", "find someone to ask", "get battery", "remember where I saw one", "find electrical outlet").]
Fig. 24.1: The RFS Propagation Process. An illustration of the process via which RFS's propagate from goals to abstract procedures, and finally must get cashed out to pay for the execution of actual concrete procedures that are estimated relatively likely to lead to goal fulfillment.
order to redeem the note, either one of B or C must confer with the other one, so that they can simultaneously request constraint-consistent amounts of money from A.
As an example of the role of constraints, consider the goal of playing fetch successfully (a
subgoal of "get reward").... Then suppose it is learned that:
ImplicationLink
    SequentialAND
        get_ball
        deliver_ball
    play_fetch

where SequentialAND A B is the conjunction of A and B but with B occurring after A in time. Then, if play_fetch has $10 in STICurrency, it may know it has $10 to spend on a combination
of get_ball and deliver_ball. In this case both get_ball and deliver_ball would be given RFS's labeled with the constraint:

RFS.get_ball + RFS.deliver_ball <= $10
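A minimal sketch of such a constraint-bearing RFS (an illustration of the bookkeeping, not the actual OpenCog implementation):

    class ConstrainedRFS:
        """An RFS whose redeemable value is bounded jointly with its siblings:
        the total redeemed across all siblings may not exceed the budget."""
        def __init__(self, issuer, budget):
            self.issuer, self.budget = issuer, budget
            self.redeemed = 0.0
            self.siblings = []          # RFS's sharing the same joint budget

        def redeem(self, amount):
            spent = self.redeemed + sum(s.redeemed for s in self.siblings)
            if spent + amount > self.budget:
                raise ValueError("would violate the joint RFS constraint")
            self.issuer.transfer_currency(amount)   # real currency is conserved
            self.redeemed += amount

In the play_fetch example, get_ball and deliver_ball would each hold a ConstrainedRFS with budget 10 issued by play_fetch, registered as each other's siblings.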
The issuance of RFS's embodying constraints is different from (and generally carried out
prior to) the evaluation of whether the constraints can be fulfilled.
An ubergoal may rescind offers of reward for service at any time. And, generally, if a subgoal
gets achieved and has not spent all the money it needed, the supergoal will not offer any more
funding to the subgoal (until/unless it needs that subgoal achieved again).
As there are no ultimate sources of RFS in OCP besides ubergoals, promise may be considered
as a measure of "goal-related importance."
Transfer of RFS's among Atoms is carried out by the GoalAttentionAllocation MindAgent.
24.3 Feasibility Structures
Next, there is a numerical data structure associated with goal Atoms, which is called the feasibility structure. The feasibility structure of an Atom G indicates the feasibility of achieving G as a goal using various amounts of effort. It contains triples of the form (t,p,E) indicating the truth value t of achieving goal G to degree p using effort E. Feasibility structures must be updated periodically, via scanning the links coming into an Atom G; this may be done by a FeasibilityUpdating MindAgent. Feasibility may be calculated for any Atom G for which there are links of the form:
Implication
Execution S
G
for some S. Once a schema has actually been executed on various inputs, its cost of execution
on other inputs may be empirically estimated. But this is not the only case in which feasibility
may be estimated. For example, if goal G inherits from goal G1, and most children (e.g. subgoals)
of G1 are achievable with a certain feasibility, then probably G is achievable with a similar
feasibility as well. This allows feasibility estimation even in cases where no plan for achieving G
yet exists, e.g. if the plan can be produced via predicate schematization, but such schematization
has not yet been carried out.
Feasibility then connects with importance as follows. Important goals will get more STICurrency to spend, and thus will be able to spawn more costly schemata. So, the GoalBasedSchemaSelection MindAgent, when choosing which schemata to push into the ActiveSchemaPool, will be able to choose more costly schemata corresponding to goals with more STICurrency to spend.
24.4 Goal Based Schema Selection
Next, the GoalBasedSchemaSelection (GBSS) MindAgent selects schemata to be placed into the ActiveSchemaPool. It does this by choosing goals G, and then choosing schemata that are alleged to be useful for achieving these goals. It chooses goals via a fitness function that combines promise and feasibility. This involves solving an optimization problem: figuring out how to
maximize the odds of getting a lot of goal-important stuff done within the available amount of (memory and time) effort. Potentially this optimization problem can get quite subtle, but initially some simple heuristics are satisfactory. (One subtlety involves handling dependencies between goals, as represented by constraint-bearing RFS's.)
Given a goal, the GBSS MindAgent chooses a schema to achieve that goal via the heuristic
of selecting the one that maximizes a fitness function balancing the estimated effort required
to achieve the goal via executing the schema, with the estimated probability that executing the
schema will cause the goal to be achieved.
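A minimal sketch of such a fitness function; the specific functional form below is an assumption, since the text specifies only the tradeoff between estimated effort and estimated success probability:

    def schema_fitness(p_success, effort, goal_promise, effort_weight=1.0):
        """Balance estimated achievement probability against estimated
        effort, scaled by the goal's promise (an illustrative form)."""
        return goal_promise * p_success - effort_weight * effort

    def choose_schema(goal, candidates):
        # candidates: iterable of (schema, p_success, effort) triples
        return max(candidates,
                   key=lambda c: schema_fitness(c[1], c[2], goal.promise))[0]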
When searching for schemata to achieve G, and estimating their effort, one factor to be taken into account is the set of schemata already in the ActiveSchemaPool. Some schemata S may simultaneously achieve two goals; or two schemata achieving different goals may have significant overlap of modules. In this case G may be able to get achieved using very little or no effort (no additional effort, if there is already a schema S in the ActiveSchemaPool that is going to cause G to be achieved). But if G "decides" it can be achieved via a schema S already in the ActiveSchemaPool, then it should still notify the ActiveSchemaPool of this, so that G can be added to S's index (see below). If the other goal G1 that placed S in the ActiveSchemaPool decides to withdraw S, then S may need to hit up G for money, in order to keep itself in the ActiveSchemaPool with enough funds to actually execute.
24.4.1 A Game-Theoretic Approach to Action Selection
Min Jiang has observed that, mathematically, the problem of action selection (represented in CogPrime as the problem of goal-based schema selection) can be modeled in terms of game theory, as follows:
• the intelligent agent is one player, the world is another player
• the agent's model of the world lets it make probabilistic predictions of how the world may respond to what the agent does (i.e. to estimate what mixed strategy the world is following, considering the world as a game player)
• the agent itself chooses schemata probabilistically, so it's also following a mixed strategy
• so, in principle the agent can choose schemata that it thinks will lead to a mixed Nash equilibrium†
But the world's responses are very high-dimensional, which means that finding a mixed Nash equilibrium even approximately is a very hard computational problem. Thus, in a sense, the crux of the problem seems to come down to feature identification. If the world's response (real or predicted) can be represented as a low-dimensional set of features, then these features can be considered as the world's "move" in the game ... and the game theory problem becomes tractable via approximation schemes. But without the reduction of the world to a low-dimensional set of features, finding the mixed Nash equilibrium even approximately will not be computationally tractable...
Some AI theorists would argue that this division into "feature identification" versus "action selection" is unnecessarily artificial; for instance, Hawkins [HB01] or Arel [ARC09b] might suggest to use a single hierarchical neural network to do both of them. But the brain after all
† In game theory, a Nash equilibrium is when no player can do better by unilaterally changing its strategy
contains many different regions, with different architectures and dynamics.... In the visual cor-
tex, it seems that feature extraction and object classification are done separately. And it seems
that in the brain, action selection has a lot to do with the basal ganglia, whereas feature extrac-
tion is done in the cortex. So the neural analogy provides some inspiration for an architecture
in which feature identification and action selection are separated.
There is literature discussing numerical methods for calculating approximate Nash equilibria;
however, this is an extremely tricky topic in the CogPrime context because action selection must
generally be done in real-time. Like perception processing, this may be an area calling for the
use of parallel processing hardware. For instance, a neural network algorithm for finding mixed
Nash equilibria could be implemented on a GPU supercomputer, enabling rapid real-time action
selection based on a reduced-dimensionality model of the world produced by intelligent feature
identification.
Consideration of the application of game theory in this context brings out an important
point, which is that to do reasonably efficient and intelligent action selection, the agent needs
some rapidly-evaluable model of the world, i.e. some way to rapidly evaluate the predicted
response of the world to a hypothetical action by the agent. In the game theory
approach (or any other sufficiently intelligent approach), for the agent to evaluate fitness of a
schema-set S for achieving certain goals in a certain context, it has to (explicitly or implicitly)
estimate
• how the world will respond if the agent does S
• how the agent could usefully respond to the world's response (call this action-set S1)
• how the world will respond to the agent doing S1
• etc.
and so to rapidly evaluate the fitness of S, the agent needs to be able to quickly estimate how
the world will respond. This may be done via simulation, or it may be done via inference (which
however will rarely be fast enough, unless with a very accurate inference control mechanism),
or it may be done by learning some compacted model of the world as represented for instance
in a hierarchical neural network.
24.5 SchemaActivation
And what happens with schemata that are actually in the ActiveSchemaPool? Let us assume that each of these schemata is a collection of modules (subprograms), connected via ActivationLinks, which have the semantics: (ActivationLink A B) means that if the schema that placed module A in the schema pool is to be completed, then after A is activated, B should be activated. (We will have more to say about schemata, and their modularization, in Chapter 25.)
When a goal places a schema in the ActiveSchemaPool, it grants that schema an RFS equal
in value to the total or some fraction of the promissory+real currency it has in its possession.
The heuristics for determining how much currency to grant may become sophisticated; but
initially we may just have a goal give a schema all its promissory currency; or in the case of a
top-level supergoal, all its actual currency.
When a module within a schema actually executes, then it must redeem some of its promis-
sory currency to turn it into actual currency, because executing costs money (paid to the Lobe).
Once a schema is done executing, if it hasn't redeemed all its promissory currency, it gives the
remainder back to the goal that placed it in the ActiveSchemaPool.
When a module finishes executing, it passes promissory currency to the other modules to
which it points with ActivationLinks.
The network of modules in the ActiveSchemaPool is a digraph (whose links are ActivationLinks), because some modules may be shared within different overall schemata. Each module must be indexed via which schemata contain it, and each schema must be indexed via which goal(s) want it in the ActiveSchemaPool.
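A minimal sketch of this bookkeeping (illustrative structures, not the actual OpenCog implementation):

    class ActiveSchemaPool:
        """Digraph of schema modules plus the currency flows described above."""
        def __init__(self):
            self.activation_links = {}   # module -> successor modules
            self.schemata_of = {}        # module -> schemata containing it
            self.goals_of = {}           # schema -> goals wanting it in the pool

        def run_module(self, module, lobe):
            cost = module.execute()
            module.redeem_promissory(cost)    # promissory -> real currency
            lobe.pay(cost)                    # execution is paid to the Lobe
            # pass promissory currency onward along ActivationLinks
            for successor in self.activation_links.get(module, []):
                module.pass_promissory_to(successor)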
24.6 GoalBasedSchemaLearning
Finally, we have the process of trying to figure out how to achieve goals, i.e. trying to learn links between ExecutionLinks and goals G. This process should be focused on goals that have a high importance but for which feasible achievement-methodologies are not yet known. Predicate schematization is one way of achieving this; another is MOSES procedure evolution.
Chapter 25
Integrative Procedure Evaluation
25.1 Introduction
Procedural knowledge must be learned, an often subtle and difficult process - but it must also
be enacted. Procedure enaction is not as tricky a topic as procedure learning, but still is far
from trivial, and involves the real-time interaction of procedures, during the course of execution,
with other knowledge. In this brief chapter we explain how this process may be most naturally
and flexibly carried out, in the context of CogPrime's representation of procedures as programs
("Combo trees").
While this may seem somewhat of a "mechanical", implementation-level topic, it also involves some basic conceptual points, on which CogPrime as an AGI design does procedure evaluation fundamentally differently from narrow-AI systems or conventional programming language interpreters. Basically, what makes CogPrime Combo tree evaluation somewhat subtle is the interfacing between the Combo evaluator itself and the rest of the CogPrime system.
In the CogPrime design, Procedure objects (which contain Combo trees, and are associated
with ProcedureNodes) are evaluated by ProcedureEvaluator objects. Different ProcedureEvalu-
ator objects may evaluate the same Combo tree in different ways. Here we explain these various
sorts of evaluation - how they work and what they mean.
25.2 Procedure Evaluators
In this section we will mention three different ProcedureEvaluators:
• Simple procedure evaluation
• Effort-based procedure evaluation, which is more complex but is required for integration of
inference with procedure evaluation
• Adaptive evaluation order based procedure evaluation
In the following section we will delve more thoroughly into the interactions between inference
and procedure evaluation.
Another related issue is the modularization of procedures. This issue however is actually
orthogonal to the distinction between the three ProcedureEvaluators mentioned above. Modu-
larity simply requires that particular nodes within a Combo tree be marked as "module roots",
so that they may be extracted from the Combo tree as a whole and treated as separate modules (i.e., subroutines), if the ExecutionManager judges this appropriate.
25.2.1 Simple Procedure Evaluation
The SimpleComboTreeEvaluator simply does Combo tree evaluation as described earlier. When
an Atom is encountered, it looks into the AtomTable to evaluate the object.
In the case that a Schema refers to an ungrounded SchemaNode (that is, one not defined by a Combo tree, as described in Chapter 19), and an appropriate EvaluationLink value isn't in the AtomTable, there's an evaluation failure, and the whole procedure evaluation returns the truth value (.5,0): i.e., a zero-weight-of-evidence truth value, which is essentially equivalent to returning no value.
In the case that a Predicate refers to an ungrounded PredicateNode, and an appropriate EvaluationLink isn't in the AtomTable, then some very simple "default thinking" is done: the truth value of the predicate on the given arguments is assigned to be the TruthValue of the corresponding PredicateNode (which is defined as the mean truth value of the predicate across all arguments known to CogPrime).
25.2.2 Effort Based Procedure Evaluation
The next step is to introduce the notion of "effort": the amount of effort that the CogPrime system must undertake in order to carry out a procedure evaluation. The notion of effort is encapsulated in Effort objects, which may take various forms. The simplest Effort objects measure only elapsed processing time; more advanced Effort objects take into consideration other factors such as memory usage.
An effort-based Combo tree evaluator keeps a running total of the effort used in evaluating
the Combo tree. This is necessary if inference is to be used to evaluate Predicates, Schema,
Arguments, etc. Without some control of effort expenditure, the system could do an arbitrarily
large amount of inference to evaluate a single Atom.
The matter of evaluation effort is nontrivial because in many cases a given node of a Combo
tree may be evaluated in more than one way, with a significant effort differential between the
different methodologies. If a Combo tree Node refers to a predicate or schema that is very
costly to evaluate, then the ProcedureEvaluator managing the evaluation of the Combo tree
must decide whether to evaluate it directly (expensive) or estimate the result using inference
(cheaper but less accurate). This decision depends on how much effort the ProcedureEvaluator
has to play with, and what percentage of this effort it finds judicious to apply to the particular
Combo tree Node in question.
In the relevant prototypes we built within OpenCog, this kind of decision was made based on
some simple heuristics inside ProcedureEvaluator objects. However, it's clear that, in general,
more powerful intelligence must be applied here, so that one needs to have ProcedureEvaluators
that - in cases of sub-procedures that are both important and highly expensive - use PLN
inference to figure out how much effort to assign to a given subproblem.
The simplest useful kind of effort-based Combo tree evaluator is the EffortIntervalCom-
boTreeEvaluator, which utilizes an Effort object that contains three numbers (yes, no, max).
The yes parameter tells it how much effort should be expended to evaluate an Atom if there is a
ready answer in the AtomTable. The no parameter tells it how much effort should be expended
in the case that there is not a ready answer in the AtomTable. The max parameter tells it how
much effort should be expended, at maximum, to evaluate all the Atoms in the Combo tree,
before giving up. Zero effort, in the simplest case, may be heuristically defined as simply looking
into the AtomTable - though in reality this does of course take effort, and a more sophisticated
treatment would incorporate this as a factor as well.
Quantification of amounts of effort is nontrivial, but a simple heuristic guideline is to assign
one unit of effort for each inference step. Thus, for instance,
• (yes, no, max) = (0,5,1000) means that if an Atom can be evaluated by AtomTable lookup,
this is done, but if AtomTable lookup fails, a minimum of 5 inference steps are done to try
to do the evaluation. It also says that no more than 1000 evaluations will be done in the
course of evaluating the Combo tree.
• (yes, no, max) = (3,5,1000) says the same thing, but with the change that even if evaluation
could be done by direct AtomTable lookup, 3 inference steps are tried anyway, to try to
improve the quality of the evaluation.
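A minimal sketch of how one Atom might be evaluated under these parameters (atom_table.lookup and infer_step are assumed stand-ins for AtomTable lookup and a single PLN inference step):

    def evaluate_atom(atom, atom_table, infer_step, spent,
                      yes=0, no=5, maximum=1000):
        """One Atom's evaluation under (yes, no, max) effort parameters;
        `spent` is the running effort total for the whole Combo tree."""
        value = atom_table.lookup(atom)           # "zero effort" lookup
        steps = yes if value is not None else no  # refine a hit vs. derive a miss
        for _ in range(steps):
            if spent >= maximum:                  # whole-tree cap reached: give up
                break
            value = infer_step(atom, value)       # one unit of effort per step
            spent += 1
        return value, spent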
25.2.3 Procedure Evaluation with Adaptive Evaluation Order
While tracking effort enables the practical use of inference within Combo tree evaluation, if one
has truly complex Combo trees, then a higher degree of intelligence is necessary to guide the
evaluation process appropriately. The order of evaluation of a Combo tree may be determined
adaptively, based on up to three things:
• The history of evaluation of the Combo tree
• Past history of evaluation of other Combo trees, as stored in a special AtomTable consisting
only of relationships about Combo tree-evaluation-order probabilities
• New information entering into CogPrime's primary AtomTable during the course of evalu-
ation
ProcedureEvaluator objects may be selected at runtime by cognitive schemata, and they may also utilize schemata and MindAgents internally. The AdaptiveEvaluationOrderComboTreeEvaluator is more complex than the other ProcedureEvaluators discussed, and will involve various calls to CogPrime MindAgents, particularly those concerned with PLN inference.
25.3 The Procedure Evaluation Process
Now we give a more thorough treatment of the procedure evaluation process, as embodied in the effort-based or adaptive-evaluation-order evaluators discussed above. The process of procedure evaluation is somewhat complex, because it encompasses three interdependent processes:
• The mechanics of procedure evaluation, which in the CogPrime design involves traversing
Combo trees in an appropriate order. When a Combo tree node referring to a predicate or
schema is encountered during Combo tree traversal, the process of predicate evaluation or
schema execution must be invoked.
• The evaluation of the truth values of predicates - which involves a combination of inference
and (in the case of grounded predicates) procedure evaluation.
• The computation of the truth values of schemata - which may involve inference as well as
procedure evaluation.
We now review each of these processes.
25.3.1 Truth Value Evaluation
What happens when the procedure evaluation process encounters a Combo tree Node that
represents a predicate or compound term? The same thing as when some other CogPrime
process decides it wants to evaluate the truth value of a PredicateNode or CompoundTermNode:
the generic process of predicate evaluation is initiated.
This process is carried out by a TruthValueEvaluator object. There are several varieties of
TruthValueEvaluator, which fall into the following hierarchy:
TruthValueEvaluator
    DirectTruthValueEvaluator (abstract)
        SimpleDirectTruthValueEvaluator
    InferentialTruthValueEvaluator (abstract)
        SimpleInferentialTruthValueEvaluator
    MixedTruthValueEvaluator
A DirectTruthValueEvaluator evaluates a grounded predicate by directly executing it on
one or more inputs; an InferentialTruthValueEvaluator evaluates via inference based on the
previously recorded, or specifically elicited, behaviors of other related predicates or compound
terms. A MixedTruthValueEvaluator contains references to a DirectTruthValueEvaluator and an
InferentialTruthValueEvaluator, and contains a weight that tells it how to balance the outputs
from the two.
Direct truth value evaluation has two cases. In one case, there is a given argument for the
predicate; then, one simply plugs this argument in to the predicate's internal Combo tree,
and passes the problem off to an appropriately selected ProcedureEvaluator. In the other case,
there is no given argument, and one is looking for the truth value of the predicate in general.
In this latter case, some estimation is required. It is not plausible to evaluate the truth value
of a predicate on every possible argument, so one must sample a bunch of arguments and
then record the resulting probability distribution. A greater or fewer number of samples may
be taken, based on the amount of effort that's been allocated to the evaluation process. It's
also possible to evaluate the truth value of a predicate in a given context (information that's
recorded via embedding in a ContextLink); in this rase the random sampling is restricted to
inputs that lie within the specified context.
On the other hand, the job of an InferentialTruthValueEvaluator is to use inference rather
than direct evaluation to guess the truth value of a predicate (sometimes on a particular argu-
ment, sometimes in general). There are several different control strategies that may be applied
here, depending on the amount of effort allocated. The simplest strategy is to rely on analogy,
simply searching for similar predicates and using their truth values as guidance. (In the case
where a specific argument is given, one searches for similar predicates that have been evaluated
on similar arguments.) If more effort is available, then a more sophisticated strategy may be
taken. Generally, an InferentialTruthValueEvaluator may invoke a SchemaNode that embodies an inference control strategy for guiding the truth value estimation process. These SchemaNodes may then be learned like any others.
Finally, a MixedTruthValueEvaluator operates by consulting a DirectTruthValueEvaluator
and/or an InferentialTruthValueEvaluator as necessary, and merging the results. Specifically,
in the case of an ungrounded PredicateNode, it simply returns the output of the Inferential-
TruthValueEvaluator it has chosen. But in the case of a GroundedPredicateNode, it returns a
weighted average of the directly evaluated and inferred values, where the weight is a parameter.
In general, this weighting may be done by a SchemaNode that is selected by the MixedTruth-
ValueEvaluator; and these schemata may be adaptively learned.
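As a sketch of how the merging might look (the function and argument names are illustrative; in practice the weight could itself be produced by an adaptively learned SchemaNode):

def mixed_truth_value(direct_tv, inferred_tv, weight, grounded):
    # For ungrounded predicates only the inferential estimate exists.
    if not grounded:
        return inferred_tv
    # For grounded predicates, blend the two estimates; weight in [0, 1]
    # expresses how much to trust direct evaluation over inference.
    return weight * direct_tv + (1.0 - weight) * inferred_tv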
25.3.2 Schema Execution
Finally, schema execution is handled similarly to truth value evaluation, but it's a bit simpler in
the details. Schemata have their outputs evaluated by SchemaExecutor objects, which in turn
invoke ProcedureEvaluator objects. We have the hierarchy:
SchemaExecutor
    DirectSchemaExecutor (abstract)
        SimpleDirectSchemaExecutor
    InferentialSchemaExecutor (abstract)
        SimpleInferentialSchemaExecutor
    MixedSchemaExecutor
A DirectSchemaExecutor evaluates the output of a schema by directly executing it on some
inputs; an InferentialSchemaExecutor evaluates via inference based on the previously recorded,
or specifically elicited, behaviors of other related schemata. A MixedSchemaExecutor contains
references to a DirectSchemaExecutor and an InferentialSchemaExecutor, and contains a weight
that tells it how to balance the outputs from the two (not always obvious, depending on the
output type in question).
Contexts may be used in schema execution, but they're used only indirectly, via being
passed to TruthValueEvaluators used for evaluating truth values of PredicateNodes or Com-
poundTermNodes that occur internally in schemata being executed.
Section III
Perception and Action
Chapter 26
Perceptual and Motor Hierarchies
26.1 Introduction
Having discussed declarative, attentional, intentional and procedural knowledge, we are left only
with sensorimotor and episodic knowledge to complete our treatment of the basic CogPrime
"cognitive cycle" via which a CogPrime system can interact with an environment and seek to
achieve its goals therein.
The cognitive cycle in its most basic form leaves out the most subtle and unique aspects of
CogPrime, which all relate to learning in various forms. But nevertheless it is the foundation
on which CogPrime is built, and within which the various learning processes dealing with the
various forms of memory all interact. The CogPrime cognitive cycle is more complex in many
respects than it would need to be if not for the need to support diverse forms of learning.
And this learning-driven complexity is present to some extent in the contents of the present
chapter as well. If learning weren't an issue, perception and actuation could more likely be
treated as wholly (or near-wholly) distinct modules, operating according to algorithms and
structures independent of cognition. But our suspicion is that this sort of approach is unlikely
to be adequate for achieving high levels of perception and action capability under real-world
conditions. Instead, we suspect, it's necessary to create perception and action processes that
operate fairly effectively on their own, but are capable of cooperating with cognition to achieve
yet higher levels of functionality.
And the benefit in such an approach goes both ways. Cognition helps perception and actu-
ation deal with difficult cases, where the broad generalization that is cognition's specialty is
useful for appropriately biasing perception and actuation based on subtle environmental regu-
larities. And, the patterns involved in perception and actuation help cognition, via supplying
a rich reservoir of structures and processes to use as analogies for reasoning and learning at
various levels of abstraction. The prominence of visual and other sensory metaphors in abstract
cognition is well known [Arn69, Gar00]; and according to Lakoff and Núñez [LN00] even pure
mathematics is grounded in physical perception and action in very concrete ways.
We begin by discussing the perception and action mechanisms required to interface CogPrime
with an agent operating in a virtual world. We then turn to the more complex mechanisms
needed to effectively interface CogPrime with a robot possessing vaguely humanoid sensors and
actuators, focusing largely on vision processing. This discussion leads up to deeper discussions in
Chapters 27, 28 and 29 where we describe in detail the strategy that would be used to integrate
CogPrime with the DeSTIN framework for AGI perception/action (which was described in some
detail in Chapter 4 of Part 1).
In terms of the integrative cognitive architecture presented in Chapter 5, the material pre-
sented in the chapters in this section has mostly to do with the perceptual and motor hierarchies,
also touching on the pattern recognition and imprinting processes that play a role in the inter-
action between these hierarchies and the conceptual memory. The commitment to a hierarchical
architecture for perception and action is not critical for the CogPrime design as a whole - one
could build a CogPrime with non-hierarchical perception and action modules, and the rest of
the system would be about the same. The role of hierarchy here is a reflection of the obvious
hierarchical structure of the everyday human environment, and of the human body. In a world
marked by hierarchical structure, a hierarchically structured perceptual system is advantageous.
To control a body marked by hierarchical structure, a hierarchically structured action system
is advantageous. It would be possible to create a CogPrime system without this sort of in-built
hierarchical structure, and have it gradually self-adapt in such a way as to grow its own internal
hierarchical structure, based on its experience in the world. However, this might be a case of push-
ing the "experiential learning" perspective too far. The human brain definitely has hierarchical
structure built into it; it doesn't need to learn to experience the world in hierarchical terms; and
there seems to be no good reason to complicate an AGI's early development phase by forcing
it to learn the basic facts of the world's and its body's hierarchality.
26.2 The Generic Perception Process
We have already discussed the generic action process of CogPrime, in Chapter 25 on procedure
evaluation. Action sequences are generated by Combo programs, which execute primitive ac-
tions, including those corresponding to actuator control signals as well as those corresponding
to, say, mathematical or cognitive operations. In some cases the actuator control signals may
directly dictate movements; in other cases they may supply inputs and/or parameters to other
software (such as DeSTIN, in the integrated CogBot architecture to be described below).
What about the generic perception process? We distinguish sensation from perception, in a
CogPrime context, by defining
• perception as what occurs when some signal from the outside world registers itself in either:
a CogPrime Atom, or some other sort of node (e.g. a DeSTIN node) that is capable of
serving as the target of a CogPrime Link.
• sensation as any "preprocessing" that occurs between the impact of some signal on some
sensor, and the creation of a corresponding perception
Once perceptual Atoms have been created, various perceptual MindAgents come into play,
taking perceptual schemata (schemata whose arguments are perceptual nodes or relations there-
between) and applying them to Atoms recently created (creating appropriate ExecutionLinks
to store the results). The need to have special, often modality-specific perception MindAgents
to do this, instead of just leaving it to the generic SchemaExecution MindAgent, has to do
with computational efficiency, scheduling and parameter settings. Perception MindAgents are
doing schema execution urgently, and doing it with parameter settings tuned for perceptual
processing. This means that, except in unusual circumstances, newly received stimuli will be
processed immediately by the appropriate perceptual schemata.
Some newly formed perceptual Atoms will have links to existing Atoms, ready-made at their
moment of creation. CharacterInstanceNodes and NumberInstanceNodes are examples; they are
born linked to the appropriate CharacterNodes and NumberNodes. Of course, Atoms representing
perceived relationships, perceived groupings, etc., will not have ready-made links and will
have to grow such links via various cognitive processes. Also, the ContextFormation MindAgent
looks at perceptual Atom creation events and creates ContextNodes accordingly; and this must
be timed so that the ContextNodes are entered into the system rapidly, so that they can be
used by the processes doing initial-stage link creation for new perceptual Atoms.
In a full CogPrime configuration, newly created perceptual nodes and perceptual schemata
may reside in special perception-oriented Units, so as to ensure that perceptual processes
occur rapidly, not delayed by slower cognitive processes.
26.2.1 The ExperienceDB
Separate from the ordinary perception process, it may also be valuable for there to be a direct
route from the system's sensory sources to a special "ExperienceDB" database that records all
of the system's experience. This does not involve perceptual schemata at all, nor is it left up
to the sensory source; rather, it is carried out by the CogPrime server at the point where it
receives input from the sensory source. This experience database is a record of what the system
has seen in the past, and may be mined by the system in the future for various purposes. The
creation of new perceptual atoms may also be stored in the experience database, but this must
be handled with care as it can pose a large computational expense; it will often be best to store
only a subset of these.
Obviously, such an ExperienceDB is something that has no correlate in the human mind/brain.
This is a case where CogPrime takes advantage of the non-brainlike properties of its digital com-
puter substrate. The CogPrime perception process is intended to work perfectly well without
access to the comprehensive database of experiences potentially stored in the ExperienceDB.
However, a complete record of a mind's experience is a valuable thing, and there seems no
reason for the system not to exploit it fully. Advantages like this allow the CogPrime system to
partially compensate for its lack of some of the strengths of the human brain as an AI platform,
such as massive parallelism.
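A minimal sketch of such a database, assuming SQLite as the store and a hypothetical log_percept() hook called by the CogPrime server at the point where raw input arrives from a sensory source:

import json, sqlite3, time

db = sqlite3.connect("experience.db")
db.execute("CREATE TABLE IF NOT EXISTS experience "
           "(t REAL, source TEXT, payload TEXT)")

def log_percept(source, payload):
    # Record the raw input before any perceptual schemata run on it;
    # the table can later be mined for patterns, or replayed.
    db.execute("INSERT INTO experience VALUES (?, ?, ?)",
               (time.time(), source, json.dumps(payload)))
    db.commit()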
26.3 Interfacing CogPrime with a Virtual Agent
We now discuss some of the particularities of connecting CogPrime to a virtual world (such as
Second Life, Multiverse, or Unity3D, to name some of the virtual world / gaming platforms to
which OpenCog has already been connected in practice).
26.3.1 Perceiving the Virtual World
The most complex, high-bandwidth sensory data coming in from a typical virtual world is visual
data, so that will be our focus here. We consider three modes in which a virtual world may
present visual data to CogPrime (or any other system):
• Object vision: CogPrime receives information about polygonal objects and their colors,
textures and coordinates (each object is a set of contiguous polygons, and sometimes objects
have "type" information, e.g. cube or sphere)
• Polygon vision: CogPrime receives information about polygons and their colors, textures
and coordinates
• Pixel vision: CogPrime receives information about pixels and their colors and coordinates
In each case, coordinates may be given either in "world coordinates" or in "relative coordinates"
(relative to the gaze). This distinction is not a huge deal since within an architecture like
CogPrime, supplying schemata for coordinate transformation is trivial; and, even if treated as a
machine learning task, this sort of coordinate transformation is not very difficult to learn. Our
current approach is to prefer relative coordinates, as this approach is more natural in terms of
modern Western human psychology; but we note that in some other cultures world coordinates
are preferred and considered more psychologically natural.
Currently we have not yet done any work with pixel vision in virtual worlds. We have
been using object vision for most of our experiments, and consider a combination of polygon
vision and object vision as the "right" approach for early AGI experiments in a virtual worlds
context. The problem with pure object vision is that it removes the possibility for CogPrime
to understand object segmentation. If, for instance, CogPrime perceives a person as a single
object, then how can it recognize a head as a distinct sub-object? Feeding the system a pre-
figured hierarchy of objects, sub-objects and so forth seems inappropriate in the context of
an experiential learning system. On the other hand, the use of polygon vision instead of pixel
vision seems to meet no such objections. This may take different forms in different platforms.
For instance, in our work with a Minecraft-like world in the Unity3D environment, we have
relied heavily on virtual objects made of blocks, in which case the polygons of most interest are
the faces of the blocks.
Momentarily sticking with the object vision case for simplicity, examples of the perceptions
emanating from the virtual world perceptual preprocessor into CogPrime are things like:
1. I am at world-coordinates $W
2. Object with metadata $M is at world-coordinates $W
3. Part of object with metadata $M is at world-coordinates $W
4. Avatar with metadata $M is at world-coordinates $W
5. Avatar with metadata $M is carrying out animation $A
6. Statements in natural language, from the pet owner
The perceptual preprocessor takes these signals and translates them into Atoms, making use
of the special Atomspace mechanisms for efficiently indexing spatial and temporal information
(the TimeServer and SpaceServer) as appropriate.
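The following sketch shows the shape of such a preprocessor in Python. The PerceptAtom class and the coarse time/space dictionaries are illustrative stand-ins for Atoms and for the TimeServer/SpaceServer indexing, not the OpenCog API.

from dataclasses import dataclass

@dataclass
class PerceptAtom:
    kind: str        # e.g. "self", "object", "object-part", "avatar"
    metadata: dict   # the $M metadata accompanying the signal
    coords: tuple    # world-coordinates (x, y, z)
    t: float         # perception timestamp

time_index = {}      # coarse time bucket -> list of PerceptAtoms
space_index = {}     # coarse spatial cell -> list of PerceptAtoms

def ingest(signal):
    atom = PerceptAtom(signal["kind"], signal.get("meta", {}),
                       tuple(signal["coords"]), signal["t"])
    time_index.setdefault(int(atom.t), []).append(atom)
    cell = tuple(int(c) // 10 for c in atom.coords)   # 10-unit cells (assumed)
    space_index.setdefault(cell, []).append(atom)
    return atom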
26.3.1.1 Transforming Real-World Vision into Virtual Vision
One approach to enabling CogPrime to handle visual data coming from the real world is to
transform this data into data of the type CogPrime sees in the virtual world. While this is not
the approach we are taking in our current work, we do consider it a viable strategy, and we
briefly describe it here.
One approach along these lines would involve multiple phases:
• Use a camera eye and a LiDAR (Light Detection And Ranging, used for high-resolution
topographic mapping) sensor in tandem, so as to avoid having to deal with stereo vision
• Using the above two inputs, create a continuous 3D contour map of the perceived visual
world
• Use standard mathematical transforms to polygon-ize the 3D contour map into a large set
of small polygons
• Use heuristics to merge together the small polygons, obtaining a smaller set of larger poly-
gons (but retaining the large set of small polygons for the system to reference in cases where
a high level of detail is necessary)
• Feed the polygons into the perceptual pattern mining subsystem, analogously to the poly-
gons that come in from the virtual world
In this approach, preprocessing is used to make the system see the physical world in a manner
analogous to how it sees the virtual world. This is quite different from the DeSTIN-based
approach to CogPrime vision that we will discuss in Chapter 28, but may well also be feasible.
26.3.2 Acting in the Virtual World
Complementing the perceptual preprocessor is the action postprocessor: code that translates
the actions and action-sequences generated by CogPrime into instructions the virtual world can
understand (such as "launch thus-and-thus animation"). Due to the particularities of current
virtual world architectures, the current OpenCogPrime system carries out actions via executing
pre-programmed high-level procedures, such as "move forward one step", "bend over forward"
and so forth. Example action commands are:
1. Move ($D, $S) : $D is a distance, $S is a speed
2. Turn ($A, $S) : $A is an angle, $S is a speed
3. Pitch ($A, $S) : turn vertically up/down... [for birds only]
4. Jump ($D, $H, $S) : $H is a maximum height, at the center of the jump
5. Say ($T) : $T is text; for agents with linguistic capability, which is not enabled in the
current version
6. pick up($0) : $0 is an object
7. put down($0)
This is admittedly a crude approach, and if a robot simulator rather than a typical virtual
world were used, it would be possible for CogPrime to emanate detailed servomotor control
commands rather than high-level instructions such as these. However, as noted in Chapter 16
of Part 1, at the moment there is no such thing as a "massive multiplayer robot simulator," and so
the choice is between a multi-participant virtual environment (like the Multiverse environment
currently used with the PetBrain) or a small-scale robot simulator. Our experiments with virtual
worlds so far have used the high-level approach described here; but we are also experimenting
with using physical robots and corresponding simulators, as will be described below.
26.4 Perceptual Pattern Mining
Next we describe how perceptual pattern mining may be carried out, to recognize meaningful
structures in the stream of data produced via perceiving a virtual or physical world.
In this subsection we discuss the representation of knowledge, and then in the following
subsection we discuss the actual mining. We discuss the process in the context of virtual-world
perception as outlined above, but the same processes apply to robotic perception, whether
one takes the "physical world as virtual world" approach described above or a different sort of
approach such as the DeSTIN hybridization approach described below.
26.4.1 Input Data
First, we may assume that each perception is recorded as a set of "transactions", each of which is
of the form
Time, 3D coordinates, object type
or
Time, 3D coordinates, action type
Each transaction may also come with an additional list of (attribute, value) pairs, where the
list of attributes is dependent upon the object or action type. Transactions are represented as
Atoms, and don't need to be a specific Atom type - but are referred to here by the special name
transactions simply to make the discussion clear.
Next, define a transaction template as a transaction with location and time information set
to wild cards - and potentially, some other attributes set to wild cards. (These are implemented
in terms of Atoms involving VariableNodes.)
For instance, some transaction templates in the current virtual-world might be informally
represented as:
• Reward
• Red cube
• kick
• move_forward
• Cube
• Cube, size 5
• inc
• Teacher
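A sketch of transactions and templates, with None standing in for a wild card (in CogPrime these are implemented via Atoms involving VariableNodes; all names here are illustrative):

WILD = None   # wild-card marker

def make_transaction(t, coords, kind, **attrs):
    return {"t": t, "coords": coords, "kind": kind, **attrs}

def matches(template, transaction):
    # A template matches when every non-wild field agrees.
    return all(v is WILD or transaction.get(k) == v
               for k, v in template.items())

cube_size_5 = {"t": WILD, "coords": WILD, "kind": "Cube", "size": 5}
observed = make_transaction(12.0, (3, 0, 7), "Cube", size=5)
assert matches(cube_size_5, observed)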
26.4.2 Transaction Graphs
Next we may conceive a transaction graph, whose nodes are transactions and whose links are
labeled with labels like after, SimAND, SSeqAND (short for SimultaneousSequentialAND),
near, in_front_of, and so forth (and whose links are weighted as well).
We may also conceive a transaction template graph, whose nodes are transaction templates,
and whose links are the same as in the transaction graph. Examples of transaction template
graphs are
near(Cube, Teacher)
SSeqAND(move_forward, Reward)
where Cube, Teacher, etc. are transaction templates, since Time and 3D coordinates are left
unspecified.
And finally, we may conceive a transaction template relationship graph (TTRG), whose nodes
may be any of: transactions; transaction templates; basic spatiotemporal predicates evaluated
at tuples of transactions or transaction templates. For instance
SimAND(near(Cube, Teacher), above(Cube, Chair))
26.4.3 Spatiotemporal Conjunctions
Define a temporal conjunction as a conjunction involving SimultaneousAND and Sequen-
tialAND operators (including SSeqAND as a special case of SeqAND: the special case that interests
us in the short term). The conjunction is therefore ordered, e.g.
A SSeqAND B SimAND C SSeqAND D
We may assume that the order of operations favors SimAND, so that no parenthesizing is
necessary.
Next, define a basic spatiotemporal conjunction as a temporal conjunction that conjoins
terms that are either
• transactions, or
• transaction templates, or
• basic spatiotemporal predicates applied to tuples of transactions or transaction templates
I.e. a basic spatiotemporal conjunction is a temporal conjunction of nodes from the transaction
template relationship graph.
An example would be:
(hold ball) SimAND (near(me, teacher)) SSeqAND Reward
This assumes that the hold action has an attribute that is the type of object held, so that
hold ball
in the above temporal conjunction is a shorthand for the transaction template specified by
action type: hold
object_held_type: ball
This example says that if the agent is holding the ball and is near the teacher then shortly
after that, the agent will get a reward.
26.4.4 The Mining Task
The perceptual mining task, then, is to find basic spatiotemporal conjunctions that are inter-
esting. What constitutes interestingness is multifactorial, and includes:
• involves important Atoms (e.g. Reward)
• has a high temporal cohesion (i.e. the strength of the time relationships embodied in the
SimAND and SeqAND links is high)
• has a high spatial cohesion (i.e. the near() relationships have high strength)
• has a high frequency
• has a high surprise value (its frequency is far from what would be predicted by its component
sub-conjunctions)
Note that a conjunction can be interesting without satisfying all these criteria; e.g. if it involves
something important and has a high temporal cohesion, we want to find it regardless of its
spatial cohesion.
In preliminary experiments we have worked with a provisional definition of "interestingness"
as the combination of frequency and temporal cohesion, but this must be extended; and one
may even wish to have the combination function optimized over time (slowly) where the fitness
function is defined in terms of the STI and LTI of the concepts generated.
26.4.4.1 A Mining Approach
One tractable approach to perceptual pattern mining is greedy and iterative, involving the
following steps:
1. Build an initial transaction template graph G
2. Greedily mine some interesting basic spatiotemporal conjunctions from it, adding each
interesting conjunction found as a new node in G (so that G becomes a transaction template
relationship graph), repeating step 2 until boredom results or time runs out
The same TTRG may be maintained over time, but of course will require a robust forgetting
mechanism once the history gets long or the environment gets nontrivially complex.
The greedy mining step may involve simply grabbing SeqAND or SimAND links with prob-
ability determined by the (importance and/or interestingness) of their targets, and the prob-
abilistic strength and temporal strength of the temporal AND relationship, and then creating
conjunctions based on these links (which then become new nodes in the TTRG, so they can be
built up into larger conjunctions).
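A sketch of this greedy loop follows, assuming hypothetical helpers interestingness() and conjoin(), and a TTRG object exposing its temporal links as (kind, source, target, strength) tuples; a production version would also add the forgetting mechanism noted above.

import random

def greedy_mine(ttrg, rounds=1000, threshold=0.5):
    for _ in range(rounds):
        # Sample a temporal link, weighted by the interestingness of its
        # targets and the strength of the temporal AND relationship.
        weights = [interestingness(src) * interestingness(dst) * strength
                   for (kind, src, dst, strength) in ttrg.links]
        kind, src, dst, strength = random.choices(ttrg.links,
                                                  weights=weights, k=1)[0]
        conj = conjoin(kind, src, dst)   # build the candidate conjunction
        if interestingness(conj) > threshold:
            ttrg.add_node(conj)          # new node; can grow into larger ones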
26.5 The Perceptual-Motor Hierarchy
The perceptual pattern mining approach described above is "flat," in the sense that it simply
proposes to recognize patterns in a stream of perceptions, without imposing any kind of ex-
plicitly hierarchical structure on the pattern recognition process or the memory of perceptual
patterns. This is different from how the human visual system works, with its clear hierarchical
structure, and also different from many contemporary vision architectures, such as DeSTIN or
Hawkins' Numenta system, which also utilize hierarchical neural networks.
However, the approach described above may be easily made hierarchical within the CogPrime
architecture, and this is likely the most effective way to deal with complex visual scenes. Most
simply, in this approach, a hierarchy may be constructed corresponding to different spatial
regions, within the visual field. The RegionNodes at the lowest level of the hierarchy correspond
to small spatial regions, the ones at the next level up correspond to slightly larger spatial regions,
and so forth. Each RegionNode also corresponds to a certain interval of time, and there may
be different RegionNodes corresponding to the same spatial region but with different time-
durations attached to them. RegionNodes may correspond to overlapping rather than disjoint
regions.
Within each region mapped by a RegionNode, then, perceptual pattern mining as defined
in the previous section may occur. The patterns recognized in a region are linked to the corre-
sponding RegionNode - and are then fed as inputs to the RegionNodes corresponding to larger,
encompassing regions; and as suggestions-to-guide-pattern-recognition to nearby RegionNodes
on the same level. This architecture involves the fundamental hierarchical structure/dynamic
observed in the human visual cortex. Thus, the hierarchy incurs a dynamic of patterns-within-
patterns-within-patterns, and the heterarchy incurs a dynamic of patterns-spawning-similar-
patterns.
Also, patterns found in a RegionNode should be used to bias the pattern-search in the
RegionNodes corresponding to smaller, contained regions: for instance, if many of the sub-
regions corresponding to a certain region have revealed parts of a face, then the pattern-mining
processes in the remaining sub-regions may be instructed to look for other face-parts.
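Structurally, the hierarchical version might look like the following sketch, where mine_region() stands in for the flat pattern miner of the previous section and all names are illustrative:

class RegionNode:
    def __init__(self, bounds, parent=None):
        self.bounds = bounds          # spatial region plus time interval
        self.parent = parent          # RegionNode for the encompassing region
        self.neighbors = []           # nearby same-level RegionNodes
        self.children = []            # RegionNodes for contained sub-regions
        self.patterns, self.hints = [], []

    def mine(self, percepts):
        found = mine_region(percepts, self.bounds, self.hints)  # hypothetical
        self.patterns.extend(found)
        if self.parent:                       # feed upward as inputs
            self.parent.hints.extend(found)
        for n in self.neighbors:              # lateral suggestions
            n.hints.extend(found)
        for c in self.children:               # bias the search downward
            c.hints.extend(found)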
This architecture permits the hierarchical dynamics utilized in standard hierarchical vision
models, such as Jeff Hawkins' and other neural net models, but within the context of CogPrime's
pattern-mining approach to perception. It is a good example of the flexibility intrinsic to the
CogPrime architecture.
Finally, why have we called it a perceptual-motor hierarchy above? This is because, due to
the embedding of the perceptual hierarchy in CogPrime's general Atom-network, the percepts in
a certain region will automatically be linked to actions occurring in that region. So, there may
be some perception-cognition-action interplay specific to a region, occurring in parallel with
the dynamics in the hierarchy of multiple regions. Clearly this mirrors some of the complex
dynamics occurring in the human brain, and is also reflected in the structure of sophisticated
perceptual-motor approaches like DeSTIN, to be discussed below.
26.6 Object Recognition from Polygonal Meshes
Next we describe a more specific perceptual pattern recognition algorithm - a strategy for iden-
tifying objects in a visual scene that is perceived as a set of polygons. It is not a thoroughly
detailed algorithmic approach, but rather a high-level description of how this may be done effec-
tively within the CogPrime design. It is offered here largely as an illustration of how specialized
perceptual data processing algorithms may be designed and implemented within the CogPrime
framework.
We deal here with an agent whose perception of the world, at any point in time, is understood
to consist of a set of polygons, each one described in terms of a list of corners. The corners may
be assumed to be described in coordinates relative to the viewing eye of the agent.
What we mean by "identifying objects" here is something very simple. We don't mean iden-
tifying that a particular object is a chair, or is Ben's brown chair, or anything like that - we
simply mean identifying that a given collection of polygons is meaningfully grouped into an ob-
ject. That is the task considered here. The object could be a single block, it could be a person,
or it could be a tower of blocks (which appears as a single object until it is taken apart).
Of course, not all approaches to polygon-based vision processing would require this sort of
phase: it would be possible, as an alternative, to simply compare the set of polygons in the
visual field to a database of prior experience and then do object identification (in the present
sense) based on this database-comparison. But in the approach described in this section, one
begins instead with an automated segmentation of the set of perceived polygons into a set of
objects.
26.6.1 Algorithm Overview
The algorithm described here falls into three stages:
1. Recognizing PersistentPolygonNodes (PPNodes) from PolygonNodes.
2. Creating Adjacency Graphs from PPNodes.
3. Clustering in the Adjacency Graph.
Each of these stages involves a number of details, not all of which have been fully resolved; this
section just gives a conceptual overview.
We will speak in terms of objects such as PolygonNode, PPNode and so forth, because inside
the CogPrime AI engine, observed and conceived entities are represented as nodes in a graph.
However, this terminology is not very important here, and what we call a PolygonNode here
could just as well be represented in a host of other ways, within the overall CogPrime framework.
26.6.2 Recognizing PersistentPolygonNodes (PPNodes) from
PolygonNodes
A PolygonNode represents a polygon observed at a point in time. A PPNode represents a series
of PolygonNodes that are heuristically guessed to represent the same persistent polygon at different
moments in time.
Before "object permanence" is learned, the heuristics for recognizing PPNodes will only
work in the case of a persistent polygon that, over an interval of time, is experiencing relative
motion within the visual field, but is never leaving the visual field. For example, one reasonable
heuristic is: if P1 occurs at time t, P2 occurs at time s where s is very close to t, and P1 and P2
are similar in shape, size, color and position, then P1 and P2 should be grouped together into
the same PPNode.
More advanced heuristics would deal carefully with the case where some of these similarities
did not hold, which would allow us to deal e.g. with the case where an object was rapidly
changing color.
In the case where the polygons are coming from a simulation world like OpenSim, then from
our positions as programmers and world-masters, we can see that what a PPNode is supposed to
correspond to is a certain side of a certain OpenSim object; but it doesn't appear immediately
that way to CogPrime when controlling an agent in OpenSim since CogPrime isn't perceiving
OpenSim objects, it's perceiving polygons. On the other hand, in the case where polygons are
coming from software that postprocesses the output of a LiDAR based vision system, then the
piecing together of PPNodes from PolygonNodes is really necessary.
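The basic grouping heuristic might be sketched as follows, with the similarity helpers and thresholds as illustrative assumptions (the *_sim helpers are hypothetical and return values in [0, 1]):

def similar(p1, p2, dt_max=0.1, sim_min=0.9):
    # Group only when close in time AND similar in shape, size,
    # color and position, per the heuristic stated above.
    return (abs(p1.t - p2.t) < dt_max and
            shape_sim(p1, p2) > sim_min and
            size_sim(p1, p2) > sim_min and
            color_sim(p1, p2) > sim_min and
            position_sim(p1, p2) > sim_min)

def group_into_ppnodes(polygons):
    ppnodes = []                         # each PPNode is a time-ordered list
    for p in sorted(polygons, key=lambda q: q.t):
        for pp in ppnodes:
            if similar(pp[-1], p):
                pp.append(p)             # same persistent polygon
                break
        else:
            ppnodes.append([p])          # start a new PPNode
    return ppnodes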
26.6.3 Creating Adjacency Graphs from PPNodes
Having identified PPNodes, we may then draw a graph between PPNodes, a PPGraph (also
called an "Adjacency Graph"), wherein the links are AdjacencyLinks (with weights indicating
the degree to which the two PPNodes tend to be adjacent, over time). A more refined graph
might also involve SpatialCoordinationLinks (with weights indicating the degree to which the
vector between the centroids of the two PPNodes tends to be consistent over time).
We may then use this graph to do object identification:
• First-level objects may be defined as clusters in the graph of PPNodes.
• One may also make a graph between first-level objects, an ObjectGraph with the same
kinds of links as in the PPGraph. Second-level objects may be defined as clusters in the
ObjectGraph.
The "strength" of an identified object may be assigned as the "quality" of the cluster (measured
in terms of how tight the cluster is, and how well separated from other clusters.)
As an example, consider a robot with two parts: a body and a head. The whole body may
have a moderate strength as a first-level object, but the head and body individually will have
significantly greater strengths as first-level objects. On the other hand, the whole body should
have a pretty strong strength as a second-level object.
It seems convenient (though not necessary) to have a PhysicalObjectNode type to represent
the objects recognized via clustering; but the first versus second level object distinction should
not need to be made on the Atom type level.
Building the adjacency graph requires a mathematical formula defining what it means for
two PPNodes to be adjacent. Creating this formula may require a little tuning. For instance,
the adjacency between two PPNodes PP1 and PP2 may be defined as the average over time
of the adjacency of the PolygonNodes PP1(t) and PP2(t) observed at each time t. (A p'th
power average¹ may be used here, and different values of p may be tried.) Then, the adjacency
between two (simultaneous) PolygonNodes P1 and P2 may be defined as the average over all
x in P1 of the minimum over all y in P2 of sim(x,y), where sim(·,·) is an appropriately scaled
similarity function. This latter average could arguably be made a maximum; or perhaps even
better a p'th power average with large p, which approximates a maximum.
¹ The p'th power average of $x_1, \ldots, x_n$ is defined as $\left( \frac{1}{n} \sum_{i=1}^{n} x_i^p \right)^{1/p}$.
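In code, the adjacency computation sketched above looks roughly as follows, with sim() an appropriately scaled similarity function supplied by the caller (p = 1 gives the plain mean; large p approximates a maximum):

def pth_power_average(values, p):
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

def polygon_adjacency(P1, P2, sim, p=1):
    # Average over x in P1 of the minimum over y in P2 of sim(x, y).
    vals = [min(sim(x, y) for y in P2) for x in P1]
    return pth_power_average(vals, p)

def ppnode_adjacency(PP1, PP2, sim, times, p=1):
    # Average the polygon-level adjacency over the observed times t.
    vals = [polygon_adjacency(PP1[t], PP2[t], sim) for t in times]
    return pth_power_average(vals, p)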
26.6.4 Clustering in the Adjacency Graph
As noted above, the idea is that objects correspond to clusters in the adjacency graph. This
means we need to implement some hierarchical clustering algorithm that is tailored to find
clusters in symmetric weighted graphs. Probably some decent algorithms of this character already
exist; if not, it would be fairly easy to define one, e.g. by adapting some standard hierarchical
clustering algorithm to deal with graphs rather than vectors.
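For example, one could adapt agglomerative clustering roughly as follows, where w(a, b) returns the AdjacencyLink weight between two PPNodes (0 if absent) and min_link is an illustrative separation threshold:

def cluster_graph(nodes, w, min_link=0.5):
    clusters = [frozenset([n]) for n in nodes]
    while len(clusters) > 1:
        # Find the pair of clusters with the highest mean inter-cluster weight.
        pairs = [(sum(w(a, b) for a in c1 for b in c2) / (len(c1) * len(c2)),
                  c1, c2)
                 for i, c1 in enumerate(clusters)
                 for c2 in clusters[i + 1:]]
        score, c1, c2 = max(pairs, key=lambda t: t[0])
        if score < min_link:
            break                    # remaining clusters are well separated
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)     # merge into one cluster
    return clusters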
Clusters will then be mapped into PhysicalObjectNodes, interlinked appropriately via Phys-
icalPartLinks and AdjacencyLinks. (E.g. there would be a PhysicalPartLink between the Phys-
icalObjectNode representing a head and the PhysicalObjectNode representing a body [where
the body is considered as including the head].)
26.6.5 Discussion
It seems probable that, for simple scenes consisting of a small number of simple objects, clus-
tering for object recognition will be fairly unproblematic. However, there are two cases that are
potentially tricky:
• Sub-objects: e.g. the head and torso of a body, which may move separately; or the nose of
the head, which may wiggle; or the legs of a walking dog; etc.
• Coordinated objects: e.g. if a character's hat is on a table, and then later on his head, then
when it's on his head we basically want to consider him and his hat as the same object, for
some purposes.
These examples show that partitioning a scene into objects is a borderline-cognitive rather than
purely lower-level-perceptual task, which cannot be hard-wired in any very simple way.
We also note that, for complex scenes, clustering may not work perfectly for object recognition,
and some reasoning may be needed to aid with the process. Intuitively, these may corre-
spond to scenes that, in human perceptual psychology, require conscious attention and focus in
order to be accurately and usefully perceived.
26.7 Interfacing the Atomspace with a Deep Learning Based
Perception-Action Hierarchy
We have discussed how one may do perception processing such as object recognition within the
Atomspace, and this is indeed a viable strategy. But an alternate approach is also interesting,
and likely more valuable in the case of robotic perception/action: build a separate perceptual-
motor hierarchy, and link it in with the Atomspace. This approach is appealing in large part
because a lot of valuable and successful work has already been done using neural networks
and related architectures for perception and actuation. And it is not necessarily contradictory
to doing perception processing in the Atomspace - obviously, one may have complementary,
synergetic perception processing occurring in two different parts of the architecture.
This section reviews some general ideas regarding the interfacing of CogPrime with deep
learning hierarchies for perception and action; the following chapter then discusses one example
of this in detail, involving the DeSTIN deep learning architecture.
26.7.1 Hierarchical Perception Action Networks
CogPrime could be integrated with a variety of different hierarchical perception/action archi-
tectures. For the purpose of this section, however, we will consider a class of architectures that
is neither completely general nor extremely specific. Many of the ideas to be presented here are
in fact more broadly applicable beyond the architecture described here.
The following assumptions will be made about the HPANs (Hierarchical Perception/Action
Networks) to be hybridized with CogPrime. It may be best to use multiple HPANs, at least one
for declarative/sensory/episodic knowledge (we'll call this the "primary HPAN") and one for
procedural knowledge. A HPAN for intentional knowledge (a goal hierarchy; in DeSTIN called
the "critic hierarchy") may be valuable as well. We assume that each HPAN has the properties:
1. It consists of a network of nodes, endowed with a learning algorithm, whose connectivity
pattern is largely but not entirely hierarchical (and whose hierarchy contains feedback,
feedforward and lateral connections)
2. It contains a set of input nodes, receiving perceptual inputs, at the bottom of the hierarchy
3. It has a set of output nodes, which may span multiple levels of the hierarchy. The "output
nodes" indicate informational signals to cognitive processes lying outside the HPAN, or else
control signals to actuators, which may be internal or external.
4. Other nodes besides I/O nodes may potentially be observed or influenced by external pro-
cesses; for instance they may receive stimulation
5. Link weights in the HPAN get updated via some learning algorithm that is roughly speaking
"statistically Hebbian," in the sense that on the whole, when a set of nodes get activated
together for a period of time, they will tend to become attractors (see the sketch following
this list). By an attractor we mean: a set S of nodes such that the activation of a subset of S
during a brief interval tends to lead to the activation of the whole set S during a reasonably
brief interval to follow
6. As an approximate but not necessarily strict rule, nodes higher in the hierarchy tend to
be involved in attractors corresponding to events or objects localized in larger spacetime
regions
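A minimal sketch of the sort of "statistically Hebbian" update assumed in property 5, with the learning rate and decay as illustrative parameters:

def hebbian_update(weights, activation, rate=0.01, decay=0.001):
    # weights: dict mapping node pairs (i, j) to link weights;
    # activation: dict mapping nodes to activations in [0, 1].
    # Co-active nodes have their links strengthened, so sets of nodes
    # that fire together tend to become attractors over time.
    for (i, j), w in weights.items():
        co = activation.get(i, 0.0) * activation.get(j, 0.0)
        weights[(i, j)] = w + rate * co - decay * w
    return weights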
Examples of specific hierarchical architectures broadly satisfying these requirements are the
visual pattern recognition networks constructed by Hawkins [HB06] and [PCP01], and Arel's
DeSTIN system discussed earlier (and in more depth in following chapters). The latter appears
to fit the requirements particularly snugly due to having dynamics very well suited to the
formation of a complex array of attractors, and a richer methodology for producing outputs.
These are all not only HPANs but have a more particular structure that in Chapter 27 is called
a Compositional Spatiotemporal Deep Learning Network or CSDLN.
The particulars of the use of HPANs with OpenCog are perhaps best explained via enumer-
ation of memory types and control operations.
26.7.2 Declarative Memory
The key idea here is linkage of primary HPAN attractors to CogPrime
ConceptNodes via MemberLinks. This is in accordance with the notion of glocal memory, in
the language of which the HPAN attractors are the maps and the corresponding ConceptNodes
are the keys. Put simply, when a HPAN attractor is recognized, MemberLinks are created
between the HPAN nodes comprising the main body of the attractor, and a ConceptNode in
the AtomTable representing the attractor. MemberLink weights may be used to denote fuzzy
attractor membership. Activation may spread from HPAN nodes to ConceptNodes, and STI
may spread from ConceptNodes to HPAN nodes; a conversion rate between HPAN activation
and STI currency must be maintained by the CogPrime central bank (see Chapter 23), for
ECAN purposes.
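In sketch form (all names illustrative, not the OpenCog API):

def key_attractor(atomtable, attractor_nodes, memberships):
    # Create the "key" ConceptNode for a newly recognized HPAN attractor
    # (the "map"), with MemberLink weights recording fuzzy membership.
    key = atomtable.add_node("ConceptNode")
    for hpan_node, degree in zip(attractor_nodes, memberships):
        atomtable.add_link("MemberLink", hpan_node, key, weight=degree)
    return key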
Both abstract and concrete knowledge may be represented in this way. For instance, the
Eiffel Tower would correspond to one attractor, the general shape of the Eiffel Tower would
correspond to another, and the general notion of a "tower" would correspond to yet another. As
these three examples are increasingly abstract, the corresponding attractors would be weighted
increasingly heavily on the upper levels of the hierarchy.
26.7.3 Sensory Memory
CogPrime may also use its primary HPAN to store memories of sense-perceptions and low-level
abstractions therefrom. MemberLinks may join concepts in the AtomTable to percept-attractors
in the HPAN. If the HPAN is engineered to associate specific neural modules to specific spatial
regions or specific temporal intervals, then this may be accounted for by automatically index-
ing ConceptNodes corresponding to attractors centered in those modules, in the AtomTable's
TimeServer and SpaceServer objects, which index Atoms according to time and space.
An attractor representing something specific like the Eiffel Tower, or Bob's face, would be
weighted largely in the lower levels of the hierarchy, and would correspond mainly to sensory
rather than conceptual memory.
26.7.4 Procedural Memory
The procedural HPAN may be used to learn procedures such as low-level motion primitives
that are more easily learned using HPAN training than using more abstract procedure learning
methods. For example, a Combo tree learned by MOSES in CogPrime might contain a primitive
corresponding to the predicate-argument relationship pick_up(ball); but the actual procedure
for controlling a robot hand to pick up a ball, might be expressed as an activity pattern within
the low-level procedural HPAN. A procedure P stored in the low-level procedural HPAN would
be represented in the AtomTable as a ConceptNode C linked to key nodes in the HPAN attractor
corresponding to P. The invocation of P would be accomplished by transferring STI currency
to C and then allowing ECAN to do its work.
On the other hand, CogPrime's interfacing of the high-level procedural HPAN with the Cog-
Prime ProcedureRepository is intimately dependent on the particulars of the MOSES proce-
dure learning algorithm. As will be outlined in more depth in Chapter 33, MOSES is a complex,
multi-stage process that tries to find a program maximizing some specified fitness function, and
that involves doing the following within each "deme" (a deme being an island of roughly-similar
programs):
1. casting program trees into a hierarchical normal form
2. evaluating the program trees on a fitness function
3. building a model distinguishing fit versus unfit program trees, which involves:
   3a. figuring out what program tree features the model should include;
   3b. building the model using a learning algorithm
4. generating new program trees that are inferred likely to give high fitness, based on the
model
5. return to step 1 with these new program trees
There is also a system for managing the creation and deletion of demes.
The weakest point in CogPrime's current MOSES-based approach to procedure learning
appears to be step 3. And the main weakness is conceptual rather than algorithmic; what is
needed is to replace the current step 3 with something that uses long-term memory to do
model-building and feature-selection, rather than (like the current code) doing these things
in a manner that's restricted to the population of program trees being evolved to optimize a
particular fitness function.
One promising approach to resolving this issue is via replacing step 3b (and, to a limited
extent, 3a) with an interconnection between MOSES and a procedural HPAN. A HPAN can
do supervised categorization, and can be designed to handle feature selection in a manner
integrated with categorization, and also to integrate long-term memory into its categorization
decisions.
26.7.5 Episodic Memory
In a hybrid CogPrime /HPAN architecture, episodic knowledge may be handled via a combi-
nation of:
1. using a traditional approach to store a large ExperienceDB of actual experienced episodes
(including sensory inputs and actions; and also the states of the most important items in
memory during the experience)
2. using the Atomspace (with its TimeServer and SpaceServer components) to store declarative
knowledge about experiences
3. using dimensional embedding to index the AtomSpace's episodic knowledge in a spatiotem-
porally savvy way, as described in Chapter 40
4. training a large HPAN to summarize the scope of experienced episodes (this could be the
primary HPAN used for declarative and sensory memory, or could potentially be a separate
episodic HPAN)
Such a network should be capable of generating imagined episodes based on cues, as well as
recalling real episodes. The HPAN would serve as a sort of index into the memory of episodes.
There would be HebbianLinks from the AtomTable into the episodic HPAN.
For instance, suppose that once the agent built an extremely tall tower of blocks, taller than
any others in its memory. Perhaps it wants to build another very tall tower again, so it wants to
summon up the memory of that previous occasion, to see if there is possibly guidance therein.
It then proceeds by thinking about tallness and towerness at the same time, which stimulates
the relevant episode, because at the time of building the extremely tall tower, the agent was
thinking a lot about tallness (so thoughts of tallness are part of the episodic memory).
26.7.6 Action Selection and Attention Allocation
CogPrime's action selection mechanism chooses procedures based on which ones are estimated
most likely to achieve current goals given current context, and places these in an "active proce-
dure pool" where an ExecutionManager object mediates their execution.
Attention allocation spans all components of CogPrime, including an HPAN if one is in-
tegrated. Attention flows between the two components due to the conversion of STI to and
from HPAN activation. In this manner, assignment of credit flows from GoalNodes into the
HPAN, because simultaneous activation of linked nodes may be viewed as "rewarding" a HPAN
link. So, the HPAN may receive reward signals from GoalNodes via ECAN: when a ConceptNode
gets rewarded, and that ConceptNode points to a set of HPAN nodes, those nodes get some of the
reward.
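A sketch of this credit flow, with an illustrative conversion rate from STI currency to HPAN activation (field names are assumptions, not the OpenCog API):

def reward_concept(concept, amount, conversion_rate=0.05):
    # ECAN rewards the ConceptNode with STI; a fraction of it, converted
    # at the central-bank rate, spreads into the HPAN nodes it points to.
    concept.sti += amount
    links = concept.member_links          # MemberLinks into the HPAN
    if not links:
        return
    share = amount * conversion_rate / len(links)
    for link in links:
        link.target.activation += share * link.weight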
26.8 Multiple Interaction Channels
Now we discuss a broader issue regarding the interfacing between CogPrime and the external
world. The only currently existing embodied OpenCog applications, PetBrain and CogBot, are
based on a loosely human model of perception and action, in which a single CogPrime instance
controls a single mobile body, but this of course is not the only way to do things. More generally,
what we can say is that a variety of external-world events come into a CogPrime system from
physical or virtual world sensors, plus from other sources such as database interfaces and Web
spiders. The external systems providing CogPrime with data may be
generically referred to as sensory sources (and in the terminology we adopt here, once Atoms
have been created to represent external data, then one is dealing with perceptions rather than
sensations). The question arises how to architect a CogPrime system, in general, for dealing
with a variety of sensory sources.
We introduce the notion of an "interaction channel": a collection of sensory sources that
is intended to be considered as a whole as a synchronous stream, and that is also able to
receive CogPrime actions - in the sense that when CogPrime carries out actions relative to
the interaction channel, this directly affects the perceptions that CogPrime receives from the
interaction channel. A CogPrime meant to have conversations with 10 separate users at once
might have 10 interaction channels. A human mind has only one interaction channel in this
sense (although humans may become moderately adept at processing information from multiple
external-world sources, coming in through the same interaction channel).
Multiple-interaction-channel digital psychology may become extremely complex - and hard
for us, with our single interaction channels, to comprehend. This is one among many cases
where a digital mind, with its more flexible architecture, will have a clear advantage over our
human minds with their fixed and limited neural architectures. For simplicity, however, in the
following chapters we will often focus on the single-interaction-channel case.
Events coming in through an interaction channel are presented to the system as new per-
ceptual Atoms, and relationships amongst these. In the multiple interaction channel case, the
AttentionValues of these newly created Atoms require special treatment. Not only do they re-
quire special rules, they require additional fields to be added to the AttentionValue object,
beyond what has been discussed so far.
We require newly created perceptual Atoms to be given a high initial STI. And we also
require them to be given a high amount of a quantity called "interaction-channel STI." To
support this, the AttentionValue objects of Atoms must be expanded to contain interaction-
channel STI values; and the ImportanceUpdating MindAgent must compute interaction-channel
importance separately from ordinary importance.
And, just as we have channel-specific AttentionValues, we may also have channel-specific
TruthValues. This allows the system to separately account for the frequency of a given percep-
tual item in a given interaction channel. However, no specific mechanism is needed for these,
they are merely contextual truth values, to be interpreted within a ContextNode associated
with the interaction channel.
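A sketch of an AttentionValue extended in this way (field names are illustrative, not the OpenCog implementation):

from dataclasses import dataclass, field

@dataclass
class AttentionValue:
    sti: float = 0.0                                   # ordinary STI
    lti: float = 0.0                                   # long-term importance
    channel_sti: dict = field(default_factory=dict)    # channel id -> STI

def boost_new_percept(av, channel, initial_sti=100.0):
    # New perceptual Atoms get high ordinary STI and high STI
    # within the interaction channel they arrived through.
    av.sti += initial_sti
    av.channel_sti[channel] = av.channel_sti.get(channel, 0.0) + initial_sti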
Chapter 27
Integrating CogPrime with a Compositional
Spatiotemporal Deep Learning Network
27.1 Introduction
Many different approaches to "low-level" perception and action processing are possible within
the overall CogPrime framework. We discussed several in the previous chapter, all elaborations
of the general hierarchical pattern recognition approach. Here we describe one sophisticated ap-
proach to hierarchical pattern recognition based perception in more detail: the tight integration
of CogPrime with a sophisticated hierarchical perception/action oriented learning system such
as the DeSTIN architecture reviewed in Chapter 4 of Part 1.
We introduce here the term "Compositional Spatiotemporal Deep Learning Network" (CS-
DLN), to refer to deep learning networks whose hierarchical structure directly mirrors the
hierarchical structure of spacetime. In the language of Chapter 26, a CSDLN is a special kind
of HPAN (hierarchical perception action network), which has the special property that each
of its nodes refers to a certain spatiotemporal region and is concerned with predicting what
happens inside that region. Current exemplifications of the CSDLN paradigm include the DeS-
TIN architecture that we will focus on here, along with Jeff Hawkins' Numenta "HTM" system
[HB06]¹, Itamar Arel's DeSTIN [ARC09a], Itamar Arel's HDRN² system (the proprietary,
closed-source sibling of DeSTIN), Dileep George's spin-off from Numenta³, and work by Mo-
hamad Tarifi [TSH11], Bundzel and Hashimoto [BH10], and others. CSDLNs are reasonably
well proven as an approach to intelligent sensory data processing, and have also been hypothe-
sized as a broader foundation for artificial general intelligence at the human level and beyond
[HB06], [ARC09a].
While CSDLNs have been discussed largely in the context of perception, the specific form
of CSDLN we will pursue here goes beyond perception processing, and involves the coupling
of three separate hierarchies, for perception, action and goals/reinforcement [GdG+10]. The
"action" CSDLNs discussed here correspond to the procedural HPAN discussed in Chapter
26. Abstract learning and self-understanding are then hypothesized as related to systems of
attractors emerging from the close dynamic coupling of the upper levels of the three hierarchies.
¹ While the Numenta system is the best-known CSDLN architecture, other CSDLNs appear more impressively
functional in various respects; and many CSDLN-related ideas existed in the literature well before Numenta's
advent.
² http://binatix.com
³ http://vicarioussystems.com
DeSTIN is our paradigm case of this sort of CSDLN, but most of the considerations given here
would apply to any CSDLN of this general character.
CSDLNs embody a certain conceptual model of the nature of intelligence, and to integrate
them appropriately with a broader architecture, one must perform the integration not only
on the level of software code but also on the level of conceptual models. Here we focus
on the problem of integrating an extended version of the DeSTIN CSDLN system with the
CogPrime integrative AGI (artificial general intelligence) system. The crux of the issue here is
how to map DeSTIN's attractors into CogPrime's more abstract, probabilistic "weighted, la-
beled hypergraph" representation (called the Atomspace). The main conclusion reached is that
in order to perform this mapping in a conceptually satisfactory way, one requires a system of
hierarchies mirroring the structure of DeSTIN's network but involving the semantic structures
of the Atomspace. The DeSTIN perceptual hierarchy is augmented by motor and goal hier-
archies, leading to a tripartite "extended DeSTIN". In this spirit, three "semantic-perceptual"
hierarchies are proposed, corresponding to the three extended-DeSTIN CSDLN hierarchies and
explicitly constituting an intermediate level of representation between attractors in DeSTIN
and the habitual cognitive usage of CogPrime Atoms and Atom-networks. For simple reference
we refer to this as the "Semantic CSDLN" approach.
A "tripartite semantic CSDLN" consisting of interlinked semantic perceptual, motoric and
goal hierarchies could be coupled with DeSTIN or another CSDLN architecture to form a novel
AGI approach; or (our main focus here) it may be used as a glue between a CSDLN and
a more abstract semantic network such as the cognitive Atoms in CogPrime's Atomspace.
One of the core intuitions underlying this integration is that, in order to achieve the desired
level of functionality for tasks like picture interpretation and assembly of complex block struc-
tures, a convenient route is to perform a fairly tight integration of a highly capable CSDLN like
DeSTIN with other CogPrime components. For instance, we believe it's necessary to go deeper
than just using DeSTIN as an input/output layer for CogPrime, by building associative links
between the nodes inside DeSTIN and those inside the Atomspace.
This "tightly linked integration" approach is obviously an instantiation of the general cogni-
tive synergy principle, which hypothesizes particular properties that the interactions between
components in an integrated AGI system should display, in order for the overall system to dis-
play significant general intelligence using limited computational resources. Simply piping output
from a CSDLN to other components, and issuing control signals from these components to the
CSDLN, is likely an inadequate mode of integration, incapable of leveraging the full potential
of CSDLNs; what we are suggesting here is a much tighter and more synergetic integration.
In terms of the general principle of mind-world correspondence, the conceptual justification
for CSDLN/CogPrime integration would be that the everyday human world contains many com-
positional spatiotemporal structures relevant to human goals, but also contains many relevant
patterns that are not most conveniently cast into a compositional spatiotemporal hierarchy.
Thus, in order to most effectively perceive, remember, represent, manipulate and enact the full
variety of relevant patterns in the world, it is sensible to have a cognitive structure containing
a CSDLN as a significant component, but not the only component.
27.2 Integrating CSDLNs with Other AI Frameworks
CSDLNs represent knowledge as attractor patterns spanning multiple levels of hierarchical net-
works, supported by nonlinear dynamics and (at least in the case of the overall DeSTIN design)
involving cooperative activity of perceptual, motor and control networks. These attractors are
learned and adapted via a combination of methods including localized pattern recognition al-
gorithms and probabilistic inference. Other AGI paradigms represent and learn knowledge in a
host of other ways. How then can CSDLNs be integrated with these other paradigms?
A very simple form of integration, obviously, would be to use a CSDLN as a sensorimotor
cortex for another AI system that's focused on more abstract cognition. In this approach, the
CSDLN would stream state-vectors to the abstract cognitive system, and the abstract cognitive
system would stream abstract cognitive inputs to the CSDLN (which would then consider them
together with its other inputs). One thing missing in this approach is the possibility of the
abstract cognitive system's insights biasing the judgments inside the CSDLN. Also, abstract
cognition systems aren't usually well prepared to handle a stream of quantitative state vectors
(even ones representing intelligent compressions of raw data).
An alternate approach is to build a richer intermediate layer, which in effect translates
between the internal language of the CSDLN and the internal language of the other AI system
involved. The particulars, and the viability, of this will depend on the particulars of the other
AI system. What we'll consider here is the case where the other AI system contains explicit
symbolic representations of patterns (including patterns abstracted from observations that may
have no relation to its prior knowledge or any linguistic terms). In this case, we suggest, a
viable approach may be to construct a "semantic CSDLN" to serve as an intermediary. The
semantic CSDLN has the same hierarchical structure as a CSDLN, but inside each node it
contains abstract patterns rather than numerical vectors. This approach has several potential
major advantages: the other AI system is not presented with a large volume of numerical vectors
(which it may be unprepared to deal with effectively); the CSDLN can be guided by the other AI
system, without needing to understand symbolic control signals; and the intermediary semantic
CSDLN can serve as a sort of "blackboard" which the CSDLN and the other AI system can
update in parallel, and be guided by in parallel, thus providing a platform encouraging "cognitive
synergy".
The following sections go into more detail on the concept of semantic CSDLNs. The discussion
mainly concerns the specific context of DeSTIN/CogPrime integration, but the core ideas would
apply to the integration of any CSDLN architecture with any other AI architecture involving
uncertain symbolic representations susceptible to online learning.
27.3 Semantic CSDLN for Perception Processing
In the standard perceptual CSDLN hierarchy, a node N on level k (considering level 1 as the
bottom) corresponds to a spatiotemporal region S with size s_k (s_k increasing monotonically and
usually exponentially with k), and has children on level k-1 corresponding to spatiotemporal
regions that collectively partition S. For example, a node on level 3 might correspond to a 16x16
pixel region S of 2D space over a time period of 10 seconds, and might have 4 level 2 children
corresponding to disjoint 8x8 regions of 2D space over 10 seconds, collectively composing S.
This kind of hierarchy is very effective for recognizing certain types of visual patterns. How-
ever it is cumbersome for recognizing some other types of patterns, e.g. the pattern that a face
typically contains two eyes beside each other, but at variable distance from each other.
One way to remedy this deficiency is to extend the definition of the hierarchy, so that nodes do
not refer to fixed spatial or temporal positions, but only to relative positions. In this approach,
the internals of a node are basically the same as in a CSDLN, and the correspondence of the
nodes on level k with regions of size s_k is retained, but the relationships between the nodes are
quite different. For instance, a variable-position node of this sort could contain several possible
2D pictures of an eye, but be nonspecific about where the eye is located in the 2D input image.
Figure 27.1 depicts this "semantic-perceptual CSDLN" idea heuristically, showing part of a
semantic-perceptual CSDLN indicating the parts of a face, and also the connections between
the semantic-perceptual CSDLN, a standard perceptual CSDLN, and a higher-level cognitive
semantic network like CogPrime's Atomspace.
More formally, in the suggested "semantic-perceptual CSDLN" approach, a node N on level
k, instead of pointing to a set of level k-1 children, points to a small (but not necessarily
connected) semantic network, such that the nodes of the semantic network are (variable-
position) level k-1 nodes; and the edges of the semantic network possess labels repre-
senting spatial or temporal relationships, for example horizontally_aligned, vertically_aligned,
right_side, left_side, above, behind, immediately_right, immediately_left, immediately_above,
immediately_below, after, immediately_after. The edges may also be weighted either with num-
bers or probability distributions, indicating the quantitative weight of the relationship indicated
by the label.
So for example, a level 3 node could have a child network of the form horizontally_aligned(N1, N2),
where N1 and N2 are variable-position level 2 nodes. This would mean that N1 and N2 are along
the same horizontal axis in the 2D input but don't need to be immediately next to each other.
Or one could say, e.g., on_axis_perpendicular_to(N1, N2, N3, N4), meaning that N1 and N2
are on an axis perpendicular to the axis between N3 and N4. It may be that the latter sort of
relationship is fundamentally better in some cases, because horizontally_aligned is still tied to
a specific orientation in an absolute space, whereas on_axis_perpendicular_to is fully relative.
But it may be that both sorts of relationship are useful.
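To make the proposed representation concrete, the following sketch (in Python, with hypothetical class and pattern names, not drawn from any existing DeSTIN or OpenCog codebase) shows one way a variable-position node and its labeled child network might be encoded:

    class SemanticNode:
        """A variable-position node in a semantic-perceptual CSDLN."""
        def __init__(self, level, patterns):
            self.level = level
            self.patterns = patterns   # abstract patterns stored in the node
            self.child_net = []        # labeled relations over lower-level nodes

        def add_relation(self, label, nodes, weight=1.0):
            # e.g. ("horizontally_aligned", (n1, n2), 0.9)
            self.child_net.append((label, tuple(nodes), weight))

    # A level 3 node whose two level 2 children (eye-like nodes) must share
    # a horizontal axis, at unspecified distance:
    eye1 = SemanticNode(2, patterns=["eye_template_a", "eye_template_b"])
    eye2 = SemanticNode(2, patterns=["eye_template_a", "eye_template_b"])
    face = SemanticNode(3, patterns=[])
    face.add_relation("horizontally_aligned", (eye1, eye2), weight=0.9)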
Next, development of learning algorithms for semantic CSDLNs seems a tractable research
area. First of all, it would seem that, for instance, the DeSTIN learning algorithms could
straightforwardly be utilized in the semantic CSDLN case, once the local semantic networks
involved in the network are known. So at least for some CSDLN designs, the problem of learning
the semantic networks may be decoupled somewhat from the learning occurring inside the nodes.
DeSTIN nodes deal with clustering of their inputs, and calculation of probabilities based on
these clusters (and based on the parent node states). The difference between the semantic
CSDLN and the traditional DeSTIN CSDLN has to do with what the inputs are.
Regarding learning the local semantic networks, one relatively straightforward approach
would be to data mine them from a standard CSDLN. That is, if one runs a standard CSDLN
on a stream of inputs, one can then run a frequent pattern mining algorithm to find
semantic networks (using a given vocabulary of semantic relationships) that occur frequently
in the CSDLN as it processes input. A subnetwork that is identified via this sort of mining
can then be grouped together in the semantic CSDLN, and a parent node can be created and
pointed to it.
Fig. 27.1: Simplified depiction of the relationship between a semantic-perceptual CSDLN, a tra-
ditional perceptual CSDLN (like DeSTIN), and a cognitive semantic network (like CogPrime's
AtomSpace). The perceptual CSDLN shown is unrealistically small for complex vision processing
(only 4 layers), and only a fragment of the semantic-perceptual CSDLN is shown (a node
corresponding to the category face, and then a child network containing nodes corresponding to
several components of a typical face). In a real semantic-perceptual CSDLN, there would be many
other nodes on the same level as the face node, many other parts to the face subnetwork besides
the eyes, nose and mouth depicted here; the eye, nose and mouth nodes would also have child
subnetworks; there would be links from each semantic node to centroids within a large number of
perceptual nodes; and there would also be many nodes not corresponding clearly to any single
English language concept like eye, nose, face, etc.
Also, the standard CSDLN can be searched for frequent patterns involving the clusters
(referring to DeSTIN here, where the nodes contain clusters of input sequences) inside the
nodes in the semantic CSDLN. Thus, in the "semantic DeSTIN" case, we have a feedback
interaction wherein:
1) the standard CSDLN is formed via processing input;
2) frequent pattern mining on the standard CSDLN is used to create subnetworks and cor-
responding parent nodes in the semantic CSDLN;
3) the newly created nodes in the semantic CSDLN get their internal clusters updated via
standard DeSTIN dynamics;
4) the clusters in the semantic nodes are used as seeds for frequent pattern mining on the
standard CSDLN, returning us to Step 2 above.
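A minimal sketch of this feedback loop follows, assuming hypothetical callables for the three subsystems involved (running the standard CSDLN on input, mining its states, and creating semantic parent nodes); none of these names correspond to actual functions in the DeSTIN codebase:

    def semantic_destin_loop(process_inputs, mine_subnets, make_parent, n_iters=10):
        """Toy driver for the four-step feedback interaction above."""
        seeds = []
        for _ in range(n_iters):
            states = process_inputs()                    # step 1
            for subnet in mine_subnets(states, seeds):   # step 2, seeded by step 4
                parent = make_parent(subnet)             # step 2: parent creation
                parent.update_clusters(states)           # step 3
                seeds.append(parent.clusters)            # step 4: seeds for next pass
        return seeds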
After the semantic CSDLN is formed via mining the perceptual CSDLN, it may be used to
bias the further processing of the perceptual CSDLN. For instance, in DeSTIN each node carries
out probabilistic calculations involving knowledge of the prior probability of the "observation"
coming into that node over a given interval of time. In the current DeSTIN version, this prior
probability is drawn from a uniform distribution, but it would be more effective to draw the
prior probability from the semantic network - observations matching things represented in the
semantic network would get a higher prior probability. One could also use subtler strategies,
such as using imprecise probabilities in DeSTIN (see Section 28.6 below), and assigning a greater
confidence to probabilities involving observations contained in the semantic network.
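A sketch of the suggested prior-biasing rule follows, under the assumption that a caller-supplied predicate decides whether an observation matches a structure in the semantic network (all names are illustrative):

    def biased_prior(observation, semantic_nodes, matches, base_prior=1.0, boost=2.0):
        """Return a non-uniform prior weight for an observation.

        matches(observation, node) -> bool is supplied by the caller;
        observations matching the semantic network get `boost` times the
        base weight. Weights should be renormalized across observations."""
        if any(matches(observation, n) for n in semantic_nodes):
            return base_prior * boost
        return base_prior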
Finally, we note that the nodes and networks in the semantic CSDLN may either
• be linked into the nodes and links in a semantic network such as CogPrime's AtomSpace
• actually be implemented in terms of an abstract semantic network language like CogPrime's
AtomSpace (the strategy to be suggested in Chapter 29).
This allows us to think of the semantic CSDLN as a kind of bridge between the standard CSDLN
and the cognitive layer of an AI system. In an advanced implementation, the cognitive network
may be used to suggest new relationships between nodes in the semantic CSDLN, based on
knowledge gained via inference or language.
27.4 Semantic CSDLN for Motor and Sensorimotor Processing
Next we consider a semantic CSDLN that focuses on movement rather than sensation. In this
case, rather than a 2D or 3D visual space, one is dealing with an n-dimensional configura-
tion space (C-space). This space has one dimension for each degree of freedom of the agent
in question. The more joints with more freedom of movement an agent has, the higher the
dimensionality of its configuration space.
Using the notion of configuration space, one can construct a semantic-motoric CSDLN hi-
erarchy analogous to the semantic-perceptual CSDLN hierarchy. However, the curse of dimen-
sionality demands a thoughtful approach here. A square of side 2 can be tiled with 4 squares of
side 1, but a 50-dimensional cube of side 2 can be tiled with 2^50 50-dimensional cubes of side
1. If one is to build a CSDLN hierarchy in configuration space analogous to that in perceptual
space, some sort of sparse hierarchy is necessary.
There are many ways to build a sparse hierarchy of this nature, but one simple approach is
to build a hierarchy where the nodes on level k represent motions that combine the motions
represented by nodes on level k — 1. In this case the most natural semantic label predicates
would seem to be things like simultaneously, after, immediately_after, etc. So a level k node
represents a sort of "motion plan" composed by chaining together (serially and/or in parallel)
the motions encoded in level k-1 nodes. Overlapping regions of C-space correspond to different
complex movements that share some of the same component movements, e.g. if one is trying
to slap one person while elbowing another, or run while kicking a soccer ball forwards. Also
note, the semantic CSDLN approach reveals perception and motor control to have essentially
similar hierarchical structures, more so than with the traditional CSDLN approach and its
fixed-position perceptual nodes.
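The following fragment sketches how a level-k "motion plan" node might chain lower-level motions using temporal labels; the node and action names are purely illustrative:

    from dataclasses import dataclass, field

    @dataclass
    class MotionNode:
        name: str
        level: int
        relations: list = field(default_factory=list)  # (label, (child, child)) pairs

    reach = MotionNode("extend_arm", level=1)
    open_hand = MotionNode("open_hand", level=1)
    close_hand = MotionNode("close_hand", level=1)

    # A level 2 "grasp" plan: extend the arm while opening the hand,
    # then close the hand afterwards.
    grasp = MotionNode("grasp_object", level=2)
    grasp.relations.append(("simultaneously", (reach, open_hand)))
    grasp.relations.append(("after", (open_hand, close_hand)))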
Just as the semantic-perceptual CSDLN is naturally aligned with a traditional perceptual
CSDLN, similarly a semantic-motoric CSDLN may be naturally aligned with a "motor CS-
DLN". A typical motoric hierarchy in robotics might contain a node corresponding to a robot
arm, with children corresponding to the hand, upper arm and lower arm; the hand node might
then contain child nodes corresponding to each finger, etc. This sort of hierarchy is intrinsically
spatiotemporal because each individual action of each joint of an actuator like an arm is in-
trinsically bounded in space and time. Perhaps the most ambitious attempt along these lines
is [Alb01], which shows how perceptual and motoric hierarchies are constructed and aligned in
an architecture for intelligent automated vehicle control.
Figure 27.2 gives a simplified illustration of the potential alignment between a semantic-
motoric CSDLN and a purely motoric hierarchy (like the one posited above in the context of
extended DeSTIN). 5 In the figure, the motoric hierarchy is assumed to operate somewhat like
DeSTIN, with nodes corresponding to (at the lowest level) individual servomotors, and (on
higher levels) natural groupings of servomotors. The node corresponding to a set of servos is
assumed to contain centroids of clusters of trajectories through configuration space. The task
of choosing an appropriate action is then executed by finding the appropriate centroids for
the nodes. Note an asymmetry between perception and action here. In perception the basic
flow is bottom-up, with top-down flow used for modulation and for "imaginative" generation of
percepts. In action, the basic flow is top-down, with bottom-up flow used for modulation and for
imaginative, "fiddling around" style generation of actions. The semantic-motoric hierarchy then
contains abstractions of the C-space centroids from the motoric hierarchy - i.e., actions that
bind together different C-space trajectories that correspond to the same fundamental action
carried out in different contexts or under different constraints. As in the perceptual
case, the semantic hierarchy here serves as glue between lower-level function and higher-level
cognitive semantics.
Fig. 27.2: Simplified depiction of the relationship between a semantic-motoric CSDLN, a motor
control hierarchy (illustrated by the hierarchy of servos associated with a robot arm), and a
cognitive semantic network (like CogPrime's AtomSpace).
5 In the figure, only a fragment of the semantic-motoric CSDLN is shown (a node corresponding to the "get
object" action category, and then a child network containing nodes corresponding to several components of the
action). In a real semantic-motoric CSDLN, there would be many other nodes on the same level as the get-object
node, many other parts to the get-object subnetwork besides the ones depicted here; the subnetwork nodes would
also have child subnetworks; there would be links from each semantic node to centroids within a large number
of motoric nodes; and there might also be many nodes not corresponding clearly to any single English language
concept like "grasp object", etc.
27.5 Connecting the Perceptual and Motoric Hierarchies with a Goal
Hierarchy
One way to connect perceptual and motoric CSDLN hierarchies is using a "semantic-goal CS-
DLN" bridging the semantic-perceptual and semantic-motoric CSDLNs. The semantic-goal CS-
DLN would be a "semantic CSDLN" loosely analogous to the perceptual and motor semantic
CSDLNs - and could optionally be linked into the reinforcement hierarchy of a tripartite CS-
DLN like extended DeSTIN. Each node in the semantic-goal CSDLN would contain implications
of the form "Context & Procedure —> Goal", where Goal is one of the Al system's overall goals
or a subgoal thereof, and Context and Procedure refer to nodes in the perceptual and motoric
semantic CSDLNs respectively.
For instance, a semantic-goal CSDLN node might contain an implication of the form "I
perceive my hand is near object X & I grasp object X → I possess object X." This would be
useful if "I possess object X" were a subgoal of some higher-level system goal, e.g. if X were a
food object and the system had the higher-level goal of obtaining food.
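One might record such an implication as a simple structure like the sketch below (with hypothetical field names; a real implementation would point into the perceptual and motoric semantic CSDLNs rather than use strings):

    from dataclasses import dataclass

    @dataclass
    class GoalImplication:
        context: str     # reference to a semantic-perceptual node
        procedure: str   # reference to a semantic-motoric node
        goal: str        # goal or subgoal identifier
        strength: float  # learned probability that the implication holds

    imp = GoalImplication(context="hand_near_object_X",
                          procedure="grasp_object_X",
                          goal="possess_object_X",
                          strength=0.85)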
To the extent that the system's goals can be decomposed into hierarchies of progressively
more and more spatiotemporally localized subgoals, this sort of hierarchy will make sense,
leading to a tripartite hierarchy as loosely depicted in Figure 27.3. 6 One could attempt to
construct an overall AGI approach based on a tripartite hierarchy of this nature, counting on
the upper levels of the three hierarchies to come together dynamically to form an integrated
cognitive network, yielding abstract phenomena like language, self, reasoning and mathematics.
On the other hand, one may view this sort of hierarchy as a portion of a larger integrative AGI
architecture, containing a separate cognitive network, with a less rigidly hierarchical structure
and less of a tie to the spatiotemporal structure of physical reality. The latter view is the one
we are primarily taking within the CogPrime AGI approach, viewing perceptual, motoric and
goal hierarchies as "lower level" subsystems connected to a "higher level" system based on the
CogPrime AtomSpace and centered on its abstract cognitive processes.
Learning of the subgoals and implications in the goal hierarchy is of course a complex matter,
which may be addressed via a variety of algorithms, including online clustering (for subgoals or
implications) or supervised learning (for implications, the "supervision" being purely internal
and provided by goal or subgoal achievement).
6 The diagram is simplified in many ways, e.g. only a handful of nodes in each hierarchy is shown (rather than
the whole hierarchy), and lines without arrows are used to indicate bidirectional arrows, and nearly all links are
omitted. The purpose is just to show the general character of interaction between the components in a simplified
context.
Fig. 27.3: Simplified illustration of the proposed interoperation of perceptual, motoric and goal
semantic CSDLNs.
Chapter 28
Making DeSTIN Representationally Transparent
Co-authored with Itamar Arel
28.1 Introduction
In this chapter and the next we describe one particular incarnation of the above ideas on
semantic CSDLNs in more depth: the integration of CogPrime with the DeSTIN architecture
reviewed in Chapter 4 of Part 1.
One of the core intuitions underlying this integration is that, in order to achieve the de-
sired level of functionality for tasks like picture interpretation and assembly of complex block
structures, it will be necessary to integrate DeSTIN (or some similar system) and CogPrime
components fairly tightly - going deeper than just using DeSTIN as an input/output layer for
CogPrime, by building a number of explicit linkages between the nodes inside DeSTIN and
CogPrime respectively.
The general DeSTIN design has been described in talks as comprising three crosslinked hier-
archies, handling perception, action and reinforcement; but so far only the perceptual hierarchy
(also called the "spatiotemporal inference network") has been implemented or described in de-
tail in publications. In this chapter we will focus on DeSTIN's perception hierarchy. We will
explain DeSTIN's perceptual dynamics and representations as we understand them, more thor-
oughly than was done in the brief review above; and we will describe a series of changes to the
DeSTIN design, made in the spirit of easing DeSTIN/OpenCog integration. In the following
chapter we will draw action and reinforcement into the picture, deviating somewhat in the de-
tails from the manner in which these things would be incorporated into a standalone DeSTIN,
but pursuing the same concepts in an OpenCog integration context.
What we describe here is a way to make a "Uniform DeSTIN", in which the internal repre-
sentation of perceived visual forms is independent of affine transformations (translation, scaling,
rotation and shear). This "representational transparency" means that, when Uniform DeSTIN
perceives a pattern: no matter how that pattern is shifted or linearly transformed, the way Uni-
form DeSTIN represents that pattern internally is going to be basically the same. This makes
it easy to look at a collection of DeSTIN states, obtained by exposing a DeSTIN perception
network to the world at different points in time, and see the commonalities in what they are
perceiving and how they are interpreting it. By contrast, in the original version of DeSTIN
(here called "classic DeSTIN"), it may take significant effort to connect the internal repre-
sentation of a visual pattern and the representation of its translated or linearly transformed
versions. The uniformity of Uniform DeSTIN makes it easier for humans to inspect DeSTIN's
state and understand what's going on, and also (more to the point) makes it easier for other
AI components to recognize patterns in sets of DeSTIN states. The latter fact is critical for the
DeSTIN/OpenCog integration.
28.2 Review of DeSTIN Architecture and Dynamics
The hierarchical architecture of DeSTIN's spatiotemporal inference network comprises an
arrangement into multiple layers of "nodes", each an instantiation of an identical
processing unit. Each node corresponds to a particular spatiotemporal region, and uses a sta-
tistical learning algorithm to characterize the sequences of patterns that are presented to it by
nodes in the layer beneath it.
More specifically, at the very lowest layer of the hierarchy nodes receive as input raw data
(e.g. pixels of an image) and continuously construct a belief state that attempts to characterize
the sequences of patterns viewed. The second layer, and all those above it, receive as input
the belief states of nodes at their corresponding lower layers, and attempt to construct belief
states that capture regularities in their inputs. Each node also receives as input the belief state
of the node above it in the hierarchy (which constitutes "contextual" information, utilized in
the node's prediction process).
Inside each node, an online clustering algorithm is used to identify regularities in the se-
quences received by that node. The centroids of the clusters learned are stored in the node and
comprise the basic visual patterns recognized by that node. The node's "belief" regarding what
it is seeing, is then understood as a probability density function defined over the centroids at
that node. The equations underlying this centroid formation and belief updating process are
identical for every node in the architecture, and were given in their original form in [ARC09a],
though the current open-source DeSTIN codebase reflects some significant improvements not
yet reflected in the publication record.
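The flavor of this update can be conveyed by a toy version - this is a heuristic sketch, not the published DeSTIN equations: similarity of the observation to each centroid, modulated by optional top-down context from the parent, renormalized into a belief distribution.

    import numpy as np

    def update_belief(belief, observation, centroids, parent_bias=None):
        """Toy DeSTIN-style node update over a fixed centroid list."""
        dists = np.array([np.linalg.norm(observation - c) for c in centroids])
        likelihood = np.exp(-dists)                   # nearer centroids weigh more
        new_belief = 0.5 * belief + 0.5 * likelihood  # crude temporal smoothing
        if parent_bias is not None:                   # contextual modulation
            new_belief = new_belief * parent_bias
        return new_belief / new_belief.sum()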
In short, the way DeSTIN represents an item of knowledge is as a probability distribution
over "network activity patterns" in its hierarchical network. An activity pattern, at each point
in time, comprises an indication of which centroids in each node are most active, meaning they
have been identified as most closely resembling what that node has perceived, as judged in the
context of the perceptions of the other nodes in the system. Based on this methodology, the
DeSTIN perceptual network serves the critical role of building and maintaining a model of the
state of the world as visually perceived.
This methodology allows for powerful unsupervised classification. If shown a variety of real-
world scenes, DeSTIN will automatically form internal structures corresponding to the various
natural categories of objects shown in the scenes, such as trees, chairs, people, etc.; and also
to the various natural categories of events it sees, such as reaching, pointing, falling. In order
to demonstrate the informativeness of these internal structures, experiments have been done
using DeSTIN's states as input feature vectors for supervised learning algorithms, enabling
high-accuracy supervised learning of classification models from labeled image data [KAR10]. A
closely related algorithm developed by the same principal researcher (Itamar Arel) has proven
extremely successful at audition tasks such as phoneme recognition [ABS+11].
28.2.1 Beyond Gray-Scale Vision
The DeSTIN approach may easily be extended to other senses beyond gray-scale vision. For
color vision, it suffices to replace the one-dimensional signals coming into DeSTIN's lower layer
with 3D signals representing points in the color spectrum; the rest of the DeSTIN process may
be carried over essentially without modification. Extension to further senses is also relatively
straightforward on the mathematical and software structure level, though they may of course
require significant additional tuning and refinement of details.
For instance, olfaction does not lend itself well to hierarchical modeling, but audition and
haptics (touch) do:
• for auditory perception, one could use a DeSTIN architecture in which each layer is one-
dimensional rather than two-dimensional, representing a certain pitch. Or one could use two
dimensions for pitch and volume. This results in a system quite similar to the DeSTIN-like
system shown to perform outstanding phoneme recognition in [ABS+11], and is conceptually
similar to Hierarchical Hidden Markov Models (HHMMs), which have proven quite success-
ful in speech recognition and which Ray Kurzweil has argued are the central mechanism
of human intelligence [Kur12]. Note also recent results published by Microsoft Research,
showing dramatic improvements over prior speech recognition results based on use of a
broadly HHMM-like deep learning system.
• for haptic perception, one could use a DeSTIN architecture in which the lower layer of the
network possesses a 2D topology reflecting the topology of the surface of the body. Similar
to the somatosensory cortex in the human brain, the map could be distorted so that more
"pixels" are used for regions of the body from which more data is available (e.g. currently
this might be the fingertips, if these were implemented using Syntouch technology [FL12],
which has proved excellent at touch-based object identification). Input could potentially
be multidimensional if multiple kinds of haptic sensors were available, e.g. temperature,
pressure and movement as in the Syntouch case.
Augmentation of DeSTIN to handle action as well as perception is also possible, and will be
discussed in Chapter 29.
28.3 Uniform DeSTIN
It would be possible to integrate DeSTIN in its original form with OpenCog or other AI sys-
tems with symbolic aspects, via using an unsupervised machine learning algorithm to recognize
patterns in sets of states of the DeSTIN network as originally defined. However, this pattern
recognition task becomes much easier if one suitably modifies DeSTIN, so as to make the com-
monalities between semantically similar states more obviously perceptible. This can be done by
making the library of patterns recognized within each DeSTIN node invariant with respect to
translation, scale, rotation and shear - a modification we call "Uniform DeSTIN." This "uni-
formization" decreases DeSTIN's degree of biological mimicry, but eases integration of DeSTIN
with symbolic AI methods.
28.3.1 Translation-Invariant DeSTIN
The first revision to the "classic DeSTIN" to be suggested here is: All the nodes on the same
level of the DeSTIN hierarchy should share the same library of patterns. In the context of classic
DeSTIN (i.e. in the absence of further changes to DeSTIN to be suggested below, which extend
the type of patterns usable by DeSTIN), this means: the nodes on the same level should share
the same list of centroids. This makes DeSTIN's pattern recognition capability translation-
invariant. This translation invariance can be achieved without any change to the algorithms for
updating centroids and matching inputs to centroids.
In this approach, it's computationally feasible to have a much larger library of patterns
utilized by each node, as compared to in classic DeSTIN. Suppose we have an n x n pixel grid,
where the lowest level has nodes corresponding to 4 x 4 squares. Then, there are (n/4)^2 nodes
on the lowest level, and on the k'th level there are (n/2^(k+1))^2 nodes. This means that, without
increasing computational complexity (actually decreasing it, under reasonable assumptions), in
translation-invariant Uniform DeSTIN we can have a factor of (n/2^(k+1))^2 more centroids on level k.
One can achieve a much greater decrease in computational complexity (with the same amount
of centroid increase) via use of a clever data structure like a cover tree [BKL06] to store the
centroids at each level. Then the nearest-neighbor matching of input patterns to the library
(centroid) patterns would be very rapid, much faster than linearly comparing the input to each
pattern in the list.
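The following sketch illustrates a per-level shared library with fast nearest-neighbor matching. SciPy's KD-tree is used here merely as a stand-in for a cover tree (a cover tree would be preferable for high-dimensional centroid spaces); the class is illustrative, not part of the DeSTIN codebase.

    import numpy as np
    from scipy.spatial import cKDTree  # stand-in for a cover tree

    class SharedLevelLibrary:
        """One centroid library shared by every node on a given level."""
        def __init__(self, centroids):
            self.centroids = np.asarray(centroids)
            self.index = cKDTree(self.centroids)

        def nearest(self, pattern):
            # fast nearest-neighbor match instead of a linear scan
            dist, idx = self.index.query(pattern)
            return idx, dist

    # All level-4 nodes query the same library, so "centroid 7" means the
    # same thing everywhere on level 4:
    level4 = SharedLevelLibrary(np.random.rand(5000, 16))
    idx, dist = level4.nearest(np.random.rand(16))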
28.3.1.1 Conceptual Justification for Uniform DeSTIN
Generally speaking, one may say that: if the class of images that the system will see is invariant
with respect to linear translations, then without loss of generality, we can assume that the
library of patterns at each node on the same level is the same.
In reality this assumption isn't quite going to hold. For instance, for an eye attached to a
person or humanoid robot, the top of the pixel grid will probably look at a person's hair more
often than the bottom ... because the person stands right-side-up more often than they stand
upside-down, and because they will often fixate the center of their view on a person's face, etc.
For this reason, we can recognize our friend's face better if we're looking at them directly, with
their face centered in our vision.
However, we suggest that this kind of peculiarity is not really essential to vision processing
for general intelligence. There's no reason you can't have an intelligent vision system that
recognizes a face just as well whether it's centered in the visual field or not. (In fact you could
straightforwardly explicitly introduce this kind of bias within a translation-invariant DeSTIN,
but it's not clear this is a useful direction.)
By and large, in almost all cases, it seems to us that in a DeSTIN system exposed to a wide
variety of real-world inputs in complex situations, the library of patterns in the different nodes
at the same level would turn out to be substantially the same. Even if they weren't exactly
the same, they would be close to the same, embodying essentially the same regularities. But
of course, this sameness would be obscured, because centroid 7 in a certain node X on level 4
might actually be the same as centroid 18 in some other node Y on level 4 ... and there would
be no way to tell that centroid 7 in node X and centroid 18 in node Y were actually referring
to the same pattern, without doing a lot of work.
28.3.1.2 Comments on Biological Realism
Translation-invariant DeSTIN deviates further from human brain structure than classic DeS-
TIN, but this is for good reason.
The brain has a lot of neurons, since adding new neurons was fairly easy and cheap for
evolution; and tends to do things in a massively parallel manner, with great redundancy. For
the brain, it's not so problematically expensive to have the functional equivalent of a lot of
DeSTIN nodes on the same level, all simultaneously using and learning libraries of patterns
that are essentially identical to each other. Using current computer technology, on the other
hand, this sort of strategy is rather inefficient.
In the brain, messaging between separated regions is expensive, whereas replicating function
redundantly is cheap. In most current computers (with some partial exceptions such as GPUs),
messaging between separated regions is fairly cheap (so long as those regions are stored on
the same machine), whereas replicating function redundantly is expensive. Thus, even in cases
where the same concept and abstract mathematical algorithm can be effectively applied in
both the brain and a computer, the specifics needed for efficient implementation may be quite
different.
28.3.2 Mapping States of Translation-Invariant DeSTIN into the
Atomspace
Mapping classic DeSTIN's states into a symbolic pattern-manipulation engine like OpenCog is
possible, but relatively cumbersome. Doing the same thing with Uniform DeSTIN is much more
straightforward.
In Uniform DeSTIN, for example, Cluster 7 means the same thing in ANY node on level 4.
So after a Uniform DeSTIN system has seen a fair number of images, you can be pretty sure its
library of patterns is going to be relatively stable. Some clusters may come and go as learning
progresses, but there's going to be a large and solid library of clusters at each level that persists,
because all of its member clusters occur reasonably often across a variety of inputs.
Define a DeSTIN state-tree as a (quaternary) tree with one node for each DeSTIN node;
and living at each node, a small list of (integer pattern_code, float weight) pairs. That is, at
each node, the state-tree has a short-list of the patterns that closely match a given state at that
node. The weights may be assumed between 0 and 1. The integer pattern codes have the same
meaning for every node on the same level.
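In code, a state-tree might look like the sketch below (illustrative types only):

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class StateTreeNode:
        # (pattern_code, weight) pairs: the library patterns best matching
        # this node's current input; codes are level-wide in Uniform DeSTIN
        matches: List[Tuple[int, float]] = field(default_factory=list)
        children: List["StateTreeNode"] = field(default_factory=list)  # 4 children

    leaf = StateTreeNode(matches=[(7, 0.81), (18, 0.12)])
    root = StateTreeNode(matches=[(3, 0.64)], children=[leaf])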
As you feed DeSTIN inputs, at each point in time it will have a certain state, representable as
a state-tree. So, suppose you have a large database of DeSTIN state-trees, obtained by showing
various inputs to DeSTIN over a long period of time. Then, you can do various kinds of pattern
recognition on this database of state-trees.
More formally, define a state-subtree as a (quaternary) tree with a single integer at each node.
Two state-subtrees may have various relationships with each other within a single state-tree -
for instance they may be adjacent to each other, or one may appear atop or below the other, etc.
In these terms, one interesting kind of pattern recognition to do is: Recognize frequent state-
subtrees in the stored library of state-trees; and then recognize frequent relationships between
these frequent state-subtrees. The latter relationships will form a kind of "image grammar,"
conceptually similar and formally related to those described in [ZM06]. Further, temporal pat-
terns may be recognized in the same way as spatial ones, as part of the state-subtree grammar
(e.g. state-subtree A often occurs right before state-subtree B; state-subtree C often occurs
right before and right below state-subtree D; etc.).
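As a toy illustration of the simplest case - frequent parent/child code pairs - the following miner counts co-occurrences across a database of state-trees (using the StateTreeNode sketch above; a real miner would handle larger subtrees and the spatial and temporal relations just described):

    from collections import Counter

    def dominant_code(node):
        # the highest-weight pattern code at a state-tree node
        return max(node.matches, key=lambda m: m[1])[0]

    def frequent_parent_child_pairs(state_trees, min_count=5):
        counts = Counter()
        stack = list(state_trees)
        while stack:
            node = stack.pop()
            stack.extend(node.children)
            if not node.matches:
                continue
            for child in node.children:
                if child.matches:
                    counts[(dominant_code(node), dominant_code(child))] += 1
        return [pair for pair, c in counts.items() if c >= min_count]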
The flow of activation from OpenCog back down to DeSTIN is also fairly straightforward in
the context of translation-invariant DeSTIN. If relationships have been stored between concepts
in OpenCogPrime's memory and grammatical patterns between state-subtrees, then whenever
concept C becomes important in OpenCogPrime's memory, this can cause a top-down increase
in the probability of matching inputs to DeSTIN node centroids, that would cause the DeSTIN
state-tree to contain the grammatical patterns corresponding to concept C.
28.3.3 Scale-Invariant DeSTIN
The next step, moving beyond translation invariance, is to make DeSTIN's pattern recognition
mostly (not wholly) scale invariant. We will describe a straightforward way to map centroids
on one level of DeSTIN, into centroids on the other levels of DeSTIN. This means that when a
centroid has been learned on one level, it can be experimentally ported to all the other levels,
to see if it may be useful there too.
To make the explanation of this mapping clear, we reiterate some DeSTIN basics in slightly
different language:
• A centroid on Level N is: a spatial arrangement (e.g. k x k square lattice) of beliefs of Level
N —1. (More generally it is a spatiotemporal arrangement of such beliefs, but we will ignore
this for the moment.)
• A belief on Level N is: a probability distribution over centroids on Level N. For heuristic
purposes one can think about this as a mixture of Gaussians, though this won't always be
the best model.
• Thus, a belief on Level N is: a probability distribution over spatial (or more generally,
spatiotemporal) arrangements of beliefs on Level N — 1
On Level 1, the role of centroids is played by simple k x k squares of pixels. Level 1 beliefs
are probability distributions over these small pixel squares. Level 2 centroids are hence spa-
tial arrangements of probability distributions over small pixel-squares; and Level 2 beliefs are
probability distributions over spatial arrangements of probability distributions over small pixel-
squares.
A small pixel-square S may be mapped into a single pixel P via a heuristic algorithm such
as:
• if S has more black than white pixels, then P is black
• if S has more white than black pixels, then P is white
• if S has an equal number of white and black pixels, then use some heuristic. For instance if
S is 4 x 4 you could look at the central 2 x 2 square and assign P to the color that occurs
most often there. If that is also a tie, then you can just arbitrarily assign P to the color
that occurs in the upper left corner of S. (This heuristic is sketched in code below.)
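A direct transcription of the heuristic above, assuming binary squares with 1 = black and 0 = white (purely illustrative):

    import numpy as np

    def square_to_pixel(square):
        """Coarsen a k x k binary square (1=black, 0=white) to one pixel."""
        square = np.asarray(square)
        blacks, total = square.sum(), square.size
        if blacks * 2 > total:
            return 1
        if blacks * 2 < total:
            return 0
        k = square.shape[0]                    # tie: inspect the central 2 x 2
        center = square[k//2 - 1:k//2 + 1, k//2 - 1:k//2 + 1]
        if center.sum() > 2:
            return 1
        if center.sum() < 2:
            return 0
        return int(square[0, 0])               # final tie-break: upper-left corner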
A probability distribution over small pixel-squares may then be mapped into a probability
distribution over pixel values (B or W). A probability distribution over the two values B and
W may be approximately mapped into a single pixel value - the one that occurs most often
in the distribution, with a random choice made to break a tie. This tells us how to map Level 2
beliefs into spatial arrangements of pixels; and thus, it tells us how to map Level 2 beliefs into
Level 1 beliefs.
But this tells us how to map Level N beliefs into Level N-1 beliefs, inductively. Remember,
a Level N belief is a probability distribution (pdf for short) over spatial arrangements of beliefs
on Level N-1. For example: a Level 3 belief is a pdf over arrangements of Level 2 beliefs. But
since we can map Level 2 beliefs into Level 1 beliefs, this means we can map a Level 3 belief
into a pdf over arrangements of Level 1 beliefs - which means we can map a Level 3 belief into
a Level 2 belief. Etc.
Of course, this also tells us how to map Level N centroids into Level N-1 centroids. A
Level N centroid is a pdf over arrangements of Level N-1 beliefs; a Level N-1 centroid is a
pdf over arrangements of Level N-2 beliefs. But Level N-1 beliefs can be mapped into Level
N-2 beliefs, so Level N centroids can be represented as pdfs over arrangements of Level N-2
beliefs, and hence mapped into Level N-1 centroids.
In practice, one can implement this idea by moving from the bottom up. Given the mapping
from Level 1 "centroids" to pixels, one can iterate through the Level 1 beliefs and identify
which pixels they correspond to. Then one can iterate through the Level 2 beliefs and identify
which Level 1 beliefs they correspond to. Etc. Each Level N belief can be explicitly linked
to a corresponding level N — 1 belief. Synchronously, as one moves up the hierarchy, Level N
centroids can be explicitly linked to corresponding Level N — 1 centroids.
Since there are in principle more possible Level N beliefs than Level N-1 beliefs, the mapping
from Level N beliefs to Level N-1 beliefs is many-to-one. This is a reason not to simply maintain
a single centroid pool across levels. However, when a new centroid C is added to the Level N
pool, it can be mapped into a Level N-1 centroid to be added to the Level N-1 pool (if
not there already). And, it can also be used to spawn a Level N+1 centroid, drawn randomly
from the set of possible Level N+1 centroids that map into C.
Also, note that it is possible to maintain a single centroid numbering system across levels, so
that a reference like "centroid # 175" has only one meaning in an entire DeSTIN network, even
though some of these centroids may only be meaningful above a certain level in the network.
28.3.4 Rotation Invariant DeSTIN
With a little more work, one can make DeSTIN rotation and shear invariant as well 1. Consid-
ering rotation first:
When comparing an input A to a Level N node with a Level N centroid B, consider
various rotations of A, and see which rotation gives the closest match.
• When you match a centroid to an input observation-or-belief, record the rotation angle
corresponding to the match.
The second of these points implies the tweaked definitions:
• A centroid on Level N is: a spatial arrangement (e.g. k x k square lattice) of beliefs of Level
N —1
• A belief on Level N is: a probability distribution over (angle, centroid) pairs on Level N.
1 The basic idea in this section, in the context of rotation, is due to Jade O'Neill (private communication)
From these it follows that a belief on Level N is: a probability distribution over (angle, spatial
arrangement of beliefs) pairs on Level N — 1
An additional complexity here is that two different (angle, centroid) pairs (on the same level)
could be (exactly or approximately) equal to each other. This necessitates an additional step of
"centroid simplification", in which ongoing checks are made to see if there are any two centroids
C1, C2 on the same level so that: There exist angles A1, A2 so that (Ai, CI) is very close to
(A2, C2). In this case the two centroids may be merged into one.
To apply these same ideas to shear, one may simply replace "rotation angle" in the above
by "(rotation angle, shear factor) pair."
28.3.5 Temporal Perception
Translation and scale invariant DeSTIN can be applied perfectly well if the inputs to DeSTIN,
at level 1, are movies rather than static images. Then, in the simplest version, Level 1 consists
of pixel cubes instead of pixel squares, etc. (the third dimension in the cube representing time).
The scale invariance achieved by the methods described above would then be scale invariance
in time as well as in space.
In this context, one may enable rectangular shapes as well as cubes. That is, one can look
at a Level N centroid consisting of m time-slices of a k x k arrangement of Level N-1 beliefs
- without requiring that m = k. This would make the centroid learning algorithm a little
more complex, because at each level one would want to consider centroids with various values
of m, from m = 1, ..., k (and potentially m > k also).
28.4 Interpretation of DeSTIN's Activity
Uniform DeSTIN constitutes a substantial change in how DeSTIN does its business of recog-
nizing patterns in the world - conceptually as well as technically. To explicate the meaning of
these changes, we briefly present our favored interpretation of DeSTIN's dynamics.
The centroids in the DeSTIN library represent points in "spatial pattern space", i.e. they
represent exemplary spatial patterns. DeSTIN's beliefs, as probability distributions over cen-
troids, represent guesses as to which of the exemplary spatial patterns are the best models of
what's currently being seen in a certain space-time region.
This matching between observations and centroids might seem to be a simple matter of
"nearest neighbor matching"; but the subtle point is, it's not immediately obvious how to
best measure the distance between observations and centroids. The optimal way of measuring
distance is going to depend on context; that is to say, on the actual distribution of observations
in the system's real environment over time.
DeSTIN's algorithm for calculating the belief at a node, based on the observation and cen-
troids at that node plus the beliefs at other nearby nodes, is essentially a way of tweaking the
distance measurement between observations and centroids, so that this measurement accounts
for the context (the historical distribution of observations). There are many possible ways of
doing this tweaking. Ideally one could use probability theory explicitly, but that's not always
going to be computationally feasible, so heuristics may be valuable, and various versions of
DeSTIN have contained various heuristics in this regard.
The various ways of "uniformizing" DeSTIN described above (i.e. making its pattern recogni-
tion activity approximately invariant with respect to affine transformations), don't really affect
this story - they just improve the algorithm's ability to learn based on small amounts of data
(and its rapidity at learning from data in general), by removing the need for the system to
repeatedly re-learn transformed versions of the same patterns. So the uniformization just lets
DeSTIN carry out its basic activity faster and using less data.
28.4.1 DeSTIN's Assumption of Hierarchical Decomposability
Roughly speaking, DeSTIN will work well to the extent that: The average distance between each
part of an actually observed spatial pattern, and the closest centroid pattern, is not too large
(note: the choice of distance measure in this statement is potentially subtle). That is: DeSTIN's
set of centroids is supposed to provide a compact model of the probability distribution of spatial
patterns appearing in the experience of the cognitive system of which DeSTIN is a part.
DeSTIN's effective functionality relies on the assumption that this probability distribution
is hierarchically decomposable - i.e. that the distribution of spatial patterns appearing over a
k x k region can be compactly expressed, to a reasonable degree of approximation, as a spatial
combination of the distributions of spatial patterns appearing over (k/4) x (k/4) regions. This
assumption of hierarchical decomposability greatly simplifies the search problem that DeSTIN
faces, but also restricts DeSTIN's capability to deal with more general spatial patterns that are
not easily hierarchically decomposable. However, the benefits of this approach seem to outweigh
the costs, given that visual patterns in the environments humans naturally encounter do seem
(intuitively at least) to have this hierarchical property.
28.4.2 Distance and Utility
Above we noted that choice of distance measure involved in the assessment of DeSTIN's effec-
tive functionality is subtle. Further above, we observed that the function of DeSTIN's belief
assessment is basically to figure out the contextually best way to measure the distance between
the observation and the centroids at a node. These comments were both getting at the same
point.
But what is the right measure of distance between two spatial patterns? Ultimately, the right
measure is: the probability that the two patterns A and B can be used in the same way. That
is: the system wants to identify observation A with centroid B if it has useful action-patterns
involving B, and it can substitute A for B in these patterns without loss.
This is difficult to calculate in general, though - a rough proxy, which it seems will often be
acceptable, is to measure the distance between A and B in terms of both
• the basic (extensional) distance between the physical patterns they embody (e.g. pixel by
pixel distance)
• the contextual (intensional) distance, i.e. the difference between the contexts in which they
occur
Via enabling the belief in a node's parent to play a role in modulating a certain node's be-
lief, DeSTIN's core algorithm enables contextual/intensional factors to play a role in distance
assessment.
28.5 Benefits and Costs of Uniform DeSTIN
We now summarize the main benefits and costs of Uniform DeSTIN a little more systematically.
The key point we have made here regarding Uniform DeSTIN and representational transparency
may be summarized as follows:
• Define an "affine perceptual equivalence class" as a set of percepts that are equivalent to
each other, or nearly so, under affine transformation. An example would be views of the
same object from different perspectives or distances.
• Suppose one has an embodied agent using DeSTIN for visual perception, whose perceptual
stream tends to include a lot of reasonably large affine perceptual equivalence classes.
• Then, supposing the "mechanics" of DeSTIN can be transferred to the Uniform DeSTIN
case without dramatic loss of performance, Uniform DeSTIN should be able to recognize
patterns based on many fewer examples than classic DeSTIN.
As soon as Uniform DeSTIN has learned to recognize one element of a given affine perceptual
equivalence class, it can recognize all of them. Whereas, classic DeSTIN must learn each element
of the equivalence class separately. So, roughly speaking, the number of cases required for
unsupervised training of Uniform DeSTIN will be less than that for classic DeSTIN, by a ratio
equal to the average size of the affine perceptual equivalence classes in the agent's perceptual
stream.
Counterbalancing this, we have the performance cost of comparing the input to each node
against a much larger set of centroids (in Uniform DeSTIN as opposed to classic DeSTIN).
However, if a cover tree or other efficient data structure is used, this cost is not so onerous.
The cost of nearest neighbor queries in a cover tree storing n items (in this case, n centroids) is
O(c^12 log n), where the constant c represents the "intrinsic dimensionality" of the data; and in
practice the cover tree search algorithm seems to perform quite well. So, the added time cost
for online clustering in Uniform DeSTIN as opposed to DeSTIN, is a factor on the order of the
log of the number of nodes in the DeSTIN tree. We believe this moderate added time cost is
well worth paying, to gain a significant decrease in the number of training examples required
for unsupervised learning.
Beyond increases in computational cost, there is also the risk that the online clustering
may just not work as well when one has so many clusters in each node. This is the sort of
problem that can really only be identified, and dealt with, during extensive practice - since the
performance of any clustering algorithm is largely determined by the specific distribution of
the data it's dealing with. It may be necessary to improve DeSTIN's online clustering in some
way to make Uniform DeSTIN work optimally, e.g. improving its ability to form clusters with
markedly non-spherical shapes. This ties in to a point raised in Chapter 29 - the possibility of
supplementing traditional clusters with predicates learned by CogPrime, which may live inside
DeSTIN nodes alongside centroids. Each such predicate in effect defines a (generally nonconvex)
"cluster".
28.6 Imprecise Probability as a Tool for Linking CogPrime and
DeSTIN
One key aspect of vision processing is the ability to preferentially focus attention on certain
positions within a perceived visual scene. In this section we describe a novel strategy for enabling
this in a hybrid CogPrime/DeSTIN system, via use of imprecise probabilities. In fact the basic
idea suggested here applies to any probabilistic sensory system, whether deep-learning-based
or not, and whether oriented toward vision or some other sensory modality. However, for sake
of concreteness, we will focus here on the case of DeSTIN/CogPrime integration.
28.6.1 Visual Attention Focusing
Since visual input streams contain vast amounts of data, it's beneficial for a vision system to
be able to focus its attention specifically on the most important parts of its input. Sometimes
knowledge of what's important will come from cognition and long-term memory, but sometimes
it may come from mathematical heuristics applied to the visual data itself.
In the human visual system the latter kind of "low level attention focusing" is achieved
largely in the context of the eye changing its focus frequently, looking preferentially at certain
positions in the scene [Int09]. This works because the center of the eye corresponds to a greater
density of neurons than the periphery.
So for example, consider a computer vision algorithm like SIFT (Scale-Invariant Feature
Transform) [Low99], which (as shown in Figure 28.1) mathematically isolates certain points in
a visual scene as "keypoints" which are particularly important for identifying what the scene
depicts (e.g. these may be corners, or easily identifiable curves in edges). The human eye, when
looking at a scene, would probably spend a greater percentage of its time focusing on the SIFT
keypoints than on random points in the image.
The human visual system's strategy for low-level attention focusing is obviously workable
(at least in contexts similar to those in which the human eye evolved), but it's also somewhat
complex, requiring the use of subtle temporal processing to interpret even static scenes. We
suggest here that there may be a simpler way to achieve the same thing, in the context of
vision systems that are substantially probabilistic in nature, via using imprecise probabilities.
The crux of the idea is to represent the most important data, e.g. keypoints, using imprecise
probability values with greater confidence.
Similarly, cognition-guided visual attention-focusing occurs when a mind's broader knowledge
of the world tells it that certain parts of the visual input may be more interesting to study than
others. For example, in a picture of a person walking down a dark street, the contours of the
person may not be tremendously striking visually (according to SIFT or similar approaches);
but even so, if the system as a whole knows that it's looking at a person, it may decide to focus
extra visual attention on anything person-like. This sort of cognition guided visual attention
focusing, we suggest, may be achieved similarly to visual attention focusing guided by lower-
level cues - by increasing the confidence of the imprecise probabilities associated with those
aspects of the input that are judged more cognitively significant.
28.6.2 Using Imprecise Probabilities to Guide Visual Attention
Focusing
Suppose one has a vision system that internally constructs probabilistic values corresponding to
small local regions in visual input (these could be pixels or voxels, or something a little larger),
and then (perhaps via a complex process) assigns probabilities to different interpretations of the
input based on combinations of these input-level probabilities. For this sort of vision system,
one may be able to achieve focusing of attention via appropriately replacing the probabilities
with imprecise probabilities. Such an approach may be especially interesting in hierarchical
vision systems, that also involve the calculation of probabilities corresponding to larger regions
of the visual input. Examples of the latter include deep learning based vision systems like HTM
or DeSTIN, which construct nested hierarchies corresponding to larger and larger regions of the
input space, and calculate probabilities associated with each of the regions on each level, based
in part on the probabilities associated with other related regions.
In this context, we now state the basic suggestion of the section:
1. Assign higher confidence to the low-level probabilities that the vision system creates cor-
responding to the local visual regions that one wants to focus attention on (based on cues
from visual preprocessing or cognitive guidance)
2. Carry out the vision system's processing using imprecise probabilities rather than single-
number probabilities
3. Wherever the vision system makes a decision based on "the most probable choice" from a
number of possibilities, change the system to make a decision based on "the choice maxi-
mizing the product (expectation * confidence)".
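To make the recipe concrete, here is a minimal Python sketch of step 3, assuming a toy ImpreciseProb container with separate expectation and confidence fields. This illustrates only the decision rule; it does not reproduce DeSTIN's actual data structures.

# Minimal sketch of decision-making with imprecise probabilities.
# ImpreciseProb is a hypothetical container; a real system might use
# Walley-style interval probabilities or indefinite probabilities.

from dataclasses import dataclass

@dataclass
class ImpreciseProb:
    expectation: float  # best single-number estimate of the probability
    confidence: float   # in [0, 1]; how much evidence backs the estimate

def pick_max_probability(hypotheses):
    # Classic rule: choose the hypothesis with the highest probability.
    return max(hypotheses, key=lambda h: hypotheses[h].expectation)

def pick_attention_weighted(hypotheses):
    # Proposed rule: choose the hypothesis maximizing
    # expectation * confidence, so hypotheses grounded in
    # high-confidence percepts (e.g. SIFT keypoints) are favored.
    return max(hypotheses,
               key=lambda h: hypotheses[h].expectation * hypotheses[h].confidence)

hypotheses = {
    "edge":  ImpreciseProb(expectation=0.55, confidence=0.9),  # keypoint region
    "noise": ImpreciseProb(expectation=0.60, confidence=0.3),  # random region
}
print(pick_max_probability(hypotheses))     # -> "noise"
print(pick_attention_weighted(hypotheses))  # -> "edge"

Note how the attention-weighted rule reverses the classic choice: the slightly less probable but far better-evidenced hypothesis wins.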
28.6.3 Sketch of Application to DeSTIN
Internally to DeSTIN, probabilities are assigned to clusters associated with local regions of the
visual input. If a system such as SIFT is run as a preprocessor to DeSTIN, then those small
regions corresponding to SIFT keypoints may be assumed semantically meaningful, and internal
DeSTIN probabilities associated with them can be given a high confidence. A similar strategy
may be taken if a cognitive system such as OpenCog is run together with DeSTIN, feeding
DeSTIN information on which portions of a partially-processed image appear most cognitively
relevant. The probabilistic calculations inside DeSTIN can be replaced with corresponding cal-
culations involving imprecise probabilities. And critically, there is a step in DeSTIN where,
among a set of beliefs about the state in each region of an image (on each of a set of hierarchi-
cal levels), the one with the highest probability is selected. In accordance with the above recipe,
this step should be modified to select the belief with the highest probability * confidence.
28.6.3.1 Conceptual Justification
What is the conceptual justification for this approach?
One justification is obtained by assuming that each percept has a certain probability of being
erroneous, and those percepts that appear to more closely embody the semantic meaning of the
visual scene are less likely to be erroneous. This follows conceptually from the assumption that
the perceived world tends to be patterned and structured, so that being part of a statistically
significant pattern is (perhaps weak) evidence of being real rather than artifactual. Under this
assumption, the proposed approach will maximize the accuracy of the system's judgments.
A related justification is obtained by observing that this algorithmic approach follows from
the consideration of the perceived world as mutable. Consider a vision system that has the
capability to modify even the low-level percepts that it intakes - i.e. to use what it thinks
and knows, to modify what it sees. The human brain certainly has this potential [Cha09]. In
this case, it makes sense for the system to place some constraints regarding which of its
percepts it is more likely to modify. Confidence values semantically embody this - a higher
confidence being sensibly assigned to percepts that the system considers should be less likely to
be modified based on feedback from its higher (more cognitive) processing levels. In that case,
a higher confidence should be given to those percepts that seem to more closely embody the
semantic meaning of the visual scene - which is exactly what we're suggesting here.
28.6.3.2 Enabling Visual Attention Focusing in DeSTIN via Imprecise Probabilities
We now refer back to the mathematical formulation of DeSTIN summarized in Section 4.3.1
of Chapter 4 above, in the context of which the application of imprecise probability based
attention focusing to DeSTIN is almost immediate.
The probabilities P(o|s) may be assigned greater or lesser confidence depending on the
assessed semantic criticality of the observation o in question. So for instance, if one is using
SIFT as a preprocessor to DeSTIN, then one may assign the probabilities P(o|s) higher confidence
if they correspond to observations o of SIFT keypoints than if they do not.
These confidence levels may then be propagated throughout DeSTIN's probabilistic mathe-
matics. For instance, if one were using Walley's interval probabilities, then one could carry out
the probabilistic equations using interval arithmetic.
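As a toy illustration of such propagation, the following Python sketch multiplies two probability intervals using naive interval arithmetic. A faithful treatment would follow Walley's imprecise-probability calculus; the endpoint rules and the width-based confidence measure here are simplifying assumptions.

# Toy sketch of propagating interval probabilities.

def interval_mul(a, b):
    # Product of two probability intervals (a_lo, a_hi) * (b_lo, b_hi);
    # for nonnegative endpoints the extremes occur at the endpoints.
    return (a[0] * b[0], a[1] * b[1])

def confidence(interval):
    # Narrower intervals carry more confidence (width -> 0 => conf -> 1).
    return 1.0 - (interval[1] - interval[0])

p_keypoint = (0.58, 0.62)  # high-confidence percept: narrow interval
p_random   = (0.30, 0.90)  # low-confidence percept: wide interval

joint = interval_mul(p_keypoint, p_random)
print(joint)                                          # (0.174, 0.558)
print(confidence(p_keypoint), confidence(p_random))   # 0.96, 0.4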
Finally, one wishes to replace Equation 4.3.1.2 in Chapter 4 with

c = arg max_s ((b′(s)).strength × (b′(s)).confidence),    (28.1)
or some similar variant. The effect of this is that hypotheses based on high-confidence observa-
tions are more likely to be chosen, which of course has a large impact on the dynamics of the
DeSTIN network.
Fig. 28.1: The SIFT algorithm finds keypoints in an image, i.e. localized features that are
particularly useful for identifying the objects in an image. The top row shows images that
are matched against the image in the middle row. The bottom-row image shows some of the
keypoints used to perform the matching (i.e. these keypoints demonstrate the same features
in the top-row images and their transformed middle-row counterparts). SIFT keypoints are
identified via a staged filtering approach. The first stage identifies key locations in scale space
by looking for locations that are maxima or minima of a difference-of-Gaussian function. Each
point is used to generate a feature vector that describes the local image region sampled relative
to its scale-space coordinate frame. The features achieve partial invariance to local variations,
such as affine or 3D projections, by blurring image gradient locations.
Chapter 29
Bridging the Symbolic/Subsymbolic Gap
29.1 Introduction
While it's widely accepted that human beings carry out both symbolic and subsymbolic process-
ing, as integral parts of their general intelligence, the precise definition of "symbolic" versus
"subsymbolic" is a subtle issue, which different Al researchers will approach in different ways
depending on their differing overall perspectives on AI. Nevertheless, the intuitive meaning of
the concepts is commonly understood:
• "subsymbolic" refers to things like pattern recognition in high-dimensional quantitative
sensory data, and real-time coordination of multiple actuators taking multidimensional
control signals
• "symbolic" refers to things like natural language grammar and (certain or uncertain)
logical reasoning, that are naturally modeled in terms of manipulation of symbolic tokens
in terms of particular (perhaps experientially learned) rules
Views on the relationship between these two aspects of intelligence in human and artificial
cognition are quite diverse, including perspectives such as
1. Symbolic representation and reasoning are the core of human-level intelligence; subsymbolic
aspects of intelligence are of secondary importance and can be thought of as pre- or post-
processors to symbolic representation and reasoning
2. Subsymbolic representation and learning are the core of human intelligence; symbolic as-
pects of intelligence
a. emerge from the subsymbolic aspects as needed; or,
b. arise via a relatively simple, thin layer on top of subsymbolic intelligence, that merely
applies subsymbolic intelligence in a slightly different way
3. Symbolic and subsymbolic aspects of intelligence are best considered as different subsystems,
which
a. have a significant degree of independent operation, but also need to coordinate closely
together; or,
b. operate largely separately and can be mostly considered as discrete modules
In evolutionary terms, it is clear that subsymbolic intelligence came first, and that most of
the human brain is concerned with the subsymbolic intelligence that humans share with other
animals. However, this observation doesn't have clear implications regarding the relationship
between symbolic and subsymbolic intelligence in the context of everyday cognition.
In the history of the AI field, the symbolic/subsymbolic distinction was sometimes aligned
with the dichotomy between logic-based and rule-based AI systems (on the symbolic side) and
neural networks (on the subsymbolic side) [FP88b]. However, this dichotomy has become much
blurrier in the last couple of decades, with developments such as neural network models of lan-
guage parsing [CH11] and logical reasoning [LB10], and symbolic approaches to perception
and action [SR04]. Integrative approaches have also become more common, with one of the ma-
jor traditional symbolic AI systems, ACT-R, spawning a neural network version [LA93] with
parallel structures and dynamics to the traditional explicitly symbolic version and a hybridiza-
tion with a computational neuroscience model [JL08]; and another one, SOAR, incorporating
perception processing components as separate modules [Lai12]. The field of "neural-symbolic
computing" has emerged, covering the emergence of symbolic rules from neural networks, and
the hybridization of neural networks with explicitly symbolic systems [HH07].
Our goal here is not to explore the numerous deep issues involved with the symbolic/subsym-
bolic dichotomy, but rather to describe the details of a particular approach to symbolic/sub-
symbolic integration, inspired by Perspective 3a in the above list: the consideration of symbolic
and subsymbolic aspects of intelligence as different subsystems, which have a significant degree
of independent operation, but also need to coordinate closely together. We believe this kind
of integration can serve a key role in the quest to create human-level general intelligence. The
approach presented here is at the beginning rather than end of its practical implementation;
what we are describing here is the initial design intention of a project in progress, which is sure
to be revised in some respects as implementation and testing proceed. We will focus mainly
on the tight integration of a subsymbolic system enabling gray-scale vision processing into a
cognitive architecture with significant symbolic aspects, and will then briefly explain how the
same ideas can be used for color vision, and multi-sensory and perception-action integration.
The approach presented here begins with two separate Al systems, OpenCog (introduced
in Chapter 6.3) and DeSTIN (introduced in Chapter 4.3.1) - both currently implemented in
open-source software. Here are the relevant features of each as they pertain to our current effort
of bridging the symbolic/subsymbolic gap:
• OpenCog is centered on a "weighted, labeled hypergraph" knowledge representation called
the Atomspace, and features a number of different, sophisticated cognitive algorithms acting
on the Atomspace. Some of these cognitive algorithms are heavily symbolic in focus (e.g.
a probabilistic logic engine); others are more subsymbolic in nature (e.g. a neural-net-like
system for allocating attention and assigning credit). However, OpenCog in its current form
cannot deal with high-dimensional perceptual input, nor with detailed real-time control
of complex actuators. OpenCog is now being used to control intelligent characters in an
experimental virtual world, where the perceptual inputs are the 3D coordinate locations of
objects or small blocks; and the actions are movement commands like "step forward", "turn
head to the right."
• DeSTIN is a deep learning system consisting of a hierarchy of processing nodes, in which
the nodes on higher levels correspond to larger regions of space-time, and each node carries
out prediction regarding events in the space-time region to which it corresponds. Feedback
and feedforward dynamics between nodes combine with the predictive activity within nodes,
to create a complex nonlinear dynamical system whose state self-organizes to reflect the
state of the world being perceived. However, the specifics of DeSTIN's dynamics have been
designed in what we consider a particularly powerful way, and the system has shown good
results on small-scale test problems [KAH10]. So far DeSTIN has been utilized only for
vision processing, but a similar proprietary system has been used for auditory data as well;
and DeSTIN was designed to work together with an accompanying action hierarchy.
These two systems were not originally designed to work together, but we will describe a method
for achieving their tight integration via
1. Modifying DeSTIN in several ways, so that
a. the patterns in its states over time will have more easily recognizable regularities
b. its nodes are able to scan their inputs not only for simple statistical patterns (DeSTIN
"centroids"), but also for patterns recognized by routines supplied to it by an external
source (e.g. another AI system such as OpenCog)
2. Utilizing one of OpenCogPrime's cognitive processes (the "Fishgram" frequent subhyper-
graph mining algorithm) to recognize patterns in sets of DeSTIN states, and then recording
these patterns in OpenCogPrime's Atomspace knowledge store
3. Utilizing OpenCogPrime's other cognitive processes to abstract concepts and draw conclu-
sions from the patterns recognized in DeSTIN states by Fishgram
4. Exporting the concepts and conclusions thus formed to DeSTIN, so that its nodes can ex-
plicitly scan for their presence in their inputs, thus allowing the results of symbolic cognition
to explicitly guide subsymbolic perception
5. Creating an action hierarchy corresponding closely to DeSTIN's perceptual hierarchy, and
also corresponding to the actuators of a particular robot. This allows action learning to be
done via an optimization approach ([LKP05], [YKL+04]), where the optimization algo-
rithm uses DeSTIN states corresponding to perceived actuator states as part of its inputs.
The ideas presented here are compatible with those described in [Goe11a], but different in
emphasis. That paper described a strategy for integrating OpenCog and DeSTIN via creating an
intermediate "semantic CSDLN" hierarchy to translate between OpenCog and DeSTIN, in both
directions. In the approach suggested here, this semantic CSDLN hierarchy exists conceptually
but not as a separate software object: it exists as the combination of
• OpenCog predicates exported to DeSTIN and used alongside DeSTIN centroids, inside
DeSTIN nodes
• OpenCog predicates living in the OpenCog knowledge repository (AtomSpace), and inter-
connected in a hierarchical way using OpenCog nodes and links (thus reflecting DeSTIN's
hierarchical structure within the AtomSpace).
This hierarchical network of predicates, spanning the two software systems, plays the role of a
semantic CSDLN as described in [Goe11a].
29.2 Simplified OpenCog Workflow
The dynamics inside an OpenCog system may be highly complex, defying simple flowchart-
ing, but from the point of view of OpenCog-DeSTIN integration, one important pattern of
information flow through the system is as follows:
1. Perceptions come into the Atomspace. In the current OpenCog system, these are provided
via a proxy to the game engine where the OpenCog controlled character interacts. In an
OpenCog-DeSTIN hybrid, these will be provided via DeSTIN.
2. Hebbian learning builds HebbianLinks between perceptual Atoms representing percepts
that have frequently co-occurred
3. PLN inference, concept blending and other methods act on these perceptual Atoms and
their HebbianLinks, forming links between them and linking them to other Atoms stored
in the Atomspace reflecting prior experience and generalizations therefrom
4. Attention allocation gives higher short- and long-term importance values to those Atoms
that appear likely to be useful based on the links they have obtained
5. Based on the system's current goals and subgoals (the latter learned from the top-level goals
using PLN), and the goal-related links in the Atomspace, the OpenPsi mechanism triggers
the PLN-based planner, which chooses a series of high-level actions that are judged likely
to help the system achieve its goals in the current context
6. The chosen high-level actions are transformed into series of lower-level, directly executable
actions. In the current OpenCog system, this is done by a set of hand-coded rules based
on the specific mechanics of the game engine where the OpenCog controlled character
interacts. In an OpenCog-DeSTIN hybrid, the lower-level action sequence will be chosen by
an optimization method acting based on the motor control and perceptual hierarchies.
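As a purely schematic illustration, the following Python sketch renders the six steps above as a single control loop. All object and method names are illustrative placeholders, not the actual OpenCog or DeSTIN APIs.

# Schematic rendering of the information flow above. Every name here
# is a hypothetical placeholder standing in for a real subsystem.

def cognitive_cycle(atomspace, destin, openpsi, planner):
    percepts = destin.export_patterns()            # step 1: perception in
    atomspace.add_all(percepts)
    atomspace.build_hebbian_links(percepts)        # step 2: co-occurrence links
    atomspace.run_inference(percepts)              # step 3: PLN, blending, etc.
    atomspace.allocate_attention()                 # step 4: importance update
    goal = openpsi.most_urgent_subgoal(atomspace)  # step 5: goal selection
    plan = planner.plan(goal, atomspace)           #         high-level actions
    return destin.refine_to_motor_commands(plan)   # step 6: low-level actions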
This pattern of information flow omits numerous aspects of OpenCog cognitive dynamics,
but gives the key parts of the picture in terms of the interaction of OpenCog cognition with
perception and action. Most of the other aspects of the dynamics have to do with the interac-
tion of multiple cognitive processes acting on the Atomspace, and the interaction between the
Atomspace and several associated specialized memory stores, dealing with procedural, episodic,
temporal and spatial aspects of knowledge. From the present point of view, these additional
aspects may be viewed as part of Step 3 above, wrapped up in the phrase "and other methods
act on these perceptual Atoms." However, it's worth noting that in order to act appropriately
on perceptual Atoms, a lot of background cognition regarding more abstract conceptual Atoms
(often generalized from previous perceptual Atoms) may be drawn on. This background infer-
ence incorporates both symbolic and subsymbolic aspects, but goes beyond the scope of the
present discussion, as its particulars do not impinge on the particulars of DeSTIN-OpenCog
integration.
OpenCog also possesses a specialized facility for natural language comprehension and genera-
tion [LGE10, Goe10b], which may be viewed as a parallel perception/action pathway, bypassing
traditional human-like sense perception and dealing with text directly. Integrating OpenCog-
Prime's current linguistics processes with DeSTIN-based auditory and visual processing is a
deep and important topic, but one we will bypass here, for sake of brevity and because it's not
our current research priority.
29.3 Integrating DeSTIN and OpenCog
The integration of DeSTIN and OpenCog involves two key aspects:
• recognition of patterns in sets of DeSTIN states, and exportation of these patterns into the
OpenCog Atomspace
• use of OpenCog-created concepts within DeSTIN nodes, alongside statistically-derived "cen-
troids"
From here on, unless specified otherwise, when we mention "DeSTIN" we will refer to "Uniform
DeSTIN" as presented in Chapter 28, an extension of "classic DeSTIN" as defined in [ARK09].
29.3.1 Mining Patterns from DeSTIN States
The first step toward using OpenCog tools to mine patterns from sets of DeSTIN states, is to
represent these states in Atom form in an appropriate way. A simple but workable approach,
restricting attention for the moment to purely spatial patterns, is to use the six predicates:
• hasCentroid(node N, int k)
• hasParentCentroid(node N, int k)
• hasNorthNeighborCentroid(node N, int k)
• hasSouthNeighborCentroid(node N, int k)
• hasEastNeighborCentroid(node N, int k)
• hasWestNeighborCentroid(node N, int k)
For instance
hasNorthNeighborCentroid(N, 3)
means that N's north neighbor has centroid #3.
One may consider also the predicates
• hasParent(node N, Node M)
• hasNorthNeighbor(node N, Node M)
• hasSouthNeighbor(node N, Node M)
• hasEastNeighbor(node N, Node M)
• hasWestNeighbor(node N, Node M)
Now suppose we have a stored set of DeSTIN states, saved from the application of DeSTIN
to multiple different inputs. What we want to find are predicates P that are conjunctions of
instances of the above 10 predicates, which occur frequently in the stored set of DeSTIN states.
A simple example of such a predicate would be the conjunction of
• hasNorthNeighbor($N, $M)
• hasParentCentroid($N, 5)
• hasParentCentroid($M, 5)
• hasNorthNeighborCentroid($N, 6)
• hasWestNeighborCentroid($M, 4)
This predicate could be evaluated at any pair of nodes ($N, $M) on the same DeSTIN level.
If it is true for atypically many of these pairs, then it's a "frequent pattern", and should be
detected and stored.
OpenCogPrime's pattern mining component, Fishgram, exists precisely for the purpose of
mining this sort of conjunction from sets of relationships that are stored in the Atomspace. It
may be applied to this problem as follows:
• Translate each DeSTIN state into a set of relationships drawn from: hasNorthNeighbor,
hasSouthNeighbor, hasEastNeighbor, hasWestNeighbor, hasCentroid, hasParent
• Import these relationships, describing each DeSTIN state, into the OpenCog Atomspace
• Run pattern mining on this AtomSpace.
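As a rough illustration of the first two steps above, the following Python sketch flattens hypothetical DeSTIN states into relation tuples and counts the frequent ones. Fishgram itself mines conjunctions of such relations over variables; this toy counter handles only single relations, and the state format is an assumption made for the example.

# Toy sketch: flatten DeSTIN states into relation tuples and count
# frequent ones. The dict-based state format is hypothetical.

from collections import Counter

def state_to_relations(state):
    # state: {node_id: {"centroid": k, "parent": p, "north": n, ...}}
    rels = []
    for node, info in state.items():
        rels.append(("hasCentroid", node, info["centroid"]))
        rels.append(("hasParent", node, info["parent"]))
        for direction in ("north", "south", "east", "west"):
            if direction in info:
                rels.append(("has%sNeighbor" % direction.capitalize(),
                             node, info[direction]))
    return rels

def frequent_relations(states, min_count=2):
    # Count each relation across all states; keep the frequent ones.
    counts = Counter(r for s in states for r in state_to_relations(s))
    return [r for r, c in counts.items() if c >= min_count]

states = [
    {"n1": {"centroid": 5, "parent": "p1", "north": "n2"}},
    {"n1": {"centroid": 5, "parent": "p1", "east": "n3"}},
]
print(frequent_relations(states))
# -> [("hasCentroid", "n1", 5), ("hasParent", "n1", "p1")]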
29.3.2 Probabilistic Inference on Mined Hypergraphs
Patterns mined from DeSTIN states can then be reasoned on by OpenCogPrime's PLN inference
engine, allowing analogy and generalization.
Suppose centroids 5 and 617 are estimated to be similar - either via DeSTIN's built-in simi-
larity metric, or, more interestingly, via OpenCog inference on the Atom representations of these
centroids. As an example of the latter, consider: 5 could represent a person's nose and 617 could
represent a rabbit's nose. In this case, DeSTIN might not judge the two centroids particularly
similar on a purely visual level, but OpenCog may know that the images corresponding to
both of these centroids are called "noses" (e.g. perhaps via noticing people indicate these
images in association with the word "nose"), and may thus infer (using a simple chain of PLN
inferences) that these centroids are probabilistically similar.
If 5 and 617 are estimated to be similar, then a predicate like
ANDLink
EvaluationLink
hasNorthNeighbor
ListLink $N
$M
EvaluationLink
hasParentCentroid
ListLink $N
5
EvaluationLink
hasParentCentroid
ListLink $M
5
EvaluationLink
hasNorthNeighborCentroid
ListLink $N 6
EvaluationLink
hasWestNeighborCentroid
ListLink $M 4
mined from DeSTIN states, could be extended via PLN analogical reasoning to
ANDLink
EvaluationLink
hasNorthNeighbor
ListLink $N $M
EvaluationLink
hasParentCentroid
ListLink $N
617
EvaluationLink
hasParentCentroid
ListLink $M 617
EvaluationLink
hasNorthNeighborCentroid
ListLink $N 6
EvaluationLink
hasWestNeighborCentroid
ListLink $M 4
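The analogical step can be caricatured in a few lines of Python: given that centroids 5 and 617 are judged similar, substitute one for the other throughout a mined pattern. PLN would additionally compute a discounted truth value for the new pattern; that bookkeeping is omitted from this sketch, and the tuple encoding of patterns is an assumption for illustration.

# Toy sketch of analogical pattern extension by centroid substitution.

def substitute_centroid(pattern, old, new):
    # pattern: list of (predicate, args...) tuples containing centroid ids
    return [tuple(new if arg == old else arg for arg in rel)
            for rel in pattern]

pattern = [
    ("hasNorthNeighbor", "$N", "$M"),
    ("hasParentCentroid", "$N", 5),
    ("hasParentCentroid", "$M", 5),
    ("hasNorthNeighborCentroid", "$N", 6),
    ("hasWestNeighborCentroid", "$M", 4),
]
print(substitute_centroid(pattern, 5, 617))  # 5 -> 617 throughout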
29.3.3 Insertion of OpenCog-Learned Predicates into DeSTIN's
Pattern Library
Suppose one has used Fishgram, as described in the earlier part of this chapter, to recog-
nize predicates embodying frequent or surprising patterns in a set of DeSTIN states or state-
sequences. The next natural step is to add these frequent or surprising patterns to DeSTIN's
pattern library, so that the pattern library contains not only classic DeSTIN centroids, but also
these corresponding "image grammar" style patterns. Then, when a new input comes into a
DeSTIN node, in addition to being compared to the centroids at the node, it can be fed as
input to the predicates associated with the node.
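A minimal sketch of such a node's matching step follows, treating exported predicates simply as scoring functions evaluated alongside centroids. The feature vectors, the inverse-distance score, and the 0-to-1 predicate outputs are all assumptions for the example; the real DeSTIN belief update is considerably more involved.

# Sketch of a DeSTIN node whose pattern library holds both centroids
# and OpenCog-exported predicates. Names and shapes are illustrative.

def match(node_input, centroids, predicates):
    scores = {}
    for cid, centroid in centroids.items():
        # Classic DeSTIN side: similarity to a learned centroid (here a
        # toy inverse-distance score over equal-length feature vectors).
        dist = sum((a - b) ** 2 for a, b in zip(node_input, centroid))
        scores[("centroid", cid)] = 1.0 / (1.0 + dist)
    for name, pred in predicates.items():
        # New side: exported predicates score the input directly (0..1).
        scores[("predicate", name)] = pred(node_input)
    return max(scores, key=scores.get)

centroids = {5: [0.1, 0.9], 617: [0.8, 0.2]}
predicates = {"noselike": lambda v: 1.0 if v[0] > 0.5 else 0.0}
print(match([0.7, 0.3], centroids, predicates))  # ("predicate", "noselike")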
What is the advantage of this approach, compared to DeSTIN without these predicates? The
capability for more compact representation of a variety of spatial patterns. In many cases, a
spatial pattern that would require a large number of DeSTIN centroids to represent, can be
represented by a single, fairly compact predicate. It is an open question whether these sorts
of predicates are really critical for human-like vision processing. However, our intuition is that
they do have a role in human as well as machine vision. In essence, DeSTIN is based on a fancy
version of nearest-neighbor search, applied in a clever way on multiple levels of a hierarchy,
using context-savvy probabilities to bias the matching. But we suspect there are many visual
patterns that are more compactly and intuitively represented using a more flexible language,
such as OpenCog predicates formed by combining elementary predicates involving appropriate
spatial and temporal relations.
For example, consider the archetypal spatial pattern of a face as: either two eyes that are
next to each other, or sunglasses, above a nose, which is in turn above a mouth. (This is an
oversimplified toy example, but we're positing it for illustration only. The same point applies
to more complex and realistic patterns.) One could represent this in OpenCogPrime's Atom
language as something like:
AND
InheritanceLink N B_nose
InheritanceLink M B_mouth
EvaluationLink
above
ListLink E N
EvaluationLink
above
ListLink N M
OR
AND
MemberLink E1 E
MemberLink E2 E
EvaluationLink
next_to
ListLink E1 E2
InheritanceLink E1 B_eye
AND
InheritanceLink E B_sunglasses
where e.g. B_eye is a DeSTIN belief that corresponds roughly to recognition of the spatial
pattern of a human eye. To represent this using ordinary DeSTIN centroids, one couldn't rep-
resent the OR explicitly; instead one would need to split it into two different sets of centroids,
corresponding to the eye case and the sunglasses case - unless the DeSTIN pattern library
contained a belief corresponding to "eyes or sunglasses." But the question then becomes: how
would classic DeSTIN actually learn a belief like this? In the suggested architecture, pattern
mining on the database of DeSTIN states is proposed as an algorithm for learning such beliefs.
This sort of predicate-enhanced DeSTIN will have advantages over the traditional version,
only if the actual distribution of images observed by the system contains many (reasonably high
probability) images modeled accurately by predicates involving disjunction and/or negations as
well as conjunctions. If the system's perceived world is simpler than this, then good old DeSTIN
will work just as well, and the OpenCog-learned predicates are a needless complication.
Without these sorts of predicates, how might DeSTIN be extended to include beliefs like "eyes
or sunglasses"? One way would be to couple DeSTIN with a reinforcement learning subsystem,
that reinforced the creation of beliefs that were useful for the system as a whole. If reasoning
in terms of faces (independent of whether they have eyes or sunglasses) got the system reward,
presumably it could learn to form the concept "eyes or sunglasses." We believe this would also
be a workable approach, but that given the strengths and weaknesses of contemporary computer
hardware, the proposed DeSTIN-OpenCog approach will prove considerably simpler and more
effective.
29.4 Multisensory Integration, and Perception-Action Integration
In Chapter 28.2.1 we have briefly indicated how DeSTIN could be extended beyond vision
to handle other senses such as audition and touch. If one had multiple perception hierarchies
corresponding to multiple senses, the easiest way to integrate them within an OpenCog context
would be to use OpenCog as the communication nexus - representing DeSTIN centroids in
the various modality-specific hierarchies as OpenCog Atoms (PerceptualCentroidNodes), and
building HebbianLinks in OpenCogPrime's Atomspace between these PerceptualCentroidNodes
as appropriate based on their association. So for instance the sound of a person's footsteps
would correspond to a certain belief (probability distribution over centroids) in the auditory
DeSTIN network, and the sight of a person's feet stepping would correspond to a certain
belief (probability distribution over centroids) in the visual DeSTIN network; and the OpenCog
Atomspace would contain links between the centroids assigned high weights in these two belief
two belief distributions. Importance spreading between these various PerceptualCentroidNodes
would cause a dynamic wherein seeing feet stepping would bias the system to think it was
hearing footsteps, and hearing footsteps would bias it to think it was seeing feet stepping.
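A toy sketch of how such cross-modal associations might be tabulated from co-occurrence records follows. The actual OpenCog Hebbian learning operates on Atom importance dynamics rather than raw counts, so the record format and thresholding here are simplifying assumptions.

# Toy sketch of building Hebbian associations between centroids in
# different modality hierarchies from co-occurrence records.

from collections import Counter
from itertools import product

def hebbian_pairs(records, min_count=2):
    # records: list of (visual_centroids, auditory_centroids) observed
    # within the same time window
    counts = Counter()
    for visual, auditory in records:
        for pair in product(visual, auditory):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}

records = [
    ({"feet_stepping"}, {"footsteps"}),
    ({"feet_stepping"}, {"footsteps", "traffic"}),
]
print(hebbian_pairs(records))  # {("feet_stepping", "footsteps"): 2}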
And, suppose there are similarities between the belief distributions for the visual appearance
of dogs, and the visual appearance of cats. Via the intermediary of the Atomspace, this would
bias the auditory and haptic DeSTIN hierarchies to assume a similarity between the auditory
and haptic characteristics of dogs, and the analogous characteristics of cats, because PLN
analogical reasoning would extrapolate from, e.g.
• HebbianLinks joining cat-related visual PerceptualCentroidNodes and dog-related visual
PerceptualCentroidNodes
• HebbianLinks joining cat-related visual PerceptualCentroidNodes to cat-related haptic Per-
ceptualCentroidNodes; and others joining dog-related visual PerceptualCentroidNodes to
dog-related haptic PerceptualCentroidNodes
to yield HebbianLinks joining cat-related haptic PerceptualCentroidNodes and dog-related hap-
tic PerceptualCentroidNodes. This sort of reasoning would then cause the DeSTIN system to,
for example, upon touching a cat, vaguely expect to maybe hear dog-like things. This sort
of simple analogical reasoning will be right sometimes and wrong sometimes - a cat walking
sounds a fair bit like a dog walking, and cat and dog growls sound fairly similar, but a cat
meowing doesn't sound that much like a dog barking. More refined inferences of the same basic
sort may be used to get the details right as the system explores and understands the world
more accurately.
29.4.1 Perception-Action Integration
While experimentation with DeSTIN has so far been restricted to perception processing, the sys-
tem was designed from the beginning with robotics applications in mind, involving integration
of perception with action and reinforcement learning. As OpenCog already handles reinforce-
ment learning on a high level (via OpenPsi), our approach to robot control using DeSTIN and
OpenCog involves creating a control hierarchy parallel to DeSTIN's perceptual hierarchy, and
doing motor learning using optimization algorithms guided by reinforcement signals delivered
from OpenPsi and incorporating DeSTIN perceptual states as part of their input information.
Our initial research goal, where action is concerned, is not to equal the best purely control-
theoretic algorithms at fine-grained control of robots carrying out specialized tasks, but rather
to achieve basic perception / control / cognition integration in the rough manner of a young
human child. A two year old child is not particularly well coordinated, but is capable of coor-
dinating actions involving multiple body parts using an integration of perception and action
with unconscious and deliberative reasoning. Current robots, in some cases, can carry out spe-
cialized actions with great accuracy, but they lack this sort of integration, and thus generally
have difficulty effectively carrying out actions in unforeseen environments and circumstances.
We will create an action hierarchy with nodes corresponding to different parts of the robot
body, where e.g. the node corresponding to an arm would have child nodes corresponding to
a shoulder, elbow, wrist and hand; and the node corresponding to a hand would have child
nodes corresponding to the fingers of the hand; etc. Physical self-perception is then achieved
by creating a DeSTIN "action-perception" hierarchy with nodes corresponding to the states
of body components. In the simplest case this means the lowest-level nodes will correspond to
individual servomotors, and their inputs will be numerical vectors characterizing servomotor
states. If one is dealing with a robot endowed with haptic technology, e.g. Syntouch
fingertips, then numerical vectors characterizing haptic inputs may be used alongside these.
The configuration space of an action-perception node, corresponding to the degrees of free-
dom of the servomotors of the body part the node represents, may be approximated by a set of
"centroid" vectors. When an action is learned by the optimization method used for this purpose,
this involves movements of the servomotors corresponding to many different nodes, and thus
creates a series of "configuration vectors" in each node. These configuration vector series may
be subjected to online clustering, similar to percepts in a DeSTIN perceptual hierarchy. The
result is a library of "codewords", corresponding to discrete trajectories of movement, associ-
ated with each node. The libraries may be shared by identical body parts (e.g. shared among
legs, shared among fingers), but will be distinct otherwise. Each coordinated whole-body action
thus results in a series of (node, centroid) pairs, which may be mined for patterns, similarly to
the perception case.
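As an illustration of the codeword-formation step, the following Python sketch performs a naive online clustering of configuration vectors. The distance threshold, learning rate, and nearest-centroid update rule are arbitrary stand-ins for whatever online clusterer a real implementation would use.

# Toy sketch of deriving movement "codewords" for one action node by
# online clustering of its servomotor configuration vectors.

def online_cluster(vectors, threshold=0.5):
    codewords = []  # running centroids, one per discovered codeword
    for v in vectors:
        best, best_d = None, threshold
        for i, c in enumerate(codewords):
            d = sum((a - b) ** 2 for a, b in zip(v, c)) ** 0.5
            if d < best_d:
                best, best_d = i, d
        if best is None:
            codewords.append(list(v))   # vector is novel: new codeword
        else:
            c = codewords[best]         # nudge nearest codeword toward v
            for j in range(len(c)):
                c[j] += 0.1 * (v[j] - c[j])
    return codewords

wrist_trajectory = [[0.0, 0.1], [0.05, 0.12], [0.9, 0.8]]
print(online_cluster(wrist_trajectory))  # two codewords emerge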
The set of predicates needed to characterize states in this action-perception hierarchy is
simpler than the one described for visual perception above; here one requires only
• hasCentroid(node N, int k)
• hasParentCentroid(node N, int k)
• hasParent(node N, Node M)
• hasSibling(node N, Node M)
and most of the patterns will involve specific nodes rather than node variables. The different
nodes in a DeSTIN vision hierarchy are more interchangeable (in terms of their involvement in
various patterns) than, say, a leg and a finger.
In a pure DeSTIN implementation, the visual and action-perception hierarchies would be
directly linked. In the context of OpenCog integration, it is simplest to link the two via OpenCog,
in a sense using cognition as a bridge between action and perception. It is unclear whether this
strategy will be sufficient in the long run, but we believe it will be more than adequate for
experimentation with robotic perceptual-motor coordination in a variety of everyday tasks.
OpenCogPrime's Hebbian learning process can be used to find common associations between
action-perception states and visual-perception states, via mining a data store containing time-
stamped state records from both hierarchies.
Importance spreading along the HebbianLinks learned in this way can then be used to bias
the weights in the belief states of the nodes in both hierarchies. So, for example, the action-
perception patterns related to clenching the fist, would be Hebbianly correlated with the visual-
perception patterns related to seeing a clenched fist. When a clenched fist was perceived via
servomotor data, importance spreading would increase the weighting of visual patterns corre-
sponding to clenched fists, within the visual hierarchy. When a clenched fist was perceived via
visual data, importance spreading would increase the weighting of servomotor data patterns
corresponding to clenched fists, within the action-perception hierarchy.
29.4.2 Thought-Experiment: Eye-Hand Coordination
For example, how would DeSTIN-OpenCog integration as described here carry out a simple
task of eye-hand coordination? Of course the details of such a feat, as actually achieved, would
be too intricate to describe in a brief space, but it still is meaningful to describe the basic ideas.
Consider the case of a robot picking up a block, in plain sight immediately in front of the robot,
via pinching it between two fingers and then lifting it. In this case,
• The visual scene, including the block, is perceived by DeSTIN; and appropriate patterns in
various DeSTIN nodes are formed
• Predicates corresponding to the distribution of patterns among DeSTIN nodes are activated
and exported to the OpenCog Atomspace
• Recognition that a block is present is carried out, either by
- PLN inference within OpenCog, drawing the conclusion that a block is present from
the exported predicates, using ImplicationLinks comprising a working definition of a
"block"
- A predicate comprising the definition of "block", previously imported into DeSTIN from
OpenCog and utilized within DeSTIN nodes as a basic pattern to be scanned for. This
option would obtain only if the system had perceived many blocks in the past, justifying
the automation of block recognition within the perceptual hierarchy.
• OpenCog, motivated by one of its higher-level goals, chooses "picking up the block" as
subgoal. So it allocates effort to finding a procedure whose execution, in the current context,
has a reasonable likelihood of achieving the goal of picking up the block. For instance, the
goal could be curiosity (which might make the robot want to see what lies under the block),
or the desire to please the agent's human teacher (in case the human teacher likes presents,
and will reward the robot for giving it a block).