
🧠🚀💡✨ Inside Pathway’s Post-Transformer Architecture Designed for Memory and On-the-Fly Learning

🤖 AI Summary

  • 🧠 Traditional large language models function like interns on their first day: they never gain context or improve with experience because they lack inherent memory [12:03].
  • 🧬 The Baby Dragon Hatchling (BDH) architecture moves away from the transformer’s dense connectivity toward a graph structure that mimics the locality of organic brains [15:22].
  • ⚡ Memory in this post-transformer system sits in the fast weights on edges rather than only within fixed parameters, allowing for on-the-fly learning during inference [23:34].
  • 🕸️ Unlike transformers, where everything is connected to everything, BDH uses sparse local interactions in which neurons signal only relevant neighbors to save compute [17:11].
  • 📈 This architecture preserves scaling laws without requiring the number of neurons to grow quadratically, making it more sustainable and efficient [39:14].
  • 🛠️ Synaptic plasticity allows the network to strengthen connections based on the relevance of information, enabling the system to evolve its own internal representations [41:34].
  • 🔍 Sparsity in the graph provides better interpretability than black-box models because specific synapses can be identified as responsible for particular concepts [52:01].
  • 🖇️ Separate models trained on different domains can be glued together along the neuron dimension to combine their expertise, such as merging finance and law [01:03:08].
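The core mechanisms summarized above (sparse local signaling plus plastic fast weights on the graph's edges) can be sketched in a toy NumPy simulation. This is an illustrative simplification with invented sizes, thresholds, and an ad hoc Hebbian update rule, not the actual BDH implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                 # toy neuron count (assumption)

# Sparse adjacency: each neuron signals only a few "local" neighbors.
adjacency = rng.random((n, n)) < 0.05

# Slow weights are fixed; "fast weights" live on the edges and start at zero.
slow_w = rng.normal(0.0, 1.0, (n, n)) * adjacency
fast_w = np.zeros((n, n))

def step(x, eta=0.1, decay=0.99):
    """One toy update: propagate along sparse edges, then apply a
    Hebbian-style fast-weight change where pre/post neurons co-fire."""
    global fast_w
    pre = (x > 0).astype(float)          # active input (pre-synaptic) neurons
    y = (slow_w + fast_w).T @ x          # signal flows along existing edges
    post = (y > y.mean()).astype(float)  # sparsely active output neurons
    # Strengthen edges whose endpoints were both active (plasticity),
    # restricted to edges that exist in the sparse graph.
    fast_w = decay * fast_w + eta * np.outer(pre, post) * adjacency
    return y * post                      # only active neurons emit output

x = rng.normal(size=n)
out1 = step(x)
out2 = step(x)   # fast weights now encode the earlier co-activation
```

Note the design point this illustrates: memory accumulates in `fast_w` during inference, with no gradient descent, and only edges present in the sparse graph ever change.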

🤔 Evaluation

  • ⚖️ The speaker focuses on moving beyond the quadratic complexity of transformers, a goal shared by proponents of state space models such as Mamba. To better understand this shift, one could explore the paper Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention from researchers at the Idiap Research Institute.
  • 🧩 While the speaker emphasizes local dynamics and graph-based memory, other researchers argue that scaling existing architectures remains the most reliable path to AGI. This view is discussed in detail in the essay The Bitter Lesson by Richard Sutton of the University of Alberta.

โ“ Frequently Asked Questions (FAQ)

๐Ÿฒ Q: What is the Baby Dragon Hatchling architecture?

๐ŸฆŽ A: It is a post-transformer neural network design that uses graph-based local dynamics and synaptic plasticity to provide models with long-term memory and improved reasoning.

๐Ÿ’พ Q: How does BDH handle memory differently than GPT models?

๐Ÿ“‚ A: While GPT models have a fixed context window and forget previous interactions, BDH updates its fast weights on the edges of its graph during inference to learn and retain information in real-time.
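A minimal sketch of this fast-weight memory idea, assuming a simple Hebbian outer-product write and a linear read; the dimensions, normalization, and update rule here are illustrative conventions, not BDH's actual mechanism:

```python
import numpy as np

# Toy fast-weight memory: associations are written during inference via
# outer products, with no gradient descent and no fixed context window.
d = 64                                   # toy state dimension (assumption)
rng = np.random.default_rng(1)
fast_w = np.zeros((d, d))

def remember(key, value, eta=1.0):
    """Write a key->value association into the fast weights (Hebbian)."""
    global fast_w
    fast_w += eta * np.outer(value, key)

def recall(key):
    """Read back whatever value the fast weights associate with `key`."""
    return fast_w @ key

k1, v1 = rng.normal(size=d), rng.normal(size=d)
k1 = k1 / np.linalg.norm(k1)             # normalize the key for clean recall

remember(k1, v1)
retrieved = recall(k1)

# With a single stored pair and a unit-norm key, recall reproduces v1,
# so the cosine similarity between retrieved and v1 is ~1.
cos = retrieved @ v1 / (np.linalg.norm(retrieved) * np.linalg.norm(v1))
```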

๐Ÿ“‰ Q: Does this new architecture require more computing power?

๐Ÿ”‹ A: No, it is designed for efficiency by using sparse interactions that only activate a small portion of the network at any time, potentially reducing compute needs for reasoning by ten times.
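The compute claim follows from simple arithmetic: if only k of n units fire, downstream per-step work scales with k rather than n. A toy sketch with made-up numbers (10% activity, hence a 10x reduction in unit-updates):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 10_000, 1_000                     # 10% of units active (assumption)

pre_activations = rng.normal(size=n)

# Keep only the top-k pre-activations: np.partition puts the k-th largest
# value at index -k, which we use as the firing threshold.
threshold = np.partition(pre_activations, -k)[-k]
active = pre_activations >= threshold    # sparse mask: k of n units fire

dense_ops = n                            # dense net: every unit participates
sparse_ops = int(active.sum())           # sparse net: only active units do
savings = dense_ops / sparse_ops         # 10% activity -> 10x fewer updates
```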

📚 Book Recommendations

↔️ Similar

  • 🕸️ Linked by Albert-László Barabási explores how the structure of complex networks influences everything from biology to the internet.
  • 🧠 On Intelligence by Jeff Hawkins describes a framework for understanding human intelligence based on the memory-prediction properties of the brain.

🆚 Contrasting

  • 🧠💻🤖 Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville provides the foundational mathematical principles for the dense, backpropagation-heavy architectures that BDH seeks to improve upon.
  • 🏛️ The Master Algorithm by Pedro Domingos examines the different schools of machine learning, including those that prioritize symbolic logic over emergent organic structures.
  • 🐜 Emergence by Steven Johnson analyzes how simple agents following local rules can create complex, intelligent global systems.
  • 🎲 The Strange Order of Things by Antonio Damasio investigates how feelings and biological homeostasis are the root of human cultural and intellectual creativity.

🦋 Bluesky

🧠🚀💡✨ Inside Pathway's Post-Transformer Architecture Designed for Memory and On-the-Fly Learning

AI Q: 🧠 Can AI learn?

🧠 Neural Networks | 🕸️ Graph Structures | ⚡️ Dynamic Memory
https://bagrounds.org/videos/inside-pathways-post-transformer-architecture-designed-for-memory-and-on-the-fly-learning

— Bryan Grounds (@bagrounds.bsky.social) 2026-03-14T00:19:33.703Z

๐Ÿ˜ Mastodon

Post by @bagrounds@mastodon.social