
👨‍💻☁️🐘 Jay Kreps

🚀 Jay Kreps is an influential figure in data infrastructure and real-time data streaming 🌊. He is primarily known for:

👨‍💻 Co-creating Apache Kafka: While at LinkedIn 💼, Jay Kreps was one of the original authors and key engineers behind Apache Kafka 🐘, an open-source distributed streaming platform. Kafka revolutionized 💡 how companies handle large volumes of real-time data, enabling applications to publish and subscribe to streams of records 🧾.
🏢 Co-founding and serving as CEO of Confluent: Building on the success of Apache Kafka, Jay Kreps co-founded Confluent in 2014 with Jun Rao and Neha Narkhede. Confluent commercializes and extends Apache Kafka, providing an enterprise-ready platform for data in motion 🚄, including managed cloud services ☁️ and tools for stream governance and processing.
💡 Pioneering “Data in Motion”: Kreps has been a strong advocate 📣 for the concept of “data in motion,” emphasizing the importance of processing and analyzing data as it is being generated ⚙️, rather than relying solely on batch processing of static data. This philosophy is central to Confluent’s mission and has shaped how many businesses approach data management and real-time applications 📈, especially with the rise of AI 🤖.
🛠️ Contributions to other open-source projects: Beyond Kafka, he was also involved in the original development of other open-source projects such as Project Voldemort (a key-value store) and Apache Samza (a stream processing system) during his time as lead architect for data infrastructure at LinkedIn.
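
🧪 To make the publish/subscribe idea above concrete, here is a minimal in-memory sketch in Python. It is not Kafka's API: the `TinyLog` class and its `publish` and `read_from` methods are hypothetical names invented for illustration. It only shows the core pattern, an append-only sequence of records that several subscribers read independently by tracking their own offsets.

```python
from dataclasses import dataclass, field


@dataclass
class TinyLog:
    """Hypothetical in-memory stand-in for one stream of records (not Kafka's API)."""
    records: list = field(default_factory=list)

    def publish(self, record):
        """Append a record and return its offset (its position in the log)."""
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset):
        """Return every record at or after the given offset."""
        return self.records[offset:]


log = TinyLog()
log.publish({"user": "a", "action": "click"})
log.publish({"user": "b", "action": "view"})

# Each subscriber keeps its own offset, so subscribers progress independently.
analytics_offset = 0
alerts_offset = 1

analytics_batch = log.read_from(analytics_offset)  # both records
alerts_batch = log.read_from(alerts_offset)        # only the second record
```

Because publishing never modifies earlier records, a slow subscriber can catch up later without affecting a fast one. A real system adds durability, partitioning, and replication, none of which this sketch attempts.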

🎓 Jay Kreps holds both a Bachelor’s and Master’s degree in Computer Science from the University of California, Santa Cruz. 📅 He joined Anthropic’s Board of Directors in May 2024.

📚 Book Recommendations

📚 Books by Jay Kreps:

❤️ I Heart Logs: Event Data, Stream Processing, and Data Integration by Jay Kreps: 📖 This is essential reading if you want to understand the foundational ideas behind Kafka 🐘 and the “log-centric” view of data. 📝 It’s based on his popular blog posts and dives into how logs work in distributed systems and their practical applications in data integration ⚙️, real-time stream processing ⚡️, and data system design 🏗️. ⏱️ It’s a short, impactful read.
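
🧪 The “log-centric” view can be sketched in a few lines of Python. This is an illustrative assumption, not code from the book: the event shape and the `apply_event` and `rebuild_state` helpers are hypothetical. The idea shown is that an ordered log of changes acts as the source of truth, and a downstream view can be rebuilt by replaying it from the start.

```python
def apply_event(state, event):
    """Apply one change event to a key-value view and return the view."""
    if event["op"] == "set":
        state[event["key"]] = event["value"]
    elif event["op"] == "delete":
        state.pop(event["key"], None)
    return state


def rebuild_state(log):
    """Replay the whole log in order to derive the current view."""
    state = {}
    for event in log:
        apply_event(state, event)
    return state


# Hypothetical change events, oldest first.
change_log = [
    {"op": "set", "key": "alice", "value": "SF"},
    {"op": "set", "key": "bob", "value": "NYC"},
    {"op": "set", "key": "alice", "value": "LA"},
    {"op": "delete", "key": "bob"},
]

view = rebuild_state(change_log)
```

Replaying only a prefix of the log yields the view as it stood at that point. Two consumers that apply the same events in the same order with deterministic logic arrive at the same view.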

🧑‍🤝‍🧑 Books by his Confluent Co-founders and other industry experts:

🚀 Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale by Neha Narkhede, Gwen Shapira, and Todd Palino: 🤝 Co-authored by another Confluent co-founder, this book provides a comprehensive and practical guide to Apache Kafka. ⚙️ It covers how Kafka works, its design 🎨, and best practices ✅ for deploying it in production. 💯 If you’re using or planning to use Kafka, this is a must-have.
💾⬆️🛡️ Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann: 💡 While not directly focused on Kafka, this book is widely regarded as a foundational text for anyone working with distributed systems and data. 📚 Kleppmann covers a broad range of topics, including data models, storage 🗄️, indexing, distributed transactions, consistency ⚖️, and fault tolerance. 🤓 It provides an excellent theoretical and practical understanding of the challenges and solutions in building data systems.
🌊 Making Sense of Stream Processing by Martin Kleppmann: 🆓 This free ebook, compiled from his articles and talks 🗣️, delves into how stream processing can make data processing systems more flexible and less complex. 🧩 It complements the concepts in “I Heart Logs” by offering another perspective on the power of streams.
🚦 Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing by Tyler Akidau, Slava Chernyak, and Reuven Lax: 🏛️ This book focuses on the principles and patterns of streaming data processing, drawing on the authors’ experience at Google with the Dataflow model and Apache Beam. 🧐 It’s a more abstract and theoretical look at how to design and build robust streaming systems.

🌐 Broader Distributed Systems and Real-time Data Books:

🤝 Understanding Distributed Systems: What every developer should know about large distributed applications by Roberto Vitillo: 🌉 This book aims to bridge the gap between abstract theory and the practical challenges of building large-scale distributed applications.
📊 Building Real-Time Analytics Systems by Mark Needham: ⚙️ This practical book explores how to analyze data streams in real time using technologies like Kafka, Apache Pinot, and other event-processing systems. 👨‍🏫 It offers hands-on tutorials for building real-time analytics applications.
🕰️ Real-Time Data Processing: Essential Concepts, Tools, and Techniques for Effective Data Processing in Dynamic Environments by James Henry and Rayan Mitchell: 📚 This book covers fundamental concepts of real-time data processing, including data streams, latency ⏳, and throughput, and discusses tools like Apache Kafka, Apache Flink, and Google Cloud Dataflow.
