I have finally been able to build an app out of an idea that I have been thinking about for some time now. It sits in the AI-EdTech space. The main motivation for this is my 6-year-old who has...
Markov Decision Processes: The Foundation of Reinforcement Learning
In the last post, bandits had no state: we chose an action and received a reward. The next chapter deals with something more general, the Markov Decision Process (MDP), where we can choose dif...
Reinforcement Learning: A First Pass Through Sutton & Barto
I came across the book Reinforcement Learning by R. S. Sutton at the library today. With AI agents becoming more common, I felt it would be important to understand how we can guide and reinforce their...
Streaming in Databricks: When Expectations Meet Reality
Databricks has done something genuinely impressive. It brings batch, streaming, ML, and now even agents into a single platform. For many teams, that consolidation is the reason Databricks exists at...
AI, Data Teams, and the New Silos: A Lesson From Cost Optimization
Over the last decade, the data ecosystem has evolved faster than anyone expected. Databricks, AWS, and the modern lakehouse stack have made it easier than ever for teams to move independently. With...
Unified Streaming Pipeline: Intelligent Multi-Source Deduplication with APPLY CHANGES
One of the most challenging problems I’ve solved recently was figuring out how to collate events arriving from multiple source systems, each with its own delivery pattern, format, and reliability m...
Performance Issues with Parallel Branching in Delta Live Tables
I’ve worked with Databricks for more than a decade now, and it amazes me how it has evolved exponentially from a simple Spark workspace into a full platform for data engineering, ML, analytics, and now...
Effects
Any program that does anything useful has some side effect. In fact, the whole point of a program is to have some side effect. Examples of side effects are reading from the console, writing to db,...
Stateful processing with Kafka
Kafka is a distributed streaming platform that stores data in topics and provides scalability with topic partitions. Even though Kafka is extensively used for connecting our microservices through publi...
Implementing a Health Check Mechanism for Kafka
Kafka is a well-known distributed streaming platform that stores data in topics and provides scalability with topic partitions. At Veon, we use Kafka extensively to publish and consume events from...