Tech on the Rocks



Written by: Kostas and Nitay


Episodes

Building the Open Lakehouse for the AI Era with Shubham Baldava from DataZip / OLake

May 21 2026

In this episode of Tech on the Rocks, Nitay and Kostas sit down with Shubham Baldava, co-founder of DataZip and creator of OLake, to trace the evolution of the modern open lakehouse — from the early days of Apache Hudi to today's Iceberg-centric world.
Shubham shares stories from a decade of data engineering at scale, including building near real-time pipelines at Japanese fintech giant PayPay, scaling a TikTok-style social platform at ShareChat from 10M to 160M monthly active users, and the cost and complexity pressures that pushed teams to adopt lakehouse architectures in the first place.
From there, the conversation digs into the table format wars: why Hudi was the early pick for truly open, vendor-neutral lakehouses, how Iceberg has caught up and pulled ahead on integrations, where Delta fits in, and what the Tabular acquisition means for the community. Shubham explains why he believes all the major formats are converging — single-file commits, deletion vectors, variant and geospatial types, Z-indexes — and why integration breadth, not features alone, is now the deciding factor.
The discussion then turns practical: what the four real pillars of a lakehouse are (ingestion, optimization, query, governance), why Debezium is so hard to replace, what it takes to hit 10-minute CDC latency for fintech reconciliation, and how OLake is rethinking ingestion with Arrow-based writes, exactly-once semantics built on Iceberg metadata, multi-phase compaction, and watermark-based parallel backfills.
Finally, Shubham looks ahead to a future where Iceberg becomes the single substrate for structured, semi-structured, and unstructured data — powering multi-engine analytics and AI workloads on top of formats like Lance and Vortex, now that Iceberg has decoupled from Parquet.
Topics covered:
• Lessons from PayPay, ShareChat, and indie app entrepreneurship
• Hudi vs Iceberg vs Delta — history, trade-offs, and convergence
• Why fintech reconciliation needs sub-10-minute CDC
• The real cost of running BigQuery, Trino, and Spark side by side
• Debezium's staying power and why Go (not Rust) for next-gen CDC
• How OLake uses Arrow, equality and positional deletes, and multi-step compaction
• The decoupling of Iceberg from Parquet and what Lance/Vortex unlock for AI
• Where to build in-house vs adopt managed lakehouse tooling


58 mins

From Session Replays to Autonomous Improvement: Shipping the First AI Product Engineer with Milana

Apr 24 2026
In this episode, we sit down with Rohan Katyal and Raghav Sethi, co-founders of Milana, to discuss the shift from passive analytics to the world’s first AI Product Engineer. Rather than just providing another dashboard to monitor, Rohan and Raghav are building an agentic partner that you add to your product to bridge the gap between discovery and deployment. Drawing on their experience at Meta, Yelp, and Airtable, they explore how Milana enables autonomous improvement: turning deep user intelligence into shippable code and structural refinements that act as a tireless extension of your engineering team.
The conversation dives into why session replays — a mature but historically underused technology — are now a powerful data asset thanks to vision LLMs. Raghav explains how session replays are really just high-granularity logging of DOM changes, not screen recordings, and why feeding them through AI unlocks insights that traditional event-based analytics simply can’t capture. The team breaks down how they use just-in-time structuring to extract meaning from dense, unstructured session data without requiring upfront instrumentation.
Rohan shares hard-won lessons from building Yelp’s experimentation platform — including how teams that simply ran more experiments consistently outperformed those with better data resources. They discuss the tension between A/B testing rigor and iteration speed, why most experiments never ship, and how lowering the cost of generating and testing hypotheses changes everything about product development velocity.
We also get into the technical details of semantic clustering across millions of sessions, why video is actually a more compact representation than raw DOM for LLM reasoning, and how Milana analyzes sessions from multiple perspectives — user researcher, PM, founder — to surface real pain points. Plus, a bold prediction: analytics dashboards are dying, and the future belongs to agentic systems that don’t just deliver insights but actually own and drive your OKRs.
Topics covered:
• Why session replays are the ultimate untapped data asset for product teams
• How vision LLMs unlocked AI-powered analysis of user sessions
• Just-in-time data structuring: querying unstructured sessions without upfront instrumentation
• Lessons from building experimentation platforms at Yelp and Airtable
• Why running more experiments beats having better data
• Semantic clustering: separating signal from noise across millions of sessions
• Video vs. DOM vs. events: the best data representation for LLM reasoning
• Analyzing agent behavior through session replays
• The death of dashboards and the rise of agentic growth systems
• User research horror stories and the surprising things users do

Chapters
00:00 Introduction to Rohan and Raghav's Journey
04:47 The Importance of User Research
08:03 Making Solutioning a Science
11:09 Understanding Session Replays and Experimentation
14:50 Defining Sessions and Experimentation Platforms
18:54 The Need for Consistent Metrics
22:11 The Role of Events vs. Session Replays
29:46 Leveraging LLMs for Enhanced Insights
35:04 Determinism vs. Non-Determinism in Data Analysis
37:57 Understanding User vs. Agent Behavior
39:47 The Art of Structuring Data
45:25 Semantic Clustering and Its Importance
47:09 Building Infrastructure for Complex Data
51:24 The Future of User Simulation and Experimentation
1 hr

From Exabyte Storage to Reactive Backends: Jamie Turner on Building Convex After Dropbox

Apr 9 2026

Jamie, a seasoned startup founder and former Dropbox engineer, shares insights on building distributed systems, scaling storage solutions, and the impact of AI on infrastructure and application development. Discover practical lessons from scaling Dropbox, the evolution of data storage, and how Convex is shaping the future of app development.


59 mins
