2025 Small Data SF

The Great Data Engineering Reset: From Pipelines to Agents and Beyond

For years, data engineering was a story of predictable pipelines: move data from point A to point B. But AI just hit the reset button on our entire field. Now we're all staring into the void, wondering what's next. The fundamentals haven't changed, and data governance, data management, and data modeling remain as challenging as ever, but everything else is up for grabs. This talk will cut through the noise and explore the future of data engineering in an AI-driven world. We'll examine how team structures will evolve, why agentic workflows and real-time systems are becoming non-negotiable, and how our focus must shift from building dashboards and analytics to architecting for automated action. The reset button has been pushed. It's time for us to invent the future of our industry.
Speaker
Joe Reis

Author of O'Reilly Fundamentals of Data Engineering

DeepLearning.AI

A "recovering data scientist" and a business-minded data nerd, Joe has worked in the data industry for 20 years, with responsibilities ranging from statistical modeling, forecasting, machine learning, data engineering, and data architecture to almost everything else in between. Joe is also the co-author of the best-selling book Fundamentals of Data Engineering (O'Reilly, 2022).

0:00 [music]

0:05 [music] We're going to hit the button, so to speak, on the decks. I'm going to talk about the reset button that's been hit on a lot of industries right now. There's a quote from science fiction writer William Gibson: the future is already here, it's just not evenly distributed. Who's heard

0:26 that line before? Who feels that line is appropriate for today? A lot of people. So, quick story. A couple of months ago, where it really hit me was in a Waymo, and Waymo is very popular here. I was in a Waymo, on my phone, for an 8-minute drive,

0:43 chatting with Claude and making an app, right? That's kind of fun. I deployed it to GitHub. But if you rewind to the modern data stack era of about five years ago, that would have been science fiction: being in a driverless car, talking to AI on your phone, making an app, and deploying it to GitHub, all

1:01 in about eight minutes. Does that seem normal these days, though? For the rest of the world, that's not normal at all. It's still a bit far-fetched. You mention it and they're like, "Yeah, you're full of... There's no way." So as I travel the globe, I meet with countless business

1:15 leaders and practitioners, and a common theme I see is that we're all staring off into this void. We're not quite sure what's on the other side of it, and I think we're all trying to figure it out. Again, the reset button has been hit on a lot of industries. So my talk's going to

1:35 map out what happened before the reset, where we are now, some potential futures of where we're going, and also where I think humans fit in. All too often we're focused on the technology angle, and, well, we're all people, I think. So, the world before the reset: let's

1:52 talk about that. We have a tale of two cities, to appropriate Charles Dickens: what I call enterprise land and product land. Who's heard these terms before? I blogged about it, for the people who read my blog, all three of you. So read my blog; it's pretty good. Read Benn Stancil's blog too; it's good. So anyway,

2:11 enterprise land. This is the notion of IT as a cost center. Who works in that land, where it's more internally facing? A few people. Again, we're in San Francisco, and I feel like this is a bit of a bubble. In the rest of the world, 99% of the companies and data teams that I

2:27 meet, and the data leaders that I talk to, are all in enterprise land. This is where you're playing defense, not offense. Typically it's about controlling the data and being compliant. Data is used for reporting, actionable insights, and so forth. It's very much a dashboard-driven world. The common trope in

2:51 enterprise land is "we need to drive more value for the business with data." I suppose you should drive value for the business at all times, but what's interesting about enterprise land in particular is that people have to justify why they do data at all. This has been the constant struggle for 30-plus years. Who's

3:11 seen this struggle and tension? Yeah, quite a few people. It also means leaders have high turnover: I think the average tenure of a chief data officer is something like 18 months, which is really bad. How can you make any progress? And especially after the end of zero interest rates, you started getting a lot more scrutiny on costs. So the modern

3:36 data stack became, I guess, the really expensive data stack, for a lot of reasons. Free money made inefficiency invisible until it didn't, and high interest rates made it painful again.

3:50 But along the way came AI, which we'll talk about in just a second. Then there's product land. This is data as a revenue driver, and it's more consumer-facing. Here you're playing offense, not defense, and this lends itself to product thinking.

4:04 Who's heard of data products? Yes, everybody. And you start seeing things like the shift-left movement, and I guess the shift-right movement as well. Shift left is more about data contracts and adopting more software engineering practices in data. Who does that here? Quite a few people.

4:23 Right. So again, the data teams here have more product thinking. Data is what the company sells, so it's a core competency of the company. But this is still early days for most data teams around the world. They're still in enterprise land, not product land. And some companies

4:44 straddle both worlds. Finance may be stuck in enterprise land while product teams are in product land. So that's where we are. And along the way we have old stack assumptions: batch, ELT versus ETL, Inmon versus Kimball versus one big table. These are the old battles

5:03 of yesterday that we still seem to argue about. It's a very analytics-first approach.

5:08 Again, dashboards, and it's a very deterministic stack: data comes in, something comes out.

5:17 And again, this is 99% of companies. Here we're optimizing within a system, and we've been doing that for decades, but we haven't been reinventing it. So here's what the reset moment looks like. Who remembers when ChatGPT came out?

5:31 Do you remember what you were doing that day? It's like asking old-timers where they were when the first person walked on the moon. I remember this, actually. I remember when GPT-3 came out and you had the Playground, and I actually forgot about it. It was kind

5:48 of a nice toy, but ChatGPT blew the doors off a lot of stuff, and it became the fastest-growing consumer app in history. I think it still is. And everybody was using it.

6:03 I remember when I saw it, I thought, this is actually pretty cool. The answers suck, but this feels like magic. I had the old "I hallucinate more than ChatGPT" shirt, if you've seen that shirt. I think we also had another shirt, "ChatDMT," which was pretty funny. But, you know,

6:20I've been doing machine learning for a while. That's actually more my background. I'm not actually a data guy.

6:23 I am, but I'm not. ML had been around for a while, but for people like myself, machine learning and AI were kind of a nerd fetish. Even during the data science boom, it wasn't consumerized yet; it was typically still something in the back office.

6:40 But suddenly AI became the default interface for everything. I remember traveling the world, writing to people like Rabbit back there from random spots around the planet, going to conferences, especially in the Middle East, and all of a sudden there was a lot of excitement about AI. I met the Minister of AI in

6:58 Dubai; they have that. All of a sudden it's like the singularity will be here any moment. There's a lot of interest in AI, and it hasn't stopped. In fact, I could make an argument that this is one of the greatest periods of all time,

7:12 and potentially one of the greatest bubbles of all time as well. The two can exist simultaneously. Everyone's using AI for everything. We had ChatGPT, then copilots, now agents, and now whatever comes after that.

7:25 Everyone has new superpowers, AI is becoming the default interface for everything, and it's flattened the playing field. What I've noticed around the world is that everybody is basically back at the same starting line, whether they realize it or not. It doesn't seem to matter where I go on the

7:45 planet; the same hopes and concerns are there. The hope is that AI is going to make my job and my company run better and more profitably. The concern is: will I have a job in a few years? I don't know. The hope is that we all do; otherwise we have other things to worry about. And in

8:05 vendor land, vendors are interesting. They started tacking AI onto their products, but they weren't AI-native. What you kept seeing was vendors slapping AI onto every product, basically just large language model wrappers, and people doing the same thing with their workflows. I think now we're

8:28 starting to figure it out, but we're all still experimenting and still trying to figure this out. Who feels they have a good handle on AI at this point? Some people. Real rock solid? Awesome. >> Yeah, I was actually sitting at one of my friends' houses. I think he was on the original

8:46 team for ChatGPT, and I was talking to him about it. What was interesting is that I asked him where he thinks things are going, and he said, "I don't know." [laughter] I thought that was a good testament to where we are now. If somebody at that level is confused,

9:00 I'm like, I don't know what to think either. But right now you're seeing some interesting stuff. FastMCP has something like a million downloads a day. That's crazy when you think about it; it launched last year. Cursor and Lovable are the biggest products in terms of ARR, in terms of speed to get

9:17 there. OpenAI is now a $500 billion company in three years, which is kind of insane, and they want to IPO at a $1 trillion valuation in 2027. The numbers are just insane right now. I think the speed is jarring for a lot of people. And like I said, if experts can't figure

9:33 things out, it's a very interesting time for everybody. I feel like we're all beginners again.

9:41 I did a podcast with Tyler Akidau, one of the biggest streaming experts in the world. During the podcast I asked how he's dealing with AI, and he said it's interesting: he feels like he's starting over again, having to relearn how he works. I said that's fascinating. A guy like you is

9:59 starting over with a beginner's mind. But that's just where we are. So again, the future's here. It's just not evenly distributed yet.

10:08 So I think we're trying to figure out what's on the other side, and right now it's a mixed bag. You see studies like the MIT study that said 95% of generative AI projects fail, and when you dig under the hood, that's actually not that bad. Then there's another study that came out, what,

10:23 last week, saying that 2.5% of freelancer tasks were being done with AI agents. So still relatively low adoption. But if you talk to vendors, everyone's using agents and replacing entire workforces with them.

10:34 So I don't know what the hell to believe. The MIT study was interesting, though. They talk about shadow IT and shadow AI, and how a lot of people are bringing their own AI to work because the one the workplace provides is just terrible. So people are using it, for sure. And the

10:50 biggest successes I'm seeing with AI are actually from nontechnical people. My climbing gym is using it to reorient a lot of their workflows, and they're loving it. It's awesome, and I'm personally having more fun than ever. But what this means right now is that we're in a different world.

11:08 It's no longer the debates we've been having about one big table, Kimball, Inmon, lakehouse, warehouse, blah blah blah. Everything's changed. I think we're all moving faster and faster, but it's the old Red Queen effect from Lewis Carroll's Through the Looking-Glass, where you run faster and faster just to stay in the same place. Who feels like that

11:28 right now? Yeah, it's jarring. So what doesn't change? I'm going to rush through this a bit, but the fundamentals are still gravity. When I wrote Fundamentals of Data Engineering, it was meant to be a book that highlights the things that probably aren't going to change: ingesting data, storing it,

11:43 transforming it, and serving it. Data modeling doesn't change, and I think it's more important than ever. Neither does data quality. The undercurrents don't change.

11:51 In no scenario does security go away. In no scenario does architecture go away. In fact, I'm going to argue they become more important. People, process, technology, and data are the four pillars. Conway's law doesn't change; we can talk more about that. But what does change?

12:09 I think we're moving into a world of architectures that mix deterministic and probabilistic reasoning. We're moving into a world where it's not just tables, it's multimodal data. It won't just be batch pipelines; we'll have continuous feedback loops and real-time data. Dashboards will be,

12:28 I guess, somewhat replaced by autonomous actions. And I think the name of the game right now is playing offense instead of defense. That's the main thing I'm seeing around the world: this is forcing a rethink of how people are actually going to do their work and how they think

12:41 about their work. People are rethinking org structures, too. Who's heard of the notion of work charts versus org charts?

12:49 It's something to think about. Right now we live in a very hierarchical world, with org charts. But big tech right now is gutting a lot of middle management, because they want to be more AI-efficient and probably cut a lot of costs and all that

13:03 other fun stuff. What you're starting to see, though, is people rethinking what it means to have an organization. What is the layout when tasks are orthogonal to what an org chart provides?

13:14 What I'm also seeing are CEOs and product managers vibe coding a lot, coming up with products, and showing them to the engineering team.

13:23 I'm sure all of you are seeing this too. What this means is fast iteration and fast learning. It's no longer just Figma drawings and whiteboarding. Those are still important, but now it's about show, don't tell.

13:33 Give me something I can play with. Let's try it out. Let's iterate. So what you're finding is that learning cycles are improving a lot. You're able to reach conclusions a lot faster and make mistakes a lot quicker, hopefully not catastrophic ones. No matter what, things have changed, and I think we need

13:52 to start looking at what the next generation of architectures is going to look like. So here's the data engineering part. What does this look like? The term "postmodern data stack" has been thrown around ad nauseam, and I wrote an article about the postmodern data stack too, but from the perspective of the modernist

14:12 movement back in the day, which was all about rationality and determinism, and the postmodernism that followed it, which was all about silliness, absurdity, and hallucinations, which AI does have a lot of. So if anything, the postmodern data stack is a very appropriate term for where we're going.

14:28 So what are the core traits of this? Everything is going to be connected, contextual, and continuously evolving.

14:37 This isn't the world of moving data from point A to point B, making reports, making a model, and having it do its thing. It's going to be very much a feedback loop. So some of

14:48 the key traits: first, hybrid deterministic and probabilistic systems. You can already see this happening with even old-school machine learning; recommender systems are somewhat probabilistic. But it's going to get much tighter with large language models mixed into everything. Who's experimenting with this

15:09 right now, building large language models or agents into all their workflows? Yeah. This is an area where I'm seeing a lot of activity right now. There's a really interesting paper from Berkeley from a couple of months ago, "Supporting Our AI Overlords," about redesigning data systems for agents. But you

15:26 know, one example was an agentic query planner. Your database is probably going to get hit with 20 to 50 times more requests than it used to, and most of these queries are going to be semantically similar. So how do you deal with that? How do you not bring your database to its knees? You'll probably

15:43 need something sitting in front of it to handle this immense workload that we just haven't had before. Then you start thinking about things like semantic models made for agents, not just for humans, that update in real time and are discoverable by agents. These are the sorts of things you need

15:58 to start thinking about. Streaming will be everywhere. Hopefully it's growing, but you're going to need continuous compute precisely because agents aren't going to wait for batch. They're not going to sit there saying, "I'll come back in 24 hours when your job runs." This is what I refer to

16:17 in my book as a live data stack, and now it's happening. You have the feedback loop in real time.

16:21 Obviously we didn't foresee all of this happening, but you're talking about event-driven feedback loops, because you continuously grow and evolve. Then there's AI-native governance and observability. Regulations still apply here, especially if you're in a place like Europe. In the US, I don't know about regulations, or the rule of law, or whatever. But

16:42 what's interesting is that observability companies are now talking about agents that can self-heal systems. So you don't just get the alert and wait for someone to fix things; the systems will start healing themselves. Again, you have to balance this against the fact that

16:59 regulators are still going to want audit trails, because that's how they work, especially in Europe. We're also moving more toward knowledge orientation, not just data. Who's heard the words ontologies, taxonomies, and knowledge graphs? These weren't words data people talked about two years ago; maybe some did. So

17:17 the library sciences and knowledge worlds are colliding with data, and this is absolutely necessary. The vision of the Semantic Web that Tim Berners-Lee laid out in 2001, with RDF and agents talking to each other, is finally becoming a reality for the first time. I would urge you to go back and read that paper. It's still

17:37 pretty relevant, actually. That's where we're going: from data lakehouses to knowledge lakehouses, or something like that. Multimodal data goes without saying; data is no longer just tables and semi-structured data. We're talking about images, text, audio, and video. So think about how you incorporate that into a workflow

17:54 that's continuous and evolving. So, to close out here: what we used to talk about was getting actionable insights from our data. Now I think we're talking about just getting actions from our systems, and having those be

18:15 the primary force we're dealing with. It's still early days, but pretty soon you're going to have agents that talk to your data and do things, and that's just going to be how it is. I think the role of the data engineer is going to change.

18:32 We're going to evolve more into system designers and product people. To close out, on the skill sets we'll be dealing with, there's a tension I see right now, where a lot of people say we'll just vibe

18:48 code everything and we don't need to learn anything anymore. Who's heard this? What's even the point of learning anything? Why would I want to read your stupid book or take your course, when we don't need to know anything and AI is going to do all the work for us? I think now more

19:01 than ever, though, the human opportunity here, and this is what I want to leave you with, is expertise and craftsmanship. I don't know what this means going forward, and I don't think anybody in other disciplines knows either. But I think the human element is going to matter more than

19:14 ever. I think craftsmanship, your expertise, and your skills are going to be the differentiator between you and things that can just create software. I think all of us have a sense of our value and our worth in the world. So I'd urge you to master the fundamentals.

19:29 Learn systems thinking; I think that's going to be more important than ever. Adopt a product mindset: understand what users want at the end of the day, because you're going to be orchestrating. You're going to be middle managers of your own, I guess, though not in a Michael Scott sort of way. Maybe you will actually

19:45 be kind of funny with agents, or just be really cringey with them. But what this means is that you have the opportunity to invent new workflows. You rarely get this opportunity in your career: a time to invent new things and rethink the way

20:04 we've been doing things, and the barrier to entry is pretty much zero right now. Well, I wouldn't say zero, but it's easier to make things than it used to be. So I'd urge you all to think about what's next. Don't just sit there staring off into the void wondering what's going to

20:19 happen. If there's one takeaway: build the future. Build what's coming next. There's nothing to really stop you. You can raise money; you can do a lot of things. It's a really exciting time. But again, everyone's back at the starting line.

20:34 We're staring into the void, wondering what's on the other side. Build it.

20:39 But be human in this age, too.

20:44 That's what's going to count more than ever. So thank you very much. Bye-bye. [music]
