Small Data SF 2025

Panel: Is the Future Small?

Small data isn't just the future — it might also be the past. The median database has fewer rows than most people think, and the median query would run fine on a phone. So why did we spend a decade building for scale we didn't need? This panel will explore whether "small" is the future of data and AI — small models, small tools, small teams — or whether efficiency gains will just be consumed by ever-expanding ambitions. We'll discuss small language models for edge deployment, why AI's biggest impact on the data stack may be code generation, and what happens when vibe-coding a throwaway app becomes easier than signing up for another SaaS tool.
Speakers
Joe Reis

Author of Fundamentals of Data Engineering (O'Reilly)

DeepLearning.AI

A "recovering data scientist" and a business-minded data nerd, Joe has worked in the data industry for 20 years, with responsibilities ranging from statistical modeling, forecasting, machine learning, data engineering, data architecture, and almost everything else in between.Joe is also the co-author of the best-selling book, Fundamentals of Data Engineering (O'Reilly 2022).

Benn Stancil

Co-founder and former Chief Analytics Officer, Mode


Benn Stancil was a co-founder of Mode, a business intelligence tool that was acquired by ThoughtSpot in 2023. While at Mode, Benn held roles leading Mode's data, product, marketing, and executive teams. He regularly writes about data and technology at benn.substack.com. Prior to founding Mode, Benn worked on analytics teams at Microsoft and Yammer.

Shelby Heinecke, PhD

Senior AI Research Manager, Salesforce

Salesforce

Dr. Shelby Heinecke leads an AI research team at Salesforce, focusing on cutting-edge AI for product and research in emerging directions including autonomous agents, LLMs, and on-device AI. Some of her team's most notable agentic AI works include the open-source multi-agent platform AgentLite and the "Tiny Giant," a competitive 1B model for function-calling. Shelby earned her Ph.D. in Mathematics from the University of Illinois at Chicago, specializing in machine learning theory. She also holds an M.S. in Mathematics from Northwestern and a B.S. in Mathematics from MIT.

George Fraser

CEO, Fivetran

Fivetran

George Fraser is the co-founder and CEO of Fivetran. Prior to Fivetran, he worked as a neuroscientist before transitioning into the tech industry and leveraging his analytical background to solve complex data challenges.

0:00 [music]

0:09 [music] Is small the future? 'Cause this is a small data conference. I suppose that's the first question for everybody, and maybe I'll start with Shelby, since you just gave a talk on this. >> Sure. >> What do you think about the term "small" as the future of where we're going? You just gave a talk on small models, but do

0:30 you have any thoughts on that? >> Yeah. An emerging direction, as most of you just heard me talk about, is small language models. You're going to see a lot more of that. When we talk about small in terms of model size, we're talking about the number of parameters in the model, the number of weights in

0:47 a deep neural network, basically. Large language models are called large language models for a reason: they have a large number of parameters.

0:53 When we say small language models, we mean models with a fraction of the number of parameters. They consume fewer resources. I think we're at a stage now where we've had a great time building demos and deploying things with large language models. We've gotten very far, but now we're entering a phase where efficiency is

1:11 going to be key, and that's where small models are really going to play a big role.
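For a rough sense of what "consuming fewer resources" means here, a back-of-envelope sketch: the memory needed just to hold a model's weights is roughly parameters times bytes per parameter. The model sizes and precisions below are illustrative assumptions, not figures from the talk.

```python
# Back-of-envelope memory to hold a model's weights:
# parameters x bytes per parameter. Sizes are illustrative.
def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / 1e9

for name, params in [("1B 'small' model", 1e9), ("70B 'large' model", 70e9)]:
    fp16 = weight_memory_gb(params, 2)    # 16-bit floats
    int4 = weight_memory_gb(params, 0.5)  # 4-bit quantized
    print(f"{name}: ~{fp16:.0f} GB at fp16, ~{int4:.1f} GB at int4")

# A 1B model quantized to int4 (~0.5 GB) fits comfortably on a phone;
# a 70B model at fp16 (~140 GB) needs multiple server-class GPUs.
```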

1:15 >> Awesome. Benn, what do you think? >> No. [laughter] I mean, my view of this is it's similar to what happened to browsers: browsers got way faster, so we just stuffed them with more stuff. Yes, there will be efficiency

1:34 gains in what we can do with models, but that'll just mean we say, well, let's use it. So sure, there will be efficiency we want to get out of it.

1:42 >> But there will always be people who want to be on the edge of it for the sake of being on the edge of it. Look at the way people have used even just the foundation models: GPT-3, when it came out, we were all

1:58 amazed by it, and it's way cheaper now than it was. But now that we have GPT-5, people still want to use GPT-5 the same amount as we used GPT-3 two years ago.

2:06 >> Oh yeah. >> So I think basically whatever space we get from things becoming more efficient, we will just fill.

2:11 So yeah, things will become smaller in a sense, but really we will use the amount of compute and resources and data that we have, and there will be more of it. So we will use it. >> To your point, I want to add: I don't think the future is only small. I

2:27 totally agree with you. GPT-5 is amazing. It's not going anywhere; it's only going to get better. There's always going to be a place for that. But I do think we're going to see small models in new places: on phones, in cars, on robots. There is still a place for small models.

2:42 So I think the future is small and big models working together, models everywhere. >> What do you think, George? You come from more of the data world. >> Well, I think it depends on whether we're talking about running queries over relational data or running deep neural networks, which are

3:03 completely different things and will evolve in different directions.

3:11 >> Yeah. The thing that I know about is relational data in a business context, and there, what has happened is that the size of the data people actually have simply has not grown in proportion to the capabilities of

3:35 computers. >> Interesting. >> And there's a conspiracy of silence about this. People exaggerate, and everyone writes blog posts about their giant data sets. Nobody writes a blog post saying, I loaded my data into BigQuery and it was so small it was really easy, and my queries all run really fast.

3:57 So you just tend not to hear about it. But we see it at Fivetran, and I wrote about this last year. There are public data sets released by the data warehouse vendors, Snowflake and Redshift specifically, with samples of the queries people actually run and statistics about how much data they read, and they

4:17 are shockingly small. The median query would run quickly on your iPhone. So it's weird, the disconnect between the data sizes people actually have in typical businesses, and query regularly, and what gets talked about. So I think small is not only the future, it's actually also the past.

4:39 It's just a matter of noticing. >> That's really fascinating. Who else has seen that? >> Yeah, a few people. I've seen this too. I think there's a bit of tool fetish that goes on, where you want to get the big data warehouse and cram your, I

4:57 don't know, 10 gigs of data into it or something. That's what I mean when I talk about the 99% of companies: the data sizes simply aren't that big, and the queries over said data are simply a subset of that. Jordan's blog post way back

5:14 in the day pointed that out, about big data not being that big in terms of what he was seeing. >> Yeah, and you have to remember, when you query a table you don't read the whole table, you just read the columns that are referenced >> Right. >> in the query, so that data is much

5:29 smaller than the size of the entire table. >> Yeah.
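A quick way to see the column pruning George describes, as a minimal sketch using DuckDB from Python; the Parquet file and the `amount` column are hypothetical stand-ins:

```python
# Minimal sketch of column pruning: a query that references one column
# only scans that column, not the whole table.
import duckdb

con = duckdb.connect()

# The plan's "Projections" show that only the referenced column is read.
print(con.sql("EXPLAIN SELECT avg(amount) FROM 'orders.parquet'"))

# SELECT * forces the scan to read every column, touching far more data.
print(con.sql("EXPLAIN SELECT * FROM 'orders.parquet'"))
```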

5:42 >> So George, you're associated with the modern data stack, and you just merged with, I think, the two darlings of the industry: Fivetran and dbt merged. What does the modern data stack mean these days?

5:52 >> Well, the modern data stack has always been about reducing complexity; the real problem people face when they work with data in a business context is incidental complexity. So it's about making things easy and automated and having great defaults. It refers to a collection of tools that had a similar philosophy

6:13 and worked well together. Not necessarily because they were designed to work together, but because, like I said, they have a similar philosophy. And I think there's a sense in which it won: whether they use tools from us or other vendors associated with that term or not, everyone who's building new systems

6:31 builds them that way. And of course most of the world is still doing older patterns. So it's an adoption phase at this point.

6:42 >> Interesting. What do you think, Benn? >> Well, to me the modern data stack as a term is just an era, a marker for some tech era. The joke I've made about it is that it's data tools that launched on Product Hunt.

6:57 >> Launch on Product Hunt? >> Fivetran? Actually, you did. There is a Fivetran. I remember looking this up once; Fivetran is on Product Hunt.

7:05 [laughter] I don't know if you did it. Somebody posted it. Someone hunted it, George.

7:10 [laughter] It was a community thing. But basically it was things that were bottoms-up. They tended to be PLG-oriented. They tended not to be sell-to-Oracle stuff; Oracle Cloud databases did not launch on Product Hunt. And Product Hunt was cool then and is less so now. So that to me was basically a marker

7:31 of that era. I think that era is gone. We don't do that as much anymore.

7:35 All the cool kids now do AI stuff. What happens to the tooling? I agree with George on this: there was a bunch of legacy stuff.

7:44 At the same time as the dbt conference, there was an Oracle conference on the other side of the Strip. A lot of those people are not using the modern data stack. So there's a whole world of people who can still adopt this. But the up-and-comers, a lot of the people in San Francisco, the cool kids

7:59 who are hopped up on Red Bulls for 996 [laughter] are thinking about this less. Data doesn't have the same sway with that group; it's more, you just build it, you ship it in two weeks, you make infinite money, and you keep going.

8:13 So I think there is a shift here where the modern data stack is becoming the mature thing, and there will be some new thing that all of these AI companies do, companies that are 10 people and don't have the resources to build a data team, or the desire to.

8:26 >> Interesting. Shelby, what do you think comes next in terms of stacks? >> I think there are a number of things, speaking more from the AI modeling side and where we're going with that. Even in terms of data, one of the big directions we're innovating in is generating synthetic data. That's one direction

8:44 we're looking at: generating synthetic data, even simulation environments to generate that data. A lot of the data we need for, say, deploying LLMs or agents for these use cases is not easy to capture. So we're coming up with pipelines where we can

9:04 come up with tasks that represent how users would use the agents, and the API calls or actions the agent would take to complete each task, and train models on that. I think that's a very promising direction when it comes to the training data we need to get agents and LLMs to the next phase.
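A minimal sketch of what such a pipeline can look like, generating (task, tool-call) training pairs; the prompts, tool names, `generate` helper, and output format are illustrative assumptions, not Salesforce's actual pipeline:

```python
# Sketch of a synthetic-data pipeline for agent training: ask an LLM to
# invent realistic user tasks, then the tool calls an agent would make to
# complete each one, and save (task, trajectory) pairs for training.
import json

TOOLS = ["search_orders", "issue_refund", "send_email"]  # hypothetical tools

def generate(prompt: str) -> str:
    # Placeholder: wire in your actual LLM client here. Returns a dummy
    # string so the sketch runs end to end.
    return "[LLM output for: " + prompt[:40] + "...]"

def make_examples(n: int, path: str = "agent_train.jsonl") -> None:
    with open(path, "w") as f:
        for _ in range(n):
            task = generate(f"Invent one realistic support task a user might "
                            f"ask, solvable with these tools: {TOOLS}")
            trajectory = generate(f"Task: {task}\nList, as JSON, the sequence "
                                  f"of tool calls (name + arguments) that "
                                  f"completes it.")
            f.write(json.dumps({"task": task, "tool_calls": trajectory}) + "\n")

make_examples(3)
```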

9:21 >> Yeah. I do have a question; it's actually not in our notes here.

9:29 It was a discussion I had with Matt Turck the other day, about the capabilities of a foundation model kind of eroding what we'd probably have made an app for back in the day. Now the foundation model can just write the code and just

9:47 do it. What do you think the vendors of tomorrow look like?

9:54 >> I have thoughts about that. I think a lot of things are getting chipped away at. I think about, are we going to be one of them, and try to figure that out and solve for it. I don't think so, and we try internally to see if we can defeat

10:09 ourselves with AI. But there are a lot of things that are sort of niche, where it's just not worth it to go learn to use some piece of software if you have any coding ability and you have coding agents. When I made the slides for my talk today, which I made yesterday,

10:28 [laughter] I generated the images with ChatGPT, but try as I might, I could not get it to get the aspect ratio right, and I didn't have Photoshop or anything on my computer. And I didn't just need to change their shape; I needed to do some other things to them. So I

10:47 had Codex make a script for me that used OpenCV to reformat my slides for this talk, because it took less time to do that than to go find something to do that thing for me. And this happens more and more. I get to the login screen, and I'm like, I don't want to sign up

11:03 for this thing. You know what? I'm pretty sure I can just vibe-code a throwaway app for myself to solve this tiny problem.

11:10 >> Wow. So how long did it take you to make all the slides?

11:15 >> It took me about an hour and a half to make all the slides, but the coding piece was interleaved in the middle of it, where I wanted to change things and had to move things around. Sometimes I wanted to expand using mirroring and sometimes I wanted border replication.
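A minimal sketch of the kind of script described here: padding an image to a target aspect ratio with OpenCV, choosing between mirroring and edge replication. The file names and the 16:9 target are assumptions, not details from the talk.

```python
# Pad an image to a target aspect ratio using OpenCV borders.
import cv2

def pad_to_aspect(path_in, path_out, target=16 / 9, mode=cv2.BORDER_REFLECT):
    img = cv2.imread(path_in)
    h, w = img.shape[:2]
    if w / h < target:  # too narrow: add columns left and right
        pad = int((target * h - w) / 2)
        img = cv2.copyMakeBorder(img, 0, 0, pad, pad, mode)
    else:               # too wide: add rows top and bottom
        pad = int((w / target - h) / 2)
        img = cv2.copyMakeBorder(img, pad, pad, 0, 0, mode)
    cv2.imwrite(path_out, img)

# BORDER_REFLECT mirrors the image content into the padding;
# BORDER_REPLICATE repeats the edge pixels instead.
pad_to_aspect("slide.png", "slide_16x9.png", mode=cv2.BORDER_REPLICATE)
```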

11:31 >> That's crazy. [laughter] >> So I think a lot of niche tools are going to go away, because people will just be able to build them. >> Yeah. It's something I've been doing too. It's interesting when you look at products: vendors will

11:50 try to sell products, but most of the time, who uses 100% of the functionality of the products they buy? Nobody, right? So I've increasingly been doing the same thing: if I need just a certain piece of functionality, I'll tell Claude Code, I want that functionality, make it, and then it's there and we

12:11 move on with our day. But it's a very interesting question. I don't know if either of you two have thoughts on the future of what apps might look like. >> Oh, I just wanted to add to the data piece of what you were saying.

12:22 You were mentioning these niche areas where tools or large language models aren't able to do well. One of the things we're seeing now across the industry is the emergence of these data annotation companies. That is going to be huge, because to get LLMs to do better

12:41 at the type of niche tasks you mentioned, or niche tasks in general, it really comes down to getting people. I was talking about synthetic data earlier; that's one way to go about it. But it comes down to getting human experts to basically

12:55 inject their knowledge into the training data so the models can eventually fill some of these gaps we're seeing.

13:04 >> Interesting. Do you have any thoughts? >> My rough view of this [clears throat] is, there's this criticism of Salesforce that periodically pops up among the Hacker News crowd >> No one criticizes us. >> I have to admit I have occasionally seen things on Hacker News, which is upsetting.

13:21 [laughter] >> The criticism is: isn't this just a spreadsheet with a UI on top? Every CRM gets that. Isn't this just a bunch of spreadsheets?

13:31 It's sort of the original Dropbox thing of, I could just make this on a weekend. There are a lot of software apps that are basically spreadsheets with UIs on top. And that I

13:45 think is an oversimplification for a bunch of obvious reasons, but one of them is that things like Salesforce inject a lot of opinion into how to use that spreadsheet. If you are using Salesforce for the most basic stuff, like how do I track my deals, it gives you a

13:58 bunch of information about how to actually do that. It tells you, okay, you should have accounts, you should have opportunities, these things should be associated in these sorts of ways. Sure, you can keep track of that in a spreadsheet if you want, but it's a huge pain. But instead of using Salesforce, the

14:13 ideal thing you probably actually want is an omniscient sales ops person who manages that spreadsheet. If you were going to use a tool like Jira, you could use Jira, or really what you want is the world's best project manager that you just tell stuff to, and they go and update the spreadsheet and

14:29 tell you what's on it, and you interact with it that way. So there is an abstraction there for how these things will work: that's actually what a lot of AI-oriented software is, an abstraction on top of a spreadsheet. And I think that's what

14:44 ultimately starts to replace these things. Why would I bother with software that injects expertise through the way it's built, when what I really want is the expert who manages that spreadsheet for me? An app is a person with a

14:58 spreadsheet. And so I think that is a way some of this stuff goes.

15:02 Not necessarily data tools that are doing a lot of data management and actual processing, but a lot of SaaS apps are essentially this type of thing: you don't really need as much of the UI if there's effectively a person in front of it, and that's what these things are.

15:15 >> But you need those opinions. One of the biggest mistakes people make is they adopt software, it doesn't work the way they were envisioning, and then they customize. I always tell people at Fivetran: pick a good tool, and if the tool doesn't work the way Fivetran works, change Fivetran rather than the tool. The tool is probably right. Whoever

15:40 designed that recruiting workflow probably thought about it longer than you did. Adopt the workflow of the tool rather than trying to change it to match the way the business works today.

15:52 >> Yeah, there's wisdom in that, and I very much agree. I'm not saying this is just ChatGPT on top of CSVs; I think products start to look more like that. In my talk I referenced a new dating app, and I think a dating app is a pretty clean example of this, where

16:08 the old dating apps are a giant database of people you could date, with a UI on top to present them to you so you can choose whether you want to date them or not.

16:17 The new dating apps are a giant database of people you can date and an AI thing that says, y'all should meet, and just introduces you to people. It presumably has some expertise in there about why you should talk to this person and not that one. That's what you're building, but you're basically

16:32 teaching the thing the expertise instead of trying to impose that expertise through a UI that suggests this person because we think you're compatible, or whatever the thing is. Instead of that, you basically have the agent >> be told that expertise.

16:49 >> It's a fine line, I think, juggling how much should be suggested versus how much should be taken from the user.

16:55 >> But that's already happening in apps today. >> Yeah, it's just expressed through a UI that constrains what you can do.

17:04 >> Interesting. Let me see here; we've got a bunch of questions. Are we able to take audience questions at some point too, if you're interested? We'll figure that out. But Benn had a really good question; we were texting last week about this, and it's one he had

17:20 which I want to bring up to the panel. This is a data conference.

17:25 How appropriate would it be to do a panel without talking about AI, in this day and age, at a data conference?

17:33 >> You could do it. >> I could do it, too. [laughter] I'm lazy, George. >> The world's still going to need dashboards, man. You still need to know what bookings were last week.

17:47 >> Yeah. >> Still an important question. >> It is, and it's something I brought up in my talk, but it's funny. How many people have dashboards at their companies? Right, okay, all of you. How many people in your company actually look at those dashboards?

18:03 But now we're going to throw an AI prompt on top of it, and that will magically mean people talk to the data more, even though the answer to 90% of the questions is already in front of you. Right? So this is one of the things where I'm personally struggling with where we are today:

18:17 a lot of the answers you need to run your business have been there the entire time. It might be on a dashboard; it might be in Bob's Excel report that nobody looks at either.

18:26 But >> Well, that's kind of true of support, and AI is working in support for ticket deflection, right? The answer is in the docs, in the section whose name is literally your question. There is the answer, and the AIs just point you over there. And it works, so

18:46 maybe that's going to be it. >> It does add value; it automates a lot of that. >> Speeds things up. >> People didn't want to go look at the docs; they wanted to file the support ticket. It's like, all right, I'll read the docs to you. >> They want the person with the

19:00 spreadsheet.

19:04 >> It's Bob's spreadsheet; the answer is in his spreadsheet. >> It is, and it's in an email that's sent to you that you don't open, [laughter] because it's set on a cron job or some other schedule. No, it is an interesting thing, and I did think a lot about your question of how appropriate it would

19:19 be to do a panel without talking about AI. But it is the elephant in the room right now, and has been for a couple of years. I remember in 2023, right after ChatGPT came out, going on the conference circuit and looking at all the vendors. Well, so,

19:35 data mesh was the cool thing in 2022, right? Every vendor had a data mesh story of some sort, data products... not Fivetran, [laughter] they were holding out for 2023. I'm kidding. [laughter] But then AI happened, and all of a sudden everybody had a chatbot interface within, it seemed like, a few weeks. It was bolted

19:55 onto the product very clumsily, but then it seemed like it was maturing. Now it feels like most data vendors I see in conference halls have some sort of AI, and now agent, story.

20:07 Everybody has the agent thing going on, and I don't know what next year brings. But a litmus test I did at Google Next this year: I actually had the stopwatch on my watch, surprisingly it does that, and I walked the floor and timed how far I could go without hearing the

20:24 word "agent," and it was around five seconds in the vendor hall. [laughter] It was just: agent, okay, reset; agent, reset. And it was interesting, because that wasn't the word you heard in 2024, right? Not as much. So it's a fascinating time. I don't know,

20:47 but what did you think when you brought that question up? >> I have no idea. >> Well, it happens. I think everything is oriented around that.

20:59 To your point, this is the stuff everybody talks about. It's all tacking AI on, or trying to build something natively with it, or whatever.

21:06 There is a little bit of a weird thing where, and George, you sort of mentioned this at the beginning, data and AI are very different things. Pre-generative AI, it was: data is kind of like machine learning, machine learning is kind of like AI. We all got lumped into the same thing. And now

21:24 generative AI is a very different thing that honestly has nothing to do with data, really. We can use it, and there are a lot of products that will use it, but the things the modern data stack historically did have very little to do with AI as it exists now. The closest thing is you can build

21:41 pipelines that take support tickets, and instead of producing a bunch of structured data about them that you then put into a dashboard, you can take unstructured data, the actual text of the support ticket, and feed it into something that can understand what it means. But you could have

21:56 generative AI and the modern data stack be totally separate things. We could continue to exist as the modern data stack without talking about AI at all. So are we talking about it because it's just a big thing we can't ignore, or are data and AI actually somehow married?

22:12 >> I think the most important intersection is actually code generation, because these AI tools are so good at code generation, and they're quite good at writing SQL. You basically just need to give them a description of every column and every table you want them to know about; that's enough. You

22:32 have to tell them the joins, and then they can write excellent SQL queries. This is changing how analytics teams' workflows work, because you basically have an infinite supply of junior analysts, and that really changes what you can do. To me, in terms of things that are concretely in production and working for people at that intersection, that is the biggest thing that I see.
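A minimal sketch of the setup George describes: schema descriptions plus join hints in the prompt, SQL out. The schema and the `ask_llm` helper are hypothetical placeholders, not a specific vendor's implementation.

```python
# Text-to-SQL sketch: the prompt carries a description of every table and
# column the model should know about, plus how to join them.
SCHEMA = """
orders(order_id, customer_id, amount, created_at) -- one row per order
customers(customer_id, region, signup_date)       -- one row per customer
join: orders.customer_id = customers.customer_id
"""

def ask_llm(prompt: str) -> str:
    # Placeholder: call whatever LLM client you use. The canned answer
    # below just keeps the sketch runnable.
    return ("SELECT region, sum(amount) FROM orders "
            "JOIN customers USING (customer_id) GROUP BY region")

def text_to_sql(question: str) -> str:
    prompt = (f"Given this schema:\n{SCHEMA}\n"
              f"Write one SQL query answering: {question}\nReturn only SQL.")
    return ask_llm(prompt)

print(text_to_sql("Total order amount by region"))
```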

22:54 Generally, they're not great at ingesting relational data and doing something with it; SQL works really well for that. And working with

23:09 the kinds of pipelines that feed language model training and things like that, that's a whole different stack. You're not going to put that data in your data warehouse.

23:20 >> That's fair, but don't you think the original data stacks are helpful for scaling GenAI? For example, for powering RAG features, or basically providing the right context for agents. Agents aren't useful without the right context, and isn't that a data

23:40 problem, a data retrieval problem? >> Kind of. I think that is a little bit of us chasing the thing because we want to be part of that story. To what you're saying, I very much agree with you, George. The question of should we talk about AI is really: should we talk about AI as consumers of

23:56 AI? Sure. Should we talk about AI as though we are producers of it in some way? We act like it, because again, data, machine learning, and AI all feel clustered together, but that feels a little forced. It feels like us wanting to do that because that's where all the heat is.

24:13 But I don't actually know. To your point, I agree that there are some useful things we built. Fivetran, as an example, can move data from one place to another. Is that useful for a lot of AI stuff? Probably. >> But it's not immediately intuitive that it is.

24:28 >> RAG is real and in prod. That is the other thing: in terms of places where the same data stack you use to power analytics is powering AI, or being powered by AI, there's code generation on the ad hoc query side, and then there's definitely RAG. That's

24:46 real. We at Fivetran ourselves do RAG a lot off of our data warehouse for internal use. >> So it's part of an AI system. It's not the model necessarily, but AI is a system, and it seems like this piece could be an important part of it.
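For what "RAG off the data warehouse" can look like, a minimal sketch; the documents, the toy `embed` function, and the placeholder LLM call are all illustrative assumptions, not Fivetran's actual setup:

```python
# RAG sketch: embed documents, retrieve the nearest ones for a question,
# and pass them to an LLM as context. In practice the documents and their
# embeddings would live in warehouse tables, not in-memory lists.
import math

def embed(text: str) -> list[float]:
    # Toy hash-based embedding so the sketch runs; use a real embedding
    # model in practice.
    return [((hash(text) >> i) % 100) / 100 for i in range(0, 32, 4)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-9)

DOCS = ["Refund policy: 30 days...", "Shipping times: 2-5 business days..."]
INDEX = [(d, embed(d)) for d in DOCS]

def answer(question: str, k: int = 1) -> str:
    q = embed(question)
    ranked = sorted(INDEX, key=lambda de: cosine(q, de[1]), reverse=True)
    context = "\n".join(d for d, _ in ranked[:k])
    return f"[LLM answer given context:\n{context}]"  # placeholder LLM call

print(answer("How long do refunds take?"))
```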

25:01 >> From the vendor side too, if you were to try to get funding for your modern data stack startup, at least the scuttlebutt I've heard is that it's going to be a lot harder unless you have an AI story, right? So there's almost that pressure, too. I think you would

25:17 certainly tell a story of how this is useful for AI in some way. Not "oh my god, we use it"; people would say, of course you do. It's, how are you selling shovels? And so it feels a little bit forced.

25:31 >> It does. Yeah. Like I said, I noticed it was a sudden change, where the vendors that were not AI were suddenly AI companies; everyone is an AI company all of a sudden. It definitely felt a bit weird, but here we are. And I guess

25:46 now we're agent companies, and whatever comes next. So: what's something you wish people would talk more about, in either data or AI? Maybe starting with Benn.

25:56 >> This is a little bit of the thing I was talking about before: how much of the problems we're trying to solve get solved by means that aren't data, or aren't traditional data in the way we think of it, and aren't quantitative.

26:09 There are a lot of things the modern data stack, whatever that means, existed to solve: to understand your business, answer questions, all those sorts of things. How much of that actually gets solved through some totally different means? >> And I don't think people are quite

26:28 appreciating the degree to which that could migrate away. Again, you just ask questions of your support tickets and it gives you an answer; that's going to be real appealing real quick. A whole bunch of tools got built to try to

26:45 find these hidden insights in our business. We built a bunch of tech assuming the insight was there, and we've struggled to find it. The dashboards are the easiest example: we built all these dashboards and nobody looks at them. There's this assumption that we as data people can find these magic

27:01 insights, and we kind of didn't. I think the gravity of where people go to ask questions and get them answered doesn't necessarily look like data stuff for that much longer.

27:12 >> Yeah. >> I don't know. What do you guys think? >> I have a couple of thoughts, and a question. Do you think that comes down to understanding what people really want? What users really want?

27:23 >> Kind of. >> Actually understanding it, I mean. What we're seeing a lot of in AI is innovation, a lot of demos.

27:29 These are all very exciting, but at the end of the day, is it something customers and users really want? That's something I think about as I'm building or deploying things. I wonder if

27:49the whole thing of like what actions do I take. It's not quite actions. It's just like I want to understand what's going on. I want to understand what my customers like. I want to understand what they don't like.

27:56 These things aren't quantitative by nature. We have sort of made them quantitative, but if you talk to a CEO and ask why they believe what they believe about what their customers like, almost universally it'll be: I heard this from somebody, I heard this story and it stuck with me. And we don't really believe in that,

28:16 because it's anecdotal; it doesn't really count. But if you can scale that up, if you can answer questions on top of an entire corpus of support tickets in a way that aggregates them for you, which is basically what an LLM does, that becomes far more influential and impactful for somebody who says, I want to

28:30 understand my customers, than looking at a bunch of charts that suggest trends. In part because it's just much more evocative, and in part because the problem with those stories, with user research, is that it is anecdotal; you can only do so much with it. But if you can scale it up, where I can actually read a

28:48 thousand support tickets all at once and get an approximate aggregation of them, that's not anecdotal anymore.

28:54 So that suddenly becomes a very compelling way to actually understand your business: just use that sort of information. And once we make use of it, I think we start to collect a lot more of it. We've done a huge amount of work to collect quantitative data because we had things we could do with it. We

29:06 could actually put it on dashboards. We didn't really bother collecting a ton of anecdotal data, because what are you going to do with it? What are you going to do with 10,000 thirty-second interviews with customers? >> Pay someone a ton of money to read them.

29:17 >> If you have something to do with them, suddenly that's what you're chasing, and you're just like, tell me what this thing says. >> I've been wanting to build this workflow; I've just never gotten around to it. We do engagement surveys like most companies do, and we're big enough that you can't read all the responses. And

29:32 one of the things that always drives me crazy is people present me these summaries of the responses: here are the themes, there was this and there was that. And I'm like, 90% chance that was not the theme; that's just what you were looking for in them.

29:45 You can always find someone who's saying anything, right? I want a rigorous AI-powered summarizer. The problem is you can't fit them all in the context window; the last time I tried this, they wouldn't all fit. I want the AI to tell me what the themes are in the employee engagement survey, because I don't trust anybody

30:02 else. [laughter] I think that would be a great use case. That's the thing I want.
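One standard way around the context-window limit George describes is hierarchical, map-reduce style summarization: summarize batches of responses, then summarize the batch summaries. A minimal sketch, with a placeholder `llm` call standing in for whatever model client you use:

```python
# Map-reduce summarization for survey responses too large for one context
# window: summarize batches, then combine the partial summaries.
def llm(prompt: str) -> str:
    # Placeholder for your actual model call; canned output keeps this runnable.
    return "[summary of: " + prompt[:40].replace("\n", " ") + "...]"

def summarize_all(responses: list[str], batch_size: int = 50) -> str:
    partials = []
    for i in range(0, len(responses), batch_size):
        batch = "\n".join(responses[i:i + batch_size])
        partials.append(llm(f"Summarize the recurring themes in:\n{batch}"))
    # Reduce step: one more pass over the partial summaries.
    return llm("Combine these partial summaries into overall themes:\n"
               + "\n".join(partials))

print(summarize_all([f"response {i}" for i in range(200)]))
```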

30:08 >> Yeah. And if you have that, suddenly the dashboards of those things, the Likert scores, it's like, I don't care about that.

30:14 The Likert score was just a device to understand a little bit, back when we couldn't actually read these things.

30:21 >> For sure. The thing I hope people talk more about too: the people I'm seeing get the most benefit are non-technical users.

30:30 My climbing gym is a really good example. Such a tech-bro thing to say.

30:33 But yes, it is. But

30:38 the CEO of the company, he pulled me aside. He's like, I'm writing code. I'm like, you're doing what? He's like, I write code now. I'm like, no you don't; ChatGPT writes code for you. But it was really cool. He's tying together a bunch of back-office business operations that are completely tedious. Nobody wants to do

30:52 it. He made another thing that's the classic take-data-from-a-PDF, put-it-into-a-spreadsheet-or-database job, right? This is all automated now, something he'd otherwise have to pay somebody who hates doing that work. And I think that's the underappreciated part of all this: even ChatGPT at a small gym is saving people countless

31:09 hours on stupid work that nobody wants to do. Ask the employees; they don't want that job. So some of the biggest benefits I've seen around the world are people finding ways to save time on

31:23 mindless stuff, like with your slides. >> But my slides were amazing. >> They were very good-looking slides.

31:35 >> Slides aren't mindless, Joe. >> What's that? >> Slides aren't mindless. Just because you don't use slides.

31:40 >> Well, sorry, I'm very biased here. Okay, audience, do you have questions for us? I know people have been talking at you the entire day. You ask a question, we'll talk at you again. >> You wrote the other day about dbt.

31:58 If it had been a UI instead of something we interacted with in the terminal, would it be the same product? It's always interesting. And Dr. Heinecke, you talked about how we have these smaller models, and they can do more because they can be faster. Is there anything about that intentional constraint, how it makes something

32:18 better when it comes to design? Like Iron Chef: these are the ingredients you get; what do you make from them? >> I don't think I can repeat your question, but you guys can answer it. [laughter] >> The first part of the question was: if dbt had been a UI instead of code, how would that have been different?

32:36 Well, yeah. dbt obviously

32:41 was a very terminal-based thing, a command-line script. They made a draggy-droppy thing down the road; what if they had launched with the draggy-droppy thing and said everybody can use this to drag-and-drop their way into data pipelines? Would people have liked it? I don't think so.

32:55 But I don't know if that's a constraint thing. It was kind of fun when you were somebody who didn't know how to use a terminal and suddenly you did. You're like, this is cool; I'm talking to the Matrix. And I had done draggy-droppy things before, and that's kind of lame. So

33:09 yeah, there was a vibe to it that, to me, was impactful. I don't know that the constraint was the thing that made it good. It forced you into an environment that a lot of people who were analytics engineers hadn't used before, and a lot of them

33:26 found it appealing. I think a lot of data people are engineers at heart who never actually learned how to be engineers, and dbt gave them this halfway excuse to say, I'm an engineer and this is fun.

33:36 >> The secret of dbt's success is that it made analysts feel like software engineers, because now they got to type curly braces.

33:43 Some have said. [laughter] >> Yeah, to answer your question regarding small models: I think the large models had to be there first. It was through adding more parameters and training on more data that, three years ago when ChatGPT was first launched, we got the magic, the emergent properties. That was

34:06 the magic that happened when we scaled up. But as we think of the future, deploying across lots of different devices, in lots of different environments, imagine regulated industries, for example, that can't go to cloud, we start to ask: okay, we have these amazing LLM capabilities; how

34:27 much can we capture in a smaller package, so it can be deployed on-prem? So my answer is: we had to make models larger to have those breakthroughs and see what's really possible. Then, as it's adopted, we get to ask: in constrained environments, what

34:45 could we do? >> Okay, I have a question about that. You all sell to big enterprises; Mode sold to people on Product Hunt.

34:55 Do you think having the constraint of selling to people who could not use cloud products has made y'all better? It has made stuff more durable in some ways, probably. But to your question about large versus small models, there is a constraint there. I'm curious: does the fact that some people can't use

35:13 cloud software make cloud software better in some way?

35:21 >> That's a good question. >> Yeah, I'll add that it's definitely challenged us

35:34 to get models smaller, so we could have low latency and lower cost to serve for different regulated industries. And it's definitely opened the doors for us to serve a variety of customers, for example customers who rely on their phones to get their jobs done. I think it's

35:53 actually broadened our customer base in some sense. >> Look at DuckDB, and tying it back to MotherDuck, thanks for the event. I was walking around with Hannes Mühleisen the other day when we were in Helsinki, enjoying the fabulous 40-degree cold weather there, and he was walking me through the constraints of

36:14 making something like DuckDB, where they're trying to run as many of the operations at the clock speed of the CPU as possible, right? I thought that was a really interesting constraint environment: it has to download quickly, it has to be ultra fast.

36:27 >> Interesting. And it's a highly optimized piece of software that's not cloud. I guess it could be cloud, but it could be on your phone; it could be on a satellite in space. I think that's a really good example of optimizing under severe constraints and trying to make a better

36:44 database from the smallest ingredients possible. >> You're pondering something. >> No.

36:53 Any other questions? I think we have time for one more. Yes, Robert.

36:57 >> I just had a comment about what you said about small language models.

37:02 It's very similar to Clayton Christensen's disruptive innovation, right? It reaches a much larger audience with fewer features, but it gets wider acceptance because of that, and puts the more technical, more capable product out of business.

37:22 >> Interesting. Well, I think we're at time, so thank you very much. It sounds like there's a happy hour or something happening, so I hope you enjoy that. Thank you very much, and shout-out to everyone for showing up. Thank you.

37:41 [music]
