Building Large Apps With Tiny Databases | Small Data SF 2024

2024 Small Data SF

Building Large Apps With Tiny Databases

Modern applications are often huge, complex engineering projects - but they don't have to be. Applications increasingly need to work with data on local devices to enable real-time collaboration and offline use. 2025 will be the year of Local-First development, and we'll demonstrate how new ways to deploy infrastructure can help us make this a reality.

Speaker

Søren Bramer Schmidt

Founder and CEO

Søren Bramer Schmidt is the founder of Prisma and a database nerd from a time when Big Data didn't fit in memory.

0:00 [Music]

0:16 It's good to see so many of you. This talk is about building applications with this new architecture we call local-first. My name is Søren Bramer Schmidt. I live in Germany, but every few months I fly out here to meet smart people like yourselves. So, if you want to connect and chat more, send me a DM on Twitter

0:33 and we can have coffee one of the coming days. The little guy in the corner is my son. Two months ago, he took his first steps and I was so excited. Now, I'm just tired all the time because I'm running after him, trying to make sure he doesn't get into trouble. So, this is a question that I get all the time.

0:49 Given the advancements in AI, these advanced models, LLMs, will there still be a need for software developers in the future? And this is a question I care about a lot as somebody running a company that is building tools for software developers. So I've been thinking about this and my conclusion is this. I think that demand for software

1:08 development far outstrips supply. And if you think about it, many of you are software developers. You have good jobs and are paid high salaries. How can that be? That's because the world demands software development. So I think what will happen, now that we can get more efficient at software development with the help of AI, is that we'll just start to

1:27 eat into this unfulfilled demand for software development. And when thinking about the future, it's always helpful to look to the past. And I think a transition that happened in the past that is very analogous is the transition to digital photo editing. If you go back in time 30 or 40 years, people were editing photos. They were

1:47 just very inefficient at it because they didn't have modern tools. So what happened is computers and Photoshop came along and we got much better at doing photo editing. And as a result, do we have fewer people doing photo editing? No, we have dramatically more people doing photo editing. All of us are doing photo editing day-to-day on our

2:09 iPhones. So I tried to find some primary data to illustrate this trend. And after wasting half an hour, I did what I often do after wasting half an hour: I went to one of these AI tools and asked it to do it for me. And this took 20 seconds, and I had a beautiful graph.

2:25 And this is my point. If you look at the inflection point around the year 2000, that's when the world started to really figure out how to use computers and how to do this Photoshop thing. And from that point, the amount of photo editing happening, and the number of people engaged with photo editing, dramatically expanded. That's where we are today with

2:45 software development. We're at this inflection point where we're just starting to figure out how we can use these modern tools we have at hand to do software development better and faster.

2:56 So I think that's an exciting time for us to be in this industry. Let's talk about local-first application development. And I think a good way to dive into what local-first is, is to look at a few apps that are built this way. Many of you are probably familiar with Linear. It's a project management

3:13 tool that came out seven or eight years ago, and from the beginning the founders built it this way, with a sync engine. All the data is local. They wanted their differentiating factor to be delivering a great experience for the end users. Of course, Apple famously cares about privacy. So they build many of their applications this way, where all of

3:33 your data is local. Then they encrypt it and they do the syncing. Figma is an app that is structured around the document model. It's a web app. You go to the website and it runs in the browser, but when you load a document there is a progress bar. It's a little slow, but once it is loaded, all your editing

3:52 happens locally. That's very different from more traditional applications. So to understand the local-first architecture, let's contrast it with the traditional architecture. Many of you building applications are familiar with this three-tier architecture. All the data is stored in one database, centrally, somewhere, probably US East. And all the users are running on their own devices: a computer, a laptop, a phone. And

4:17 whenever you want to load some new data or make changes to the data, you have to have that go all the way out to US East and come all the way back to say, "Yep, the data is persisted."

4:27 As software developers, we like simple architectures, and this is a simple architecture, but we also want to provide a great experience for end users. So we do all kinds of complicated work to make this actually work well in practice. We do optimistic UI updates. We try to cache some data and very carefully invalidate the cache

4:47 when the right conditions occur. And this turns our simple application into a complex distributed systems problem, which is really annoying. All you're trying to do is display some data to the end user. So the proposition of local-first architecture is to get rid of all of that complexity from the old world and

5:08 do things differently. We take the database and put it into the application. We store it on the local device where the user is, and then we have a thing sitting on the side that we call a sync engine. This sync engine is in charge of making sure all the data you edit locally makes its way up to the

5:26 server and back down again to you and to other clients that are interested in that same data. So a saying in the local-first ecosystem is that the cloud should be a way to enhance your application, not its absolute foundation. If you have an application that is built in this local-first way, you can run it without the

5:43 cloud. You can run it offline for a while and then sync later. Or even if the cloud goes away because the company goes out of business, as long as you still have your application and your data, you can keep running. When evaluating architectures, there are two different perspectives that are relevant to look at. One is the user: what is the user

5:58 experience like? And the other is the developer experience. These apps I showed you before all picked this architecture because they wanted to stand out in the market. They wanted to differentiate themselves. And Linear, for example, spent a lot of time building this complex sync engine because there was nothing they could just take off the shelf and use. They

6:17 had to do it themselves. It's a lot of hard work, a lot of investment, but it delivered a better user experience. What I'm really excited about, though, is the perspective of the developer, and this is why I think local-first is actually going to take over the world of app development.

6:34 It provides a much simpler mental model for developers building these applications. The Linear team put out some great tech talks talking through all of the stuff they have built. And one of the really interesting things they talk about is how they have a small team of engineers that maintains the sync system. But then the rest of

6:52 the team, the people implementing features and delivering value for users, have a much simpler job now because they don't need to deal with remote requests, with caching, with these distributed systems problems. They have all of the data locally and they can just go and build cool features. So I'm going to talk about Prisma for just a

7:11 little bit and then talk about how we are approaching building a general solution for local-first applications.

7:17 So first, a show of hands. How many of you have used Prisma for application development? So that's about a third of you. That's very cool. Most places I go are entirely focused on application development, and this audience is an interesting mix of application developers on the one hand and data scientists on the other hand. Prisma is

7:38 an ORM for working with data on the back end. And when you're working with data in an API or web server, there are three main workflows. One is you need to structure your data. That's what we call data modeling. You need to be able to change that structure. That's what we call migrations. And of course, you need

7:54 to be able to interact with that data: to query the data, update the data, and delete the data. And over the last eight years, we've managed to become the most popular ORM in the JavaScript and TypeScript ecosystem. And very early on, people asked us questions like this. Can we use Prisma in React Native? Can we use Prisma in the browser? I'm building an

8:18 app. I want to use Prisma. And for a long time, we were only focused on server-side development. The entire front-end world is a complex world and we left that to other people. But earlier this year, we launched support for running Prisma, the Prisma ORM, in React Native applications, and later we'll bring Prisma with SQLite to the

8:38 browser as well. So that brings these easy workflows, data modeling, migrations and querying, but also tight React integration with reactive queries, query caching, and easy global state management. And to understand what that means, we can look at some code. I hope it's big enough that you can read it from the back. This is a simple

8:58 React component, very similar to what many of you are probably writing every day. And there are three lines I'd like you to focus on. Line number one uses Prisma to load some data. It queries the transactions table. It lists all of them. It orders by date, and then it includes the category, which is a different table. Just

9:20 load that in as a sub-object. But this is not just a normal query. This is a reactive query. So under the hood, this is setting up a React hook. And that means whenever the underlying data in the database changes, the value of this transactions variable is also updated. In line two, we just use this

9:43 data in the view. We loop over the transactions. We generate some components. And in line three, we have this onLongPress handler: go and delete the transaction.

9:55 So when that happens, the database changes, the value in line one is automatically updated, and because of the hooks mechanism, the view is updated.
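
The slide code is not part of this transcript, so the following is only a minimal sketch of the reactive-query idea described above, written without any library. It is not Prisma's API: `ReactiveTable`, `liveQuery`, and the `Transaction` shape are hypothetical names chosen for illustration. The point it tries to show is that a write to the table re-runs the query and pushes a fresh result to whoever is rendering it, which is roughly the role a React hook plays in the component from the talk.

```typescript
// Illustration only: a tiny in-memory "table" with change notifications.
// ReactiveTable and liveQuery are hypothetical names, not Prisma's API.
type Transaction = { id: number; amount: number; date: string };
type Listener = () => void;

class ReactiveTable {
  private rows: Transaction[] = [];
  private listeners = new Set<Listener>();

  // Register a callback fired after every write; returns an unsubscribe function.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  insert(row: Transaction): void {
    this.rows.push(row);
    this.notify();
  }

  delete(id: number): void {
    this.rows = this.rows.filter((r) => r.id !== id);
    this.notify();
  }

  // Plain, non-reactive read, ordered by date (ascending).
  findMany(): Transaction[] {
    return [...this.rows].sort((a, b) => a.date.localeCompare(b.date));
  }

  private notify(): void {
    this.listeners.forEach((listener) => listener());
  }
}

// Runs the query once, then again after every change, handing each fresh
// result to onResult (the stand-in for "re-render the view").
function liveQuery(
  table: ReactiveTable,
  onResult: (rows: Transaction[]) => void
): () => void {
  onResult(table.findMany());
  return table.subscribe(() => onResult(table.findMany()));
}

const table = new ReactiveTable();
let rendered: Transaction[] = [];
const stop = liveQuery(table, (rows) => {
  rendered = rows;
});

table.insert({ id: 1, amount: 12, date: "2024-09-02" });
table.insert({ id: 2, amount: 30, date: "2024-09-01" });
table.delete(1); // e.g. triggered by a long press in the UI
```

After the delete, `rendered` holds only the remaining transaction, without the caller having to refetch or update any state by hand. Calling `stop()` ends the subscription.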

10:03 And this works with data changes happening in the same component, but also with data changes happening anywhere else in your application. So this is really cool. If you're building your app this way, with reactive queries all the way down from the database driving your UI, then there's a lot of state management that you can just get rid of

10:21 entirely. Earlier this year I went to App.js Conf and talked about this. I did some live coding, migrating this demo application from just normal SQLite queries to reactive queries with Prisma, and I could cut the size of the application roughly in half, just getting rid of all of this boilerplate that you don't need anymore. So this is all very cool, but

10:45 it doesn't really give us a local-first app. It gives us a local-only app.

10:51 That's not very useful. So the other piece of this, the missing piece, is the seamless data sync engine. Linear and other companies that have built applications this way are really pioneers. They have had to build these things on their own. What is really cool now in the industry is that a bunch of players are starting to

11:11 figure out how we can build a general-purpose sync engine that works for many kinds of applications. And that's where we're going with Prisma. If you look at this local-first architecture diagram again, there are really two things that a general-purpose sync engine needs to support. One is change propagation. So that means when a change happens on one

11:33 device, it has to propagate up to the sync engine, to the backend database, and then out to other clients that are also interested in this data. It's a very basic thing you need to support. The other thing you need to support is: don't leak data. You can't sync this data to everybody using the application. You need to be very careful to not give it

11:54 to people who shouldn't have access. And that actually turns out to be a really complex thing. And we'll talk about this in a little bit. Change propagation itself has been a huge topic in the local-first world for the last five-plus years, and it has been really very academic. Some of you may be familiar with CRDTs,

12:16 conflict-free replicated data types. The general idea is that if you have your types created in a very special way, then you can encode what should happen when merges happen in the future. You can just have different devices kind of move along. They never know about each other. They never talk to a centralized server. And when they come together,

12:34 these data types just know how to merge and end up in an end state that everybody agrees on.
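
To make the CRDT idea concrete, here is a small sketch of a grow-only counter (G-Counter), one of the simplest textbook CRDTs. It is not taken from the talk or from any Prisma code; the replica names `phone` and `laptop` are made up for the example. Each replica only increments its own slot, and merging takes the per-replica maximum, so two devices that edited independently converge to the same state regardless of merge order.

```typescript
// Grow-only counter (G-Counter): a map from replica id to that replica's count.
type GCounter = Record<string, number>;

// A replica may only increment its own slot.
function increment(counter: GCounter, replicaId: string): GCounter {
  return { ...counter, [replicaId]: (counter[replicaId] ?? 0) + 1 };
}

// Merge takes the maximum per replica, which makes it commutative,
// associative, and idempotent.
function merge(a: GCounter, b: GCounter): GCounter {
  const merged: GCounter = { ...a };
  for (const [replicaId, count] of Object.entries(b)) {
    merged[replicaId] = Math.max(merged[replicaId] ?? 0, count);
  }
  return merged;
}

// The counter's value is the sum over all replicas.
function value(counter: GCounter): number {
  return Object.values(counter).reduce((sum, n) => sum + n, 0);
}

// Two devices start from the same state and then edit without talking to each other.
const shared: GCounter = { phone: 2, laptop: 1 };
const phone = increment(shared, "phone");
const laptop = increment(shared, "laptop");

// When they come together, merging in either order gives the same result.
const mergedOnPhone = merge(phone, laptop);
const mergedOnLaptop = merge(laptop, phone);
```

Both merged states are `{ phone: 3, laptop: 2 }` with a value of 5, so neither increment is lost and no central server was needed to decide the outcome.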

12:42 What we found is that CRDTs are really cool. They're great for some use cases.

12:46 Rich text editing is a great example of where CRDTs make a lot of sense, but they're also kind of difficult to work with. They make it difficult to implement permission logic and things like that. So the other approach is where we're going with Prisma, and that is to have a single authoritative server. And it works in these four

13:06 simple steps. This is probably what you would come up with if you were designing something like this yourself. A change happens locally on one device. Then that same change is picked up by the sync service and performed on the server, on the centralized database. Then you have a diff that is distributed to all clients that are

13:25 interested in this change. And then you have this step four that you have to do: you need to rebase if needed. And what does that mean? Rebase is something you do in Git, but it is also something you have to do when you're building a sync system like this. And to illustrate the problem, you can look at this event where two

13:41 clients are performing a change that introduces a conflict, and they're doing it at roughly the same time. They're doing it before either of them can learn about the change from the other one. In this case, it's a user clicking on the like button, and they both want to increment the value from 41 to 42. In a naive approach, you might end up in

13:59 a situation where the first one sets it to 42 and then the other one also sets it to 42, and there's an increment lost. So the point of the rebase is to make sure that the client that came in last is kind of rolled back, and then it redoes

14:18 the same update from the correct state. Now this is the thing that the local-first community is still trying to figure out: you need to prevent data leaks. You can't just sync data to everybody. That would be simple, but you just can't build a production system that way.
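
Going back to the like-button conflict from a moment ago, the four steps and the rebase can be sketched roughly as follows. This is an illustration under stated assumptions, not Prisma's sync engine: `Server`, `Client`, `Mutation`, and `Update` are hypothetical names, and the sketch assumes each client's mutations are acknowledged in the order they were sent. The key choice is that clients send the intent ("increment") rather than the absolute value 42, and keep unacknowledged mutations in a pending list that is replayed on top of whatever state the server reports.

```typescript
// Hypothetical types for illustration; not taken from any real sync engine.
type Mutation = { clientId: string; kind: "increment" };
type Update = { likes: number; appliedFor: string };

// Step 2: the single authoritative server applies mutations one at a time.
class Server {
  likes = 41;

  apply(m: Mutation): Update {
    if (m.kind === "increment") this.likes += 1;
    // Step 3: this update is what gets distributed to interested clients.
    return { likes: this.likes, appliedFor: m.clientId };
  }
}

class Client {
  id: string;
  private confirmed: number; // last state received from the server
  private pending: Mutation[] = []; // local changes not yet acknowledged

  constructor(id: string, initialLikes: number) {
    this.id = id;
    this.confirmed = initialLikes;
  }

  // Step 1: the change happens locally and is queued for the server.
  like(): Mutation {
    const m: Mutation = { clientId: this.id, kind: "increment" };
    this.pending.push(m);
    return m;
  }

  // Step 4: adopt the server state; drop our own mutation once acknowledged.
  // Assumes acknowledgements arrive in the order the mutations were sent.
  receive(update: Update): void {
    this.confirmed = update.likes;
    if (update.appliedFor === this.id) this.pending.shift();
  }

  // Rebase: the local view is the confirmed state with pending increments replayed on top.
  get likes(): number {
    return this.confirmed + this.pending.length;
  }
}

const server = new Server();
const a = new Client("a", 41);
const b = new Client("b", 41);

// Both users click "like" before hearing about each other; each sees 42 locally.
const fromA = a.like();
const fromB = b.like();

// The server processes the two mutations in arrival order and broadcasts each result.
for (const mutation of [fromA, fromB]) {
  const update = server.apply(mutation);
  a.receive(update);
  b.receive(update);
}
```

After the first broadcast, client `b` already shows 43, because its own pending increment is replayed on top of the server's 42. Once both mutations are applied, the server and both clients agree on 43, whereas writing the absolute value 42 twice would have lost one like.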

14:36 So to break down the data-leak problem, we can take a look at Notion as an example. Many of you probably use Notion, but for those of you who don't, it's a relatively simple document editing app. It has documents.

14:49 There are links between documents. Documents live in a workspace. A workspace has a bunch of users. Users have access to some documents, but not necessarily all documents. And that's where the problem comes from. Now, Prisma is a pretty heavy user of Notion.

15:03 So somewhere in that database there's a Prisma workspace. And Niko, my colleague, has access to some of those documents. I have access to some of those documents. Neither of us has access to all of the documents.

15:17 So how do you decide which documents, what data, to share? Do you just share all the data with both of us? That doesn't really work. There are these properties that a sync engine needs to offer. All of my data should be available to me all the time. That's the point. If I have this property, then

15:34 I can just query data locally and never have to worry about a remote server. But no other data should ever be available to me locally, because if it is, even if it's kind of filtered out in the app, I could go and look at the database and I could exfiltrate data that I shouldn't be able to. Then the

15:49 system should offer a great developer experience, which in practice means it should have a SQL interface. There are two approaches to this, and one is what I call query shapes. I saw an ElectricSQL

16:04 t-shirt earlier today over there. So, if you want to learn more about this approach, go and talk to this guy.

16:10 My feeling is that this is a little complicated. The system is complex to build, to operate, and to reason about as a user. So we're taking the other approach, which is a slightly more radical approach but a very simple model. I call it partition data on access boundary.

16:31 So, let's try to reason about this a little bit. We're going to turn our Prisma workspace into tons of little databases.

16:42 So Notion is really just

16:47 divided into documents, and you either have full access to a document or you don't have access to a document. So if you take all of the documents in our Prisma workspace and turn them into individual SQLite databases, you'll end up with 10,000 databases, something like that. Not a lot, but way more databases than Notion is currently managing.

17:05 Moving to SQLite, you can handle that. So that's okay. Zooming in on the Prisma workspace, it might look like this. You now have lots of little databases. You have users, and for each of these users, it becomes trivial to decide what databases should be synced to them. It is simply the databases that they have access to, nothing more, nothing

17:25 less. Here are a few well-known

17:29 applications and thinking through how you could kind of divide them up into databases. For Notion, every document becomes a database. For Slack, every channel becomes a database. For Figma, obviously, every file becomes a database, and so on and so forth.
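
The "nothing more, nothing less" rule can be sketched in a few lines once every access boundary is its own database. Everything here is an assumption made for illustration: the `DatabaseInfo` shape, the `databasesToSync` function, and the document names are invented, and a real system would read membership from its own permission store. What the sketch tries to show is that the sync decision becomes a plain membership check per database rather than row-level filtering inside a shared database.

```typescript
// Hypothetical metadata: one small database per access boundary (here, per document).
type DatabaseInfo = { id: string; members: Set<string> };

// A user syncs exactly the databases they are a member of, nothing more, nothing less.
function databasesToSync(all: DatabaseInfo[], userId: string): string[] {
  return all.filter((db) => db.members.has(userId)).map((db) => db.id);
}

// Invented example workspace with two users and three documents.
const workspace: DatabaseInfo[] = [
  { id: "doc-roadmap", members: new Set(["soren", "niko"]) },
  { id: "doc-board-notes", members: new Set(["soren"]) },
  { id: "doc-eng-onboarding", members: new Set(["niko"]) },
];

const forSoren = databasesToSync(workspace, "soren");
const forNiko = databasesToSync(workspace, "niko");
```

Here `forSoren` contains the roadmap and board notes, and `forNiko` contains the roadmap and engineering onboarding. Data from a document a user cannot open never reaches that user's device, so there is nothing to filter out locally.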

17:45 A theme of this conference is that hardware is getting faster. So you no longer need complex clusters and all this kind of cloud stuff we all did 10 years ago. You can just use individual machines. So that's the way this thing will operate. You have a single machine that stores lots and lots of little SQLite databases, and then it uses S3 for

18:07 durability and long-term storage. And one interesting aspect of this approach to data management is that it fits really well into this new world of AI.

18:19 Some of you might be aware, but a few months ago, Slack had this incident where the new AI feature could leak data from private channels that you shouldn't have access to. And how does that happen? Well, it happens because Slack said: all the data in a workspace, we give that to the AI, and then

18:38 we make sure that the AI only talks to you about data that you have access to.

18:41 Well, it turns out, just like SQL has SQL injection, these new LLMs have prompt injection. You can convince the LLM to do stuff that it has been told not to do. So don't do that. Instead, give the LLM only the data that it should have access to. And if you already partition your data this way, it becomes relatively simple. Just get all

19:02 of the little databases from my device that I have access to, give them to the LLM, and ask the LLM to do cool stuff. So in conclusion, local-first architecture is about moving the database to the client. That alone is huge for the developer experience. Partitioning data on access boundary enables these AI workflows without risking

19:26 data getting to where it shouldn't be. We are trying to build the best

19:34 DX for making these local-first applications. We have this sync engine that is open source. You can go and play with it. It's not production ready. We'll get there at some point next year. And I'm excited for where this is heading, where software development is heading.

19:53 Thank you all. [Music]
