Small data, big features: we are rewriting SQLite! | Small Data SF 2025

2025Small Data SF

Small data, big features: we are rewriting SQLite!

SQLite is the most deployed database in the world, and a crucial player in the small data movement. It powers everything we touch, from small wearables to server-side applications. But as the world changes, is it ready for the challenges that modern infrastructure demands? We believe the answer is no: from its lack of support for concurrent writes, to its inability to work with complex data like vector embeddings, SQLite needs a fundamental overhaul. In this talk we will explore why a complete rewrite is the most practical path forward to bring SQLite into the modern era. We'll dive deep into how Turso, our full rewrite of SQLite in Rust, tackles these challenges head-on—delivering true concurrency, native vector support, and dramatic performance improvements. Expect concrete benchmarks, implementation details, and a clear roadmap for SQLite's future.

Speaker

Glauber Costa

Glauber Costa

Founder & CEO, Turso

Glauber started his career working with the Linux Kernel, where he met Turso Co-founder and CTO Pekka Enberg. Before Turso, Glauber worked as Staff Engineer at Datadog, where he authored the Glommio Rust async executor. In his experience at companies like IBM and Red Hat, he has worked with virtualization technology, storage, and containers. He then spent almost a decade at ScyllaDB, serving as VP of Field Engineering and designing core database features.

0:00His [music]

0:09[music]

0:15[music]

0:20[music] name is a misnomer. So don't you get it twisted. It may be light but lots because it's [music] so freaking fast. Yeah. Thank you.

0:33Just like [music] the library. Fire it up. Yeah. You don't need no stinking serillions.

0:42[music]

0:50[music] >> Woohoo. Thank you. [applause] Thank you. Thank you so much for having me. As you can notice, it's been a while

1:04since I've done this, some 20 years.

1:08Lots of things have changed. I have more stamina back then. I also spent a lot more money in shampoo, so that's good.

1:19And uh and 20 years, 20 years is a long time. 20 years is an important, you know, marker for me because it's about the first time that I've ever used SQLite and I fell in love with it, the first time I've used this. Who here have used SQLite before? And fast forward to now, less stamina, less hair, but we are

1:47rewriting SQLite from scratch in Rust.

1:52And I've been telling people for a while about what we're doing. And usually, and this happened today three times, by the way, I talked to three people. It happened three times. People go from this to this.

2:08Why are you doing this? And more importantly, man, this sounds like a very challenging thing to do.

2:17SQLite is a database that is loved by everybody. It is used by everybody now and in a sense I think it's the OG database that brings this idea of like a a little bit of data at the right place at the right time. Now Jordan may never forgive me for saying this Jordan but I think SQLite is the father duck. Can we

2:41say that? you know and uh it inspired a

2:46generation of databases that try to do the same thing for for their own domains and in many cases very well.

2:56SQLite is the most reliable database in

3:00the solar system. Now people say this it's most reliable database on earth but it runs on the Mars rover. So it is the most reliable database in the solar system. Now beyond our solar system, I won't make any guesses.

3:16Why would you rewrite it? Why would you rewrite it? And the answer to that question is that I think we are at an

3:25inflection point in our industry and you probably heard this today many times by many speakers.

3:34We are in the age of agents and agents want local state. They want things that run locally, that stay local, that behave like a file system that gives you the right data at the right shape, at the right size.

3:52But we don't think the SQLite is fully up to the task and it needs to evolve. We want to be this evolution.

4:02Now I'm not going to talk too much about the amazing features that we add to SQL light or the methodology that we use to test it of deterministic simulation testing. I'm outside after the talk.

4:16We're going to chat about it. But I wanted to focus today on something more tangible and you know just to drive the point home some of the performance improvements that we bring into SQLite because we believe that you know a lot of that will come through on performance and and and getting the database to work well. SQLite is also known to be pretty

4:36fast. So it helps to drive the point home of how much there is to do to improve this system.

4:44We are going to explore today some of the key things that we are doing and I'm going to talk about three things in particular. Everything being async. So truso for example is a fully async uh reimplementation of SQLite. We follow a async architecture and that alone gives us many advantages. So for example we have a native was build. If you go to

5:09shell.tour.te attack. You're going to see a demo that's just a demo of how you can integrate this database right in the browser. We believe that databases for agents are going to have to be everywhere and they're going to have to be in systems in which SQLite sometimes because of its synchronous architecture uh have trouble uh penetrating. Now

5:31in terms asynchronous is nice and you know gives us gives you some performance boost but the most interesting thing about that is that it serves as a foundation for the next thing.

5:45SQLite is a great database for reads. You add data to it, you read it and it's local. No network round trips. It's just there. But writing to SQLite has been traditionally a big challenge. Uh it's something that we we hear by the way we run a poll with people about this and 70% of them their major complaint was

6:07SQLite has a single writer architecture now it works for some cases but if you're trying for example to multiplex agent file system into SQLite it becomes pretty challenging pretty fast and

6:23we wanted to fix that we wanted to fix that to unlock what we believe it's it's going to be needed in the future. So if you look at this graph for example, you have the performance of SQLite for writes for one thread, two threads, and four threads. And what you're going to see is that it kind of doesn't change.

6:44In fact, it it goes down a little bit. And this is to be expected because now you have contention that you didn't have before. But you keep adding threads. And I'm not talking about networking. I'm not talking about multipprocessing. I'm not talking about any of that. I'm just talking about simple threads. The performance is about the same. Now, TUSO

7:01employs MVCC which is a by now not even a new technique anymore uh for concurrent rights. And what we achieve as an as an example is this. If you look at the first bar here for one thread, we

7:17actually shamefully slower. And again, we we'll show the results as they are. There's no need to dock or anything.

7:23thing I mean if I was ashamed of my work I wouldn't have dance on stage and uh this is fixable it's just engineering it's just work I mean we're a relatively new thing and we just here want to show the concept so one thread we're actually for now for now slower than sola we'll talk again last year but when you put

7:43two threads and when you put four threads obviously in this example here there's no contention as any database if you're always contending on the same row that is the dominant thing for performance But here there's no contention. You can see that as you increase the number of threads, performance goes up. So what this allows you to do is have a database that not

8:04only you can read a lot from, but you can write to it. Then we use a synchronous apps sync to be able it's one of the building blocks that we use to achieve this because you now can have all of those threads writing and syncing to the database file in parallel without having to block the whole execution.

8:27And it it really is this. I mean, one of we we embrace things that the SQLite folks have been resisting and embracing. And and um one of them is multi-threading. And if you go to the SQLite FAQ, you're going to see this. I mean, is SQLite thread safe? And threads are evil. Avoid them.

8:51I don't know. I had another presentation where I had a picture of Hitler saying, "No, that's evil." But they asked me not to do it. But uh you you get the mental picture. I mean thread threads are they haven't killed anybody as far as I know.

9:05I don't know. I mean just that maybe they did and I just wasn't aware. But threads are not evil. And we found ino

9:12many opportunities to just embrace multi-threading and extract more performance out of SQLite and again make it make it ready for the demands of the applications to come. Now here's an example. In this example you're going to have what you know It's not hard to imagine. SQLite wants to be like this very simple database. So every time you

9:31open a connection, you have to parse the schema in the SQLite file. So you open a connection is incredibly fast, but you have to go to uh your schema table and parse that table. So you can have prepare statements and you can have all the things that a SQL database is supposed to do. Every connection does that. Every single connection does that.

9:52We have a lot of use cases and this is actually the number may seem a little bit absurd but uh trust me it's not there are many use cases that require this dis this number actually comes from a real use case that a partner of ours had I want to create 10,000 tables in a sec database why would you do that

10:09because you're multi those tables represent something like it's one table per user it's one table per agent session is is whatever like those use cases pop up when you parameterize things now 10 tables If you if you have 10 tables in your database and you open a connection, it takes 140 microsconds. That's pretty awesome. It's actually a lot more than

10:31than one table, but you cannot even see in the graph. It's it's good. But when you have 10,000 tables, it takes a lot more. Takes like 20 something mills.

10:39We're talking like network speeds at this time because I have to parse all of that. And look, you got to parse. You got to parse. Although you could do it lazily, uh, but you know, it's not the end of the world. But the problem is every time you open a connection you pay this price. So what we did with terso is that

10:56okay uh if there is a thread in the same process that is opening a connection to the database the schema is already parsed just pass it along. So again those are all like micro optimizations but it shows the spirit that we're approaching this problem with like embrace the modern world and the results of that. Now honestly I truly don't care

11:20about the number. The number is whatever it happens to be like 40 microsconds which is great but it's constant in the number of tables. So you can have a database with a billion tables. Now if you have a database with a billion tables I think it will break in in other areas but uh opening the connection would take

11:39constant time.

11:43The other thing that we're doing is employing whatever possible incrementalism. So we looked into async. Async is more a fundamental building block. It allows us to do many other things. We look into like embracing modern concepts like multi- threading, MVCC.

12:07But we also believe that now in 2025 we have a lot more data that is incremental and we believe that agents in particular will be dealing with data more and more that is incremental. If you have a streaming system for example like Kafka or any other event systems are generating events one at a time.

12:30This is a very experimental feature by the I think I had a warning about this but we we I forgot to I ended up removing but um what happens is that many databases like Postgres my SQL I'm sure ductt as well have materialized views the thing with material materialized views for those who don't know is essentially if I have a very

12:52expensive query query that may take many seconds perhaps a minute uh and it's a query that's going to happen all the time I want to store the results of that query and I I don't want to do this query all the time. Now you can do the materialized view, but now you have a choice. You have a trade-off

13:12in your hands because if the query takes a couple of minutes to run, you're not going to rerun it once the data the underlying data changes.

13:24So you are usually given the choice between I have something stale or or something fast, right? So just the either I pay the cost every time to have fresh data or I have uh stale data. So that's usually the tradeoff that we we are faced with.

13:42What trusso does is that is based on a p uh so the data is not up to date.

13:48We have a system again it's experimental at this time but like a based on a paper from 2022 if I'm not mistaken called DBSP.

13:57uh DBSP stands for database stream processing and what it allows you to do is recomputee those materialized views live only looking at the delta. So the time that it takes to recmp compute the materialized view is based on the size of the change not the size of the data you can have as much data as you want

14:19you only need to look at that delta of changes. So you have an example like like here you create a table sales and then you insert a couple of things uh into that table you create a materialized view in this case here I'm doing an aggregation I have a maximum minimum uh and I'm I'm doing some group

14:36by and you have here the results what

14:41incremental view maintenance which is the catchall name for those kinds of techniques very recent techniques what that allows us to do is compute again as I add

14:53recomp compute the materializ based on the delta and this is so powerful that it can even happen within a transaction.

15:02So if you're looking here like I begin a transaction I delete a element and then

15:10I select once I select to see the updated price once I row back you don't see the updated price anymore again because those things are updated live so at the end of the day that's the thesis here we have a database that has the shape for what the world the direction that the world is going into the father duck

15:41SQLite the database that inspired it all that started it all but it's a database that's been very slow to evolve and that's fine it's a it's it is the way it is it has performance issues even though I mean it's considered to be a very fast database but what I wanted to show was this. I mean, we can even at that level,

16:04I think we can squeeze a lot more. And lastly, the thing that I really want to drive home is this. Every time we talk about, hey, we're going to rewrite SQLite. And we've been talking about it for a while. We started, if you're familiar with our company, we actually started with a fork of SQLite. We said, we'll fork it and,

16:25you know, we'll try to make those changes in an organic way. That took us up until a point and we at some point we felt that no when agents started to become a thing and those demands started showing up. I mean a lot of those examples that I showed here by the way they're real things that people asked us

16:41for. I'm trying to do this with SQLite. I'm hitting the problem here. I can't write fast enough. I can't you know I I can't open a connection. Uh, so those are all real things that and and like I want to put this right in front of a Kafka queue so I can always have live data that my agent can can work with and

16:56and and it's fast enough. All of that is driven by you know real requirements

17:04but go back to the face that people make. I mean just the reason for that face and this is uh just to end on that note is that look SQLite is considered to be the as we have this light about it the most reliable database in the solar system and people keep asking us and asking us and asking I mean you you got

17:24to understand that the reason we like so much is this is just how reliable that is so to is built with deterministic simulation testing from the ground up from the first day that we started the It's not an external system. It's not something that you plug into it. It's it's something that it is fully deeply integrated into our IO subsystem. Uh we

17:46can fuzz the database. We actually do differential testing against SQLite. We f we generate queries. We query SQLite.

17:52We see what SQLite generates and we then query through see if if it if it generates the same thing. This is in fact what allows us. We started this project. We started taking this seriously around January and by now in

18:08October I think we're implementing around 80% of of SQLite and I think by in in six more months we'll we'll be pretty close. Now if you're familiar with my friend Pareto the 20% sometimes takes a little bit longer than 20%. But but we're pretty close to getting there.

18:24At the same time that we adding CDC, vector search uh and and the things that we discuss here today, TURO is also a open contribution project. So I mean the thing that led us to do this is that Solite if you're not familiar with that does not take contributions from anybody otherwise we would have uh done that

18:43within the project. Uh we have now over 160 contributors. So if you are if you like databases, if you like to participate in that effort, you are very very welcome to do so. Uh again, we have a very thriving community, a very welcoming community and I would love to see uh Marco in particular, but every one of you uh in our community. Thank

19:05you so much. And next year we're going to have fireworks when I come into the stage.

19:10[music]

More 2025 Talks

The Great Data Engineering Reset: From Pipelines to Agents and Beyond

The Great Data Engineering Reset: From Pipelines to Agents and Beyond

Joe Reis

The Unbearable Bigness of Small Data

The Unbearable Bigness of Small Data

Jordan Tigani

Projection Pushdown vs Predicate Pushdown: Rethinking Query Efficiency

Projection Pushdown vs Predicate Pushdown: Rethinking Query Efficiency

Adi Polak