I want to own the database that my apps use

I kind of threw together this post to get an idea out of my head. I don't think I have the connections or wherewithal to drive real world adoption of this idea, but I think that in the hands of someone more capable, it could lead to something pretty cool. So here it is.

The Problem

I have almost no control over the data that apps store for me. I'm talking about the data that apps are explicitly built around: the workout data I record in a fitness app, the todo items I put in a todo app, or the tweets I put on Twitter.

The only way I can access this data is via the specific app that stores it, and I can only do things that the product designers had the forethought to include. Generally speaking, I can't do anything with my data that the app doesn't explicitly allow, and apps usually don't explicitly allow much.

Why don't apps allow much? Because every feature requires a developer to implement it, and developer time is limited and expensive. Implementing some random visualization that I want just isn't economical if only I (or some small number of people) want it.

I want to be able to query my data however I want, and I want to correlate data across apps. For instance, I want to be able to answer "How do my workout and eating habits (each tracked in their own apps) affect my sleep (tracked in a 3rd separate app)?" I couldn't answer that question quantitatively today, even if I had detailed data on all of those things, because the data would be split between different apps that don't talk to each other.

Not only would I like to be able to correlate data between apps, but I'd like some other app to do it for me. Why can't I give a graphing website access to my fitness data, eating data, and sleep data? Because the apps that hold that data don't expose APIs. They don't have the time and money to spend on that. In other words, I want to be able to sign up for apps that combine and extend the functionality of apps I already use, to a much greater extent than I can today.

The Existing Solutions (and why they aren't good enough)

Manual Export

Export is a feature that app developers have to spend time on, and most won't.

But let's say that every app in the world had an export function. Then I'd be able to download all of my data, but it'd likely all be in different formats. Doing anything productive would require some data wrangling. And if I want the latest updates to all my data, I have to go export everything again and data wrangle again.

What if I want to share my exported data with another app? I guess I'd have to export it, and then upload it to the third-party app, but that's a step that doesn't have to exist. And again, if I want the third-party to have the latest data, I need to export and upload regularly.

APIs

One notch better than data exporting is a proper API. APIs are great. They allow programmatic access to an app's data, and that sounds like exactly what I want, right? Well, mostly, yes, but not entirely.

In most cases, APIs are just a layer to access the data inside of apps: a layer that takes time and money to build, and doesn't add value over direct access to the data. Because of this, APIs don't exist for everything that I might want. Furthermore, APIs have the problem I mentioned before, which is that the app developers are gatekeepers of what you can do: If I want to analyze the duration of all my workouts, but the API doesn't return that piece of the data, then I'm SOL.

Solid

Tim Berners-Lee's Solid attempts to solve the problem I'm trying to describe in a very similar way to what I'm about to propose. Solid offers a home for users' data and acts as an authentication method as well. Solid pods can be hosted by different providers, or self-hosted.

Where Solid falls short is its lack of pragmatism. Its technology is too different from what developers already know, so it will struggle to gain developer adoption, which is a prerequisite for gaining traction among anyone else. The fact that developers need to "get familiar with Linked Data vocabularies" before they can even create an app means that Solid is too expensive to adopt for the vast majority of profit-driven apps. Why learn a new method of storing data when you already know how to use postgres?

So while I am in alignment with the goals and intent of the Solid project, I personally think we need a solution that "just works" for all the developers and companies out there who don't have time to learn a whole new technology.

Others

The author of https://beepb00p.xyz has talked about the problem I'm describing at length and has even made an interesting tool to help address it. The tool makes programming with your app data much easier, and I think it's a great pragmatic step in the right direction, but it still relies on the existence of APIs and export functionality of apps (to my knowledge).

I think Urbit tries to address this problem too, but I'm not too familiar with the project. From what I can tell though, they seem to want to reinvent every single wheel they come across as they approach the problem. In other words, if Solid doesn't seem quite pragmatic enough, Urbit doesn't seem to even want to be pragmatic.

This isn't a new problem, so there are probably more projects out there that try to address this in some way that I'm not aware of.

Okay, so what's my suggestion?

How can we make data ownership and sharing between apps the default, in a way that is easy for developers to adopt and for users to understand?

We use existing technologies: OAuth and regular old databases.

I think the best way to describe this idea is through a couple of user scenarios. There is still plenty of detail to be worked out, but hopefully this will convey the essence of the thing.

I'll refer to a service I'm going to call "MyData". You can think of it as a combo authentication provider and "database provider". By "database provider", I mean that it will encapsulate any of a number of popular database technologies. It could provide postgres, mysql, mongo, cassandra, etc. Other apps will store user data in these databases.

And while I will refer to "MyData" as a service, really there could be many providers of this service.

I also wanted to name my fictional user, so I duckduckgoed "fake name" and picked the first one that came up.

First Scenario

  1. Ronald has a MyData account
  2. Ronald wants to sign up for CoolFitnessTracking.com
  3. CoolFitnessTracking presents a "Sign in with MyData" button
  4. Ronald clicks the button and is shown an OAuth screen from his MyData account. The requested permissions will include something along the lines of:
    • Create postgres database called "CoolFitnessTracking"
    • Read and write data to "CoolFitnessTracking" postgres database
  5. Ronald clicks "Authorize", and is logged in to CoolFitnessTracking
  6. CoolFitnessTracking backend code reaches out to MyData to create a postgres database
  7. MyData spins up a postgres instance and returns the url for CoolFitnessTracking to connect
  8. CoolFitnessTracking uses the provided postgres instance for all of Ronald's data
  9. MyData also provides access to the database directly to Ronald
  10. Ronald can run queries against the CoolFitnessTracking postgres database

Now, Ronald has a new postgres database hosted by MyData that he has complete access to. Ronald can write a SQL query against it, or export it, or authorize another app to access it. The second scenario describes this.
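To make the first scenario a bit more concrete, here is a minimal sketch of steps 4 through 7 as an in-memory simulation. Everything in it is an assumption made for illustration: the `MyDataProvider` class, the `create:` and `readwrite:` scope strings, and the `mydata.example` URL are invented names, and a real provider would sit behind a proper OAuth flow and provision an actual postgres instance.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Grant:
    user: str
    app: str
    scopes: frozenset  # e.g. {"create:CoolFitnessTracking", "readwrite:CoolFitnessTracking"}


@dataclass
class MyDataProvider:
    """Hypothetical provider: hands out tokens and per-user database URLs."""
    grants: dict = field(default_factory=dict)     # token -> Grant
    databases: dict = field(default_factory=dict)  # (user, db_name) -> url

    def authorize(self, user, app, scopes):
        # Steps 4-5: the user approves the requested scopes on the consent screen.
        token = secrets.token_urlsafe(16)
        self.grants[token] = Grant(user, app, frozenset(scopes))
        return token

    def create_database(self, token, db_name):
        # Steps 6-7: the app asks for a database; the provider returns a URL.
        grant = self.grants[token]
        if f"create:{db_name}" not in grant.scopes:
            raise PermissionError(f"{grant.app} may not create {db_name}")
        url = f"postgres://mydata.example/{grant.user}/{db_name}"
        self.databases[(grant.user, db_name)] = url
        return url


provider = MyDataProvider()
token = provider.authorize(
    user="ronald",
    app="CoolFitnessTracking",
    scopes=["create:CoolFitnessTracking", "readwrite:CoolFitnessTracking"],
)
db_url = provider.create_database(token, "CoolFitnessTracking")
print(db_url)  # postgres://mydata.example/ronald/CoolFitnessTracking
```

The point of the sketch is that the database belongs to Ronald's account, not to the app: the app only holds a token whose scopes Ronald approved.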

Second Scenario

  1. Ronald wants to graph his fitness data but doesn't want to do it himself
  2. Ronald discovers SweetFitnessGraphing.online which says they graph CoolFitnessTracking data automatically!
  3. SweetFitnessGraphing presents a "Sign in with MyData" button
  4. Ronald clicks the button and is shown an OAuth screen from his MyData account. The requested permissions will include something along the lines of:
    • Read data from "CoolFitnessTracking" postgres database
  5. Ronald clicks "Authorize", and is logged in to SweetFitnessGraphing
  6. SweetFitnessGraphing backend code reaches out to MyData
  7. MyData provides SweetFitnessGraphing with a postgres user and the url so that it can read the CoolFitnessTracking data
  8. SweetFitnessGraphing reads directly from the CoolFitnessTracking database and automatically displays visualizations to Ronald
  9. Ronald is happy
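
The read-only grant in the second scenario can be sketched as follows. This is only an approximation: a SQLite file stands in for Ronald's postgres database so the example runs anywhere, and the `workouts` table and its columns are made up. With real postgres, I'd expect MyData to create a role that has only been granted `SELECT`.

```python
import sqlite3
import tempfile
from pathlib import Path

# SQLite file standing in for Ronald's "CoolFitnessTracking" postgres database.
db_path = Path(tempfile.mkdtemp()) / "coolfitnesstracking.db"

# CoolFitnessTracking holds read/write access.
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE workouts (day TEXT, minutes INTEGER)")
writer.executemany(
    "INSERT INTO workouts VALUES (?, ?)",
    [("2024-01-01", 30), ("2024-01-02", 45)],
)
writer.commit()
writer.close()

# SweetFitnessGraphing was only granted read access, so it gets a read-only connection.
reader = sqlite3.connect(f"{db_path.as_uri()}?mode=ro", uri=True)
total_minutes = reader.execute("SELECT SUM(minutes) FROM workouts").fetchone()[0]

# Any attempt to modify the data is rejected by the database itself.
try:
    reader.execute("DELETE FROM workouts")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
reader.close()

print(total_minutes, write_blocked)  # 75 True
```

The graphing app never talks to CoolFitnessTracking at all. The permission check lives in the database layer that MyData controls.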

The result that the scenarios are supposed to portray

In the above scenarios, MyData encapsulates all of Ronald's databases and authorizes access to them. This effectively removes the need for APIs, because users and apps will get direct access to other apps' databases. The result of this is that the database schema is the "API". This does put responsibility on developers of apps to ensure they keep up with schemas of other apps that their code accesses.
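
Since the schema is the "API", a consuming app would probably want to verify the schema before it reads anything. Here is a rough sketch of that check, again using SQLite as a stand-in and a made-up `workouts` table. Against postgres, the same idea would query `information_schema.columns` instead of `PRAGMA table_info`.

```python
import sqlite3

# Columns SweetFitnessGraphing was written against (hypothetical schema).
EXPECTED_COLUMNS = {"workouts": {"day", "minutes"}}


def missing_columns(conn, expected):
    """Return {table: missing column names} for anything the schema no longer provides."""
    problems = {}
    for table, columns in expected.items():
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        present = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        if columns - present:
            problems[table] = columns - present
    return problems


conn = sqlite3.connect(":memory:")
# Imagine CoolFitnessTracking renamed "minutes" to "duration_minutes" in a release.
conn.execute("CREATE TABLE workouts (day TEXT, duration_minutes INTEGER)")
print(missing_columns(conn, EXPECTED_COLUMNS))  # {'workouts': {'minutes'}}
```

A check like this turns a silent breakage into an explicit "the upstream schema changed" error, which is roughly what API versioning gives you today.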

Now, using a different database connection per user is definitely not a trivial suggestion. Not only that, but apps that require storing data that the user shouldn't access will also still need a centralized database for that data. I still think this is more pragmatic than something like Solid, because it's "just a software engineering problem" using technologies everyone knows, as opposed to a "learning, lack of community, and young technology problem".
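
To give a feel for what "a different database connection per user" means in app code, here is a small sketch. The `PerUserConnections` class is hypothetical, and separate in-memory SQLite databases stand in for the per-user postgres instances. A real app would also need connection limits, eviction, and error handling.

```python
import sqlite3


class PerUserConnections:
    """Caches one database connection per user instead of one shared connection."""

    def __init__(self, url_for_user):
        self._url_for_user = url_for_user  # callable: user id -> database location
        self._connections = {}

    def get(self, user_id):
        if user_id not in self._connections:
            self._connections[user_id] = sqlite3.connect(self._url_for_user(user_id))
        return self._connections[user_id]


# Every user gets a separate in-memory database here; a real app would use the
# URL that MyData returned for that user.
pool = PerUserConnections(lambda user_id: ":memory:")

pool.get("ronald").execute("CREATE TABLE workouts (day TEXT, minutes INTEGER)")
pool.get("ronald").execute("INSERT INTO workouts VALUES ('2024-01-01', 30)")
pool.get("alice").execute("CREATE TABLE workouts (day TEXT, minutes INTEGER)")

ronald_rows = pool.get("ronald").execute("SELECT COUNT(*) FROM workouts").fetchone()[0]
alice_rows = pool.get("alice").execute("SELECT COUNT(*) FROM workouts").fetchone()[0]
print(ronald_rows, alice_rows)  # 1 0
```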

How could we make this a reality?

Unfortunately, this isn't a purely technical problem, so we need to think about a lot more than just the technology itself. Here is a smattering of things to consider:

It needs to be an open standard

Solid is doing this totally right. MyData needs to implement a standard and have competitors that also implement the standard. Otherwise, if there were only one provider, we'd have just consolidated power over all our data with one entity.

This will also give users choice, and prevent this from turning into another walled garden. The best way to enable that is to make the protocol by which apps interact with MyData an open standard. Then create an open source reference implementation that is easy to get running.

It needs a killer app combo

This is pretty straightforward. To bootstrap the ecosystem of "databases as APIs", we need a reason for people to create accounts and start using the technology. This needs to happen by providing value that can only be provided via this system, which is why it can't just be a singular killer app, but a killer combination of apps. The killer combination of apps would interoperate via the MyData standard, and together provide value in a way that is hard to match via a traditional software architecture.

It needs to be monetizable

Not sure what the best path is here. Here are a couple options on money flow:

Apps pay MyData

It's both amazing and flabbergasting that on the internet, we don't expect users to pay (directly) for many services. We need to keep this in mind when designing the monetization scheme around "MyData". We live in a capitalist world, so the hosting providers of these databases (the "MyDatas") will need to make money somehow.

But if we can't charge the user, who do we charge? The apps that connect to the MyData service will have to pay a usage-based amount of money, based on how much data they're reading from and writing to it. We're all used to this kind of pricing model from AWS and other cloud services.
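
As a quick illustration of what that bill could look like, here is a sketch with invented numbers. The per-gigabyte rates are pure assumptions, not a proposal for actual pricing.

```python
from decimal import Decimal

# Hypothetical prices, in dollars per gigabyte transferred.
READ_RATE = Decimal("0.01")
WRITE_RATE = Decimal("0.05")


def monthly_app_bill(gb_read, gb_written):
    """Usage-based charge for one app: pay for what it reads and writes."""
    return Decimal(gb_read) * READ_RATE + Decimal(gb_written) * WRITE_RATE


print(monthly_app_bill(200, 40))  # 4.00
```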

Users and apps pay MyData

Perhaps a MyData provider could directly charge users a small fee and then get away with charging the apps less. Apps could then pass savings on to the user. This might even work out cheaper for a user if they use a lot of paid services via MyData.

User pays for everything via MyData

In this scenario, the user would pay MyData a usage-based subscription fee, which would increase for each app that they connect. MyData then forwards some of the money to each app that the user connects.

For example, MyData might have a base rate of $5 per month. Then, signing up for CoolFitnessTracking would increase the MyData fee to $10 a month, but about 5 of those dollars would go to CoolFitnessTracking.
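
The same arithmetic as a tiny sketch, in case it helps to see the money flow spelled out. The `monthly_invoice` function is hypothetical, and it assumes each app's full fee is forwarded to that app, which a real provider might not do.

```python
from decimal import Decimal


def monthly_invoice(base_fee, app_fees):
    """Return (total charged to the user, amount forwarded to each app)."""
    payouts = {app: Decimal(fee) for app, fee in app_fees.items()}
    total = Decimal(base_fee) + sum(payouts.values(), Decimal("0"))
    return total, payouts


total, payouts = monthly_invoice("5.00", {"CoolFitnessTracking": "5.00"})
print(total)    # 10.00
print(payouts)  # {'CoolFitnessTracking': Decimal('5.00')}
```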

It needs to be better than the status quo

Apps need to be incentivized to use MyData over just keeping a centralized database. In other words, adopting this technology needs to make financial sense for for-profit companies.

We might be able to arrange it so that MyData providers take care of most security and GDPR type compliance, which small apps might appreciate.

More broadly, if we can get to a critical mass of apps built around shared databases, the benefits of interoperability between apps should become obvious. For example, if others are able to build apps that add functionality to the CoolFitnessTracking service, it makes CoolFitnessTracking more valuable to the customer. We can market this as an "automatic plugin/add-on ecosystem" for apps - no extra work required.

Conclusion

This topic seems to be popular as of late. This post was my attempt at presenting a way to address the problem without inventing anything too radically new. However, the largest problems in this space seem to be in driving business adoption, not in inventing technology solutions.