Graph Modelling for Identity Developers

Author: Alex Babeanu

This is the first post of a series about graph modelling. This first episode goes through the very basics of building data models for graphs and graph databases, with a focus on Identity and Access Management (IAM). 

The following posts will detail how to apply these graph modelling techniques to two IAM pillars: Identity Governance and Administration (IGA) and Authorization (AuthZ). Future topics also include more advanced subjects such as model optimization and GraphQL…

Why do it?

“Graphs” and “graph modelling” may seem like esoteric or arcane subjects, practiced by obscure math wizards trapped in data analysis hell, but this is not entirely true anymore. While graphs can be invaluable in analyzing data, a topic we will touch on in later posts, they can also be used very efficiently in real-time processes such as Dynamic Access Control or Identity Governance and Administration (IGA).

The field of Identity and Access Management (IAM) has relied for decades on technology built mainly in the last century, often using hierarchical stores (e.g., LDAP directories) to model the data it needs for processing. This has been a great limitation in our modern ultra-connected world, leading to all kinds of problems (Role Explosion and over-privileged permissions, to name only two). The types of relationships present between Subjects, protected Resources, Organizations, Devices and Systems far exceed the simple parent-child relationship, which is the only kind these ancient systems support. Furthermore, traditional relational stores are limited in the number of relationships that can be traversed in a single query (i.e., the number of joins), rendering them practically unusable in the intricate scenarios we see nowadays throughout the industry.

In order to model and make sense of today’s data complexity, we need graphs. They are by far the simplest (and sometimes the only) way to proceed. Thankfully, what may have been a huge knowledge gap to cross in the past has recently become a simple step, especially with the advent of platforms such as 3Edges…

Graph Modelling is easy

Graph modelling is easy; everybody can do it. You don’t need to be a mathematician, a data scientist or even a developer. In fact, everybody does it all the time, mostly unconsciously. It is how our mind works: it organizes our experiences and memories in the huge graph that constitutes our lives. Our knowledge, our immediate sensations and emotions are all interconnected in a vast web of information that allows us to make predictions, detect similarities and anomalies, group similar “things” together, recognize patterns and, in general terms, live our lives.

This is quite easy to show. For example, as I drove back home yesterday, I noticed a red car parked right in front of my house, in the spot I would normally use to park. Several things occurred at once without me being really conscious of them: annoyance at having to park further away, but also the colour red. Most cars have very bland colour schemes these days, at least in my neighbourhood, and that red was a stark contrast with the surrounding blacks, whites or greys. The roundness of the car reminded me of tomatoes. What can I say, maybe I was hungry? Anyway, I love tomatoes, and this conjured up a memory: my father and I sitting near a river back in France, eating raw tomatoes, ages ago, some time in the last century…

And there you have it: car → red → tomatoes → memory: my father → emotions

Everything in our mind is interconnected in a similar way: things, facts, memories, emotions. And we can conjure up any of these through the stimuli of our everyday lives, following the relationships forged over the years through our experiences. We don’t think about it, it’s natural: it just happens, it’s what we do. But this graph we’ve built is really a personal thing: everybody has their own, and sometimes we diverge in ideas, experiences or thoughts. Nevertheless, all of us traverse our own graphs constantly, and use them to perform our own internal analytics to come up with our decisions and inferences. Now, try to observe this in yourself: note how your own thought processes and your own awareness jump from one thing to another, even when these things seem unrelated but are in fact connected in a way you might not have noticed before.

A Graph is just that, a set of things related to each other. These can be objects, ideas or emotions. In the field of Digital Identity, we’ll just consider things and ideas and leave the emotions for other fields of study; what a relief! 

Graphs are just graphical diagrams made up of two simple visual components: circles and arrows:

By convention, we label circles with nouns and arrows with uppercase verbs (“VERBS”):

We can now read the graph by simply following the arrows as it forms a sentence in plain language. Simple. The circles are called “Nodes” or “Vertices” (these are synonyms), the arrows are called “Relationships” or “Edges”. The Nodes represent things or ideas, while the relationships provide us with the semantics of how these things are related to each other.
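To make the "follow the arrow and read a sentence" idea concrete, here is a minimal Python sketch. It is only an illustration: the `Node` and `Relationship` classes, the `read` helper and the `Person`/`DRIVES`/`Car` labels are all made up for this example and do not come from any particular graph database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A circle: labelled with a noun."""
    label: str


@dataclass(frozen=True)
class Relationship:
    """An arrow: labelled with an uppercase verb, pointing from start to end."""
    start: Node
    verb: str
    end: Node


def read(rel: Relationship) -> str:
    """Follow the arrow and render it as a plain-language sentence."""
    verb = rel.verb.lower().replace("_", " ")
    return f"{rel.start.label} {verb} {rel.end.label}"


# Hypothetical example labels, chosen only for illustration.
person = Node("Person")
car = Node("Car")
drives = Relationship(person, "DRIVES", car)

print(read(drives))  # Person drives Car
```

The direction of the arrow matters: swapping `start` and `end` would produce a different sentence, and therefore a different meaning.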

Now, looking around you, you can easily start building a mind graph of the things that surround you. In fact, this is exactly what you would do if you were creating a mind map, which is also a Graph.

Everything is a Graph

As we’ve seen, we view the world through the filter of our own mind graph; it should therefore come as no surprise that everything can be modelled as a Graph, any data or fact. This is particularly true of Digital Identity. 

Identities are always related to something: an Identity floating around alone in space would not be a concern, and we could safely ignore it for all our Identity and Access Management (IAM) purposes. On the other hand, we need a way to manage and control the access of these identities to our systems, from the moment they are registered with those systems until they leave them. This identity registration is, in effect, the creation of a relationship between that Identity and our systems. And since we are tasked with protecting said systems, this relationship matters: we should use it!

Describe what you know

Graph modelling is easy: it starts with drawing what you know. Tim Berners-Lee, the father of the World Wide Web, said it himself in his seminal paper:

“The system we need is like a diagram of circles and arrows, where circles and arrows can stand for anything.”

Source: https://www.w3.org/History/1989/proposal.html

(I know I use that quote a lot, but what can I say: the World Wide Web was designed as a giant Graph!)

So let’s do just that! 

Let’s say we need to secure the medical records of a clinic. That is the problem we need to solve. The first thing we know is that we have to manage medical records. So let’s draw that…

Good. Now we also know that these records pertain to our patients, we therefore also need patients…

We end up with two nodes. We then need to express the fact that our patients own their medical records…

Figure 1: Our first graph model

And we have a graph! It’s simple to read and understand, just proceed from left to right... Note here that so far we’re building a generic model of the things we know. We’re not saying that patient Alice owns medical record 123, not quite yet: we’re still at the design phase, still figuring out what our graph data looks like. Let’s call this our Graph Model. The very basic Graph Model above (Figure 1) says: “Our patients own Medical records”, in generic terms. A Graph Model shows what the data in the actual data graph looks like. It shows the various types of labels that our data has, and how these data types relate to each other.

Now let’s resume our modelling; what else do we know? 

Well, our doctors need access to those medical records. Let’s add them…

Figure 2: Basic Medical Record model

This will do for now; it’s an accurate depiction of the things we currently care about. It is now up to us to create actual data that looks like this in our Graph Database. For example, our actual data may be this:

Figure 3: Some basic Medical data

Our graph database stores two patients, Alice and Bob, who each own their own medical records. Our doctor Raj can access both records.
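As a rough sketch, the data described above can be written down as a list of (start, relationship, end) triples. This is plain Python for illustration only, not how a graph database actually stores or queries data; the record identifiers `record-alice` and `record-bob` and the `targets` helper are assumptions made up for this example.

```python
# Each triple reads as: start node, relationship type, end node.
# The record identifiers are hypothetical; the post does not name them.
GRAPH_DATA = [
    ("Alice", "OWNS", "record-alice"),
    ("Bob", "OWNS", "record-bob"),
    ("Raj", "CAN_ACCESS", "record-alice"),
    ("Raj", "CAN_ACCESS", "record-bob"),
]


def targets(data, start, rel_type):
    """Follow every arrow of the given type that leaves `start`."""
    return sorted(end for s, rel, end in data if s == start and rel == rel_type)


print(targets(GRAPH_DATA, "Raj", "CAN_ACCESS"))    # ['record-alice', 'record-bob']
print(targets(GRAPH_DATA, "Alice", "OWNS"))        # ['record-alice']
print(targets(GRAPH_DATA, "Alice", "CAN_ACCESS"))  # []
```

Answering "which records can Raj access?" is just a matter of following the CAN_ACCESS arrows that leave Raj.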

Note here that this graph data complies with the model we devised above (Figure 2). In particular:

  • Only patients can OWN medical records.
  • Doctors CAN_ACCESS medical records.
  • Patients can only OWN records; they have no CAN_ACCESS relationships (presumably to other records).
  • There are no relationships between Doctors and Patients in this system. This may seem weird or even wrong, but whatever relationship exists in the real world between patients and doctors doesn’t affect us here: we only care about those medical records. And that is an important point: we don’t need to model everything, only those things we care about to solve the problem at hand.
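The rules listed above amount to checking graph data against the Graph Model. Here is a minimal sketch of such a check, assuming the model is expressed as a set of allowed (start label, relationship, end label) combinations. The `violations` helper and the record names are hypothetical, and a real platform would enforce its model in its own way.

```python
# Graph Model: the only (start label, relationship, end label) shapes allowed.
GRAPH_MODEL = {
    ("Patient", "OWNS", "MedicalRecord"),
    ("Doctor", "CAN_ACCESS", "MedicalRecord"),
}

# Graph Data: each node mapped to its label (record names are hypothetical).
LABELS = {
    "Alice": "Patient",
    "Bob": "Patient",
    "Raj": "Doctor",
    "record-alice": "MedicalRecord",
    "record-bob": "MedicalRecord",
}


def violations(model, labels, relationships):
    """Return the relationships whose shape is not declared in the model."""
    return [
        (start, rel, end)
        for start, rel, end in relationships
        if (labels[start], rel, labels[end]) not in model
    ]


good = [("Alice", "OWNS", "record-alice"), ("Raj", "CAN_ACCESS", "record-bob")]
bad = [("Alice", "CAN_ACCESS", "record-bob"), ("Raj", "OWNS", "record-alice")]

print(violations(GRAPH_MODEL, LABELS, good))  # []
print(violations(GRAPH_MODEL, LABELS, bad))   # both relationships are reported
```

A Doctor-to-Patient relationship would be reported in the same way, simply because the model does not declare it.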

Notice here also that we’ve added some properties to our nodes. This is how we store data in a Labeled Property Graph (LPG), which is the type of Graph we’ll be using from now on. We’ll see in the second part of this blog series that there are also other types of Graphs. This is not important yet, and in any case the data above is sufficient to get us going now. In an LPG, both the Nodes and the Relationships can store any number of properties of any type. Here we’ve just recorded our patients’ and doctors’ names, but we could easily have added any number of additional properties such as address, date of birth, etc.
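To illustrate what "properties on nodes and relationships" means, here is a small sketch of an LPG-style structure in Python. The class names, the `since` relationship property and all property values are invented for this example; they are not part of the model in this post.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    label: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    rel_type: str
    end: Node
    properties: dict[str, Any] = field(default_factory=dict)


alice = Node("Patient", {"name": "Alice"})
raj = Node("Doctor", {"name": "Raj"})
record = Node("MedicalRecord")

owns = Relationship(alice, "OWNS", record)
# Hypothetical relationship property: "since" is an assumption for illustration.
access = Relationship(raj, "CAN_ACCESS", record, {"since": "2024-01-01"})

# Adding a property later does not change the shape of the graph.
alice.properties["date_of_birth"] = "1990-05-17"  # made-up value

print(sorted(alice.properties))  # ['date_of_birth', 'name']
print(access.properties)         # {'since': '2024-01-01'}
```

The labels and relationship types stay exactly as in the model; properties simply hang off the nodes and relationships that already exist.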

And that’s it. You can now add to your model as you see fit, create new models, relate them to each other, and build a true representation of the things you care about in your organization.

Conclusion

In this first part of our blog series, we’ve explored why understanding and using Graphs is important in our current highly interconnected world. In particular, we’ve seen that hierarchical data structures can’t really help anymore, nor can legacy “relational” databases that are limited in their ability to traverse long chains of relationships in data. Graphs provide an elegant way of solving these problems.

Luckily, we’ve seen that graph modelling is easy because it is highly intuitive. Everybody can do it (and does it unconsciously all the time!).

To illustrate this, we’ve explored a simple Patient-Doctor-Medical Record data system expressed as a graph. Along the way, we learned the differences between a Graph Model and Graph Data, as well as what a Labeled Property Graph (LPG) is.

In the next post, we’ll see how to use this simple model for two core IAM tasks: Identity Governance and Administration (IGA), as well as Authorization. Stay tuned for Part 2!

Learn more about 3Edges, a Nulli company, at 3edges.com.