Indexes
Welcome back, fellow space developer! In part 1 of this five-part tutorial, we got our first look at FQL and some fundamental Fauna concepts.
In this second part of the tutorial, we’re going to take a more in-depth look into Fauna indexes.
Recap
We briefly introduced indexes in the previous part of the tutorial, but here’s a recap of the FQL commands we learned.
First, we created a simple index to be able to retrieve all our pilots from the Pilots collection:
CreateIndex({
name: "all_Pilots",
source: Collection("Pilots")
})
Then, we retrieved a list of references:
Paginate(
Match(
Index("all_Pilots")
)
)
Map(
Paginate(Match(Index("all_Pilots"))),
Lambda('pilotRef', Get(Var('pilotRef')))
)
What can indexes do?
So far, we’ve seen that indexes allow you to retrieve all of the documents in a collection, but indexes are much more powerful than that.
With indexes you can:
-
enforce unique constraints,
-
sort and filter results,
-
create computed values from document data.
Indexes vs SQL views
If you’re coming from the relational world, it can make sense to think about indexes similar to views on a relational database. Views are stored queries that can retrieve data from multiple tables, calculate computed data, join tables to create virtual entities, filter, etc. In a way, Fauna indexes perform similar functions, as we explore in this part of the tutorial.
Indexing across multiple collections
Until now, our indexes have been created on documents from a single collection, but you can configure an index to include documents from multiple collections.
There are many reasons why you might want to do that. Maybe, when designing your database, you’d like to group some collections under a single virtual collection, so to speak. In the relational world, combining database entities under a single entity is known as polymorphism.
To test this, let’s create a new collection to store our land vehicles:
CreateCollection({name: "Speeders"})
Now, with this index, you can retrieve all of the vehicles in the database:
CreateIndex({
name: "all_Vehicles",
source: [
Collection("Spaceships"),
Collection("Speeders")
]
})
When indexing multiple collections, keep in mind that the indexed fields need to be of the same type (string, number, etc) across collections. In the rest of the examples, we’ll use indexes with a single collection for simplicity’s sake.
Sorting results
Indexes also allow us to sort results. Let’s create a new index to get all our pilots sorted by their name:
CreateIndex({
name: "all_Pilots_sorted_by_name",
source: Collection("Pilots"),
values: [
{ field: ["data", "name"] },
{ field: ["ref"] }
]
})
Here, we’re using a values object which defines the output values for the index.
In this case, we are defining two output values:
-
["data", "name"]
a path referring to the name property of the document. -
["ref"]
another path which returns a reference to the matched document. In a moment, we’ll see why we need this.
When using a values object, Fauna always sorts the results in ascending order by default:
Paginate(Match(Index("all_Pilots_sorted_by_name")))
{
"data": [
[
"Buck Rogers",
Ref(Collection("Pilots"), "266359371696439826")
],
[
"Flash Gordon",
Ref(Collection("Pilots"), "266350546751848978")
],
[
"Jean-Luc Picard",
Ref(Collection("Pilots"), "266359447111074322")
],
// etc...
]
}
As you can see, there are two values per result as defined in
the values
object of the index, and these results are now ordered by
those values.
Reverse order
If we want to get the pilots sorted by their name in descending order, we need a new index with the reverse setting:
CreateIndex({
name: "all_Pilots_sorted_by_name_desc",
source: Collection("Pilots"),
values: [
{ field: ["data", "name"], reverse: true},
{ field: ["ref"] }
]
})
Getting documents from sorting results
You can add as many output values as needed without any performance penalty, but we might need to get a document from these types of results:
["Buck Rogers", Ref(Collection("Pilots"), "266359371696439826")]
So how do we actually get documents?
One option would be using the
Select
function, like we learned
in the previous part of the
tutorial:
Map(
Paginate(Match(Index("all_Pilots_sorted_by_name"))),
Lambda("pilotResult", Get(Select([1], Var("pilotResult"))))
)
Since Fauna uses zero-based arrays, the trick here is selecting the
reference in the second item with [1]
, then using the
Get
function to return a document.
Another option would be to simply configure our Lambda to expect an array with two values:
Map(
Paginate(Match(Index("all_Pilots_sorted_by_name"))),
Lambda(["name", "pilotRef"], Get(Var("pilotRef")))
)
In both cases, the result is the same:
{
"data": [
{
"ref": Ref(Collection("Pilots"), "266359371696439826"),
"ts": 1590278941740000,
"data": {
"name": "Buck Rogers"
}
},
{
"ref": Ref(Collection("Pilots"), "266350546751848978"),
"ts": 1590270525630000,
"data": {
"name": "Flash Gordon"
}
},
{
"ref": Ref(Collection("Pilots"), "266359447111074322"),
"ts": 1590279013675000,
"data": {
"name": "Jean-Luc Picard"
}
}
// etc...
]
}
Filtering results
Another useful feature of indexes is being able to search and filter results.
To test this, let’s create a Planets collection:
CreateCollection({name: "Planets"})
Then, create some planets with three different types: TERRESTRIAL
,
GAS
, and ICE
:
Create(Collection("Planets"),
{
data: {
name: "Mercury",
type: "TERRESTRIAL"
}
}
)
Create(Collection("Planets"),
{
data: {
name: "Saturn",
type: "GAS"
}
}
)
// etc..
Finally, let’s create an index to filter our planets by type:
CreateIndex({
name: "all_Planets_by_type",
source: Collection("Planets"),
terms: [
{ field: ["data", "type"]}
]
})
As we saw earlier, the terms
object is used as the search input for
the index, whereas the values
object defines which data the index
returns. With this index, the values
object is not defined, so the
index only returns the indexed document’s reference, by default.
In this case, we’re telling Fauna that the search term uses a field
of the document found at the path ["data", "type"]
.
We can now query our index by passing a parameter to Match:
Map(
Paginate(Match(Index("all_Planets_by_type"), "GAS")),
Lambda("planetRef", Get(Var("planetRef")))
)
{
"data": [
{
"ref": Ref(Collection("Planets"), "267081152090604051"),
"ts": 1590967285200000,
"data": {
"name": "Jupiter",
"type": "GAS"
}
},
{
"ref": Ref(Collection("Planets"), "267081181884842515"),
"ts": 1590967313610000,
"data": {
"name": "Saturn",
"type": "GAS"
}
}
]
}
Filtering on an array value
If we want to match an item inside an array, instead of filtering on a single string, we need to pass the term required to search inside the array.
To test this, let’s add some colors to our ships:
Update(
Ref(Collection("Spaceships"), "266356873589948946"),
// NOTE: be sure to use the document ID of one of your spaceships here
{
data: {
colors: ["RED","YELLOW"]
}
}
)
// etc...
If we now want to filter ships based on a single color, we could create
the following index, which uses the colors
array as a filtering term:
CreateIndex({
name: "all_Spaceships_by_color",
source: Collection("Spaceships"),
terms: [
{ field: ["data","colors"]}
]
})
And then query it:
Map(
Paginate(Match(Index("all_Spaceships_by_color"), "WHITE")),
Lambda("shipRef", Let({
shipDoc: Get(Var("shipRef"))
},{
name: Select(["data","name"], Var("shipDoc")),
colors: Select(["data","colors"], Var("shipDoc"))
}))
)
{
data: [
{
name: "Explorer IV",
colors: ["BLUE", "WHITE", "RED"]
},
{
name: "Navigator",
colors: ["WHITE", "GREY"]
},
{
name: "Le Super Spaceship",
colors: ["PINK", "MAGENTA", "WHITE"]
}
]
}
Fauna is smart enough to understand that if the field used in the terms object is an array, then it should search for an item inside that array instead of an exact match on the full array.
Sorting and filtering at the same time
You can certainly do both at the same time by combining terms
and
values
in the same index:
CreateIndex({
name: "all_Planets_by_type_sorted_by_name",
source: Collection("Planets"),
terms: [
{ field: ["data", "type"]}
],
values: [
{ field: ["data", "name"]},
{ field: ["ref"] }
]
})
And then:
Map(
Paginate(
Match(Index("all_Planets_by_type_sorted_by_name"), "TERRESTRIAL")
),
Lambda("planetResult", Get(Select([1], Var("planetResult"))))
)
{
"data": [
{
"ref": Ref(Collection("Planets"), "267081091831038483"),
"ts": 1590967227710000,
"data": {
"name": "Earth",
"type": "TERRESTRIAL"
}
},
{
"ref": Ref(Collection("Planets"), "267081096484618771"),
"ts": 1590967232165000,
"data": {
"name": "Mars",
"type": "TERRESTRIAL"
}
},
// etc ...
]
}
Enforcing unique values
Another important function of indexes, besides retrieving documents, is enforcing a unique constraint on the documents that can be created.
For example, to add a unique code to our spaceships:
CreateIndex({
name: "all_Spaceships_by_code",
source: Collection("Spaceships"),
terms: [
{field: ["data", "code"]}
],
unique: true
})
This index accomplishes two purposes:
-
We’re configuring it to accept a filtering term with the
terms
object. -
We’re ensuring the defined terms are unique across the documents matched by this index by using
unique: true
.
We’re using a single term here for simplicity’s sake. Uniqueness is
based on the combination of the terms
and values
fields in an index,
so you could create a unique constraint over multiple terms
and/or
values
, much like you’d do in SQL by creating constraints over
multiple columns.
Let’s test this by creating a new spaceship:
Create(
Collection("Spaceships"),
{
data: {
name: "Rocinante",
code: "ROCINANTE"
}
}
)
{
"ref": Ref(Collection("Spaceships"), "267072793181422099"),
"ts": 1590959313500000,
"data": {
"name": "Rocinante",
"code": "ROCINANTE"
}
}
So far so good. Let’s try to create another one with the same code:
Create(
Collection("Spaceships"),
{
data: {
name: "Rocinante 2",
code: "ROCINANTE"
}
}
)
error: instance not unique
document is not unique.
position: ["create"]
As expected, the response is an error since there is already a ship with
the ROCINANTE
code.
When using unique constraints, we know in advance that an index can only
return a single document. So, instead of using the
|
Combining multiple indexes
FQL has a number of functions that allow you to combine results from indexes and other sources in different ways:
-
Union
combines the results from all indexes. -
Intersection
returns the matching results from each index and discards the rest. -
Difference
returns the results that are unique in the first index and discards the rest.
To be able to test these, let’s add some colors to our planets (please excuse any scientific inaccuracies).
// Earth
Update(Ref(Collection("Planets"), "267081091831038483"),
// NOTE: be sure to use your planet's document ID here
{data: {color: "BLUE"}}
)
// Etc...
Let’s also create a new index:
CreateIndex({
name: "all_Planets_by_color",
source: Collection("Planets"),
terms: [
{ field: ["data", "color"]}
]
})
OR filtering with Union
The Union
function combines all of
the results from each index. We’re just using two indexes here, but you
could use any number of indexes.
"Hey Fauna, get me the planets that are of type GAS or are YELLOW":
Map(
Paginate(
Union(
Match(Index("all_Planets_by_type"), "GAS"),
Match(Index("all_Planets_by_color"), "YELLOW")
)
),
Lambda("planetRef", Get(Var("planetRef")))
)
{
"data": [
{
"ref": Ref(Collection("Planets"), "267081152090604051"),
"ts": 1590977605890000,
"data": {
"name": "Jupiter",
"type": "GAS",
"color": "BROWN"
}
},
{
"ref": Ref(Collection("Planets"), "267081181884842515"),
"ts": 1590977684790000,
"data": {
"name": "Saturn",
"type": "GAS",
"color": "YELLOW"
}
}
]
}
As you can see, Union
skips duplicates, since Saturn is a gas
giant and appears in the results of both indexes.
AND filtering with Intersection
The Intersection
function returns only the results that are the
same in all indexes. Again, you could use any number of indexes.
"Hey Fauna, get me the planets that are of type TERRESTRIAL and are BLUE":
Map(
Paginate(
Intersection(
Match(Index("all_Planets_by_type"), "TERRESTRIAL"),
Match(Index("all_Planets_by_color"), "BLUE")
)
),
Lambda("planetRef", Get(Var("planetRef")))
)
{
"data": [
{
"ref": Ref(Collection("Planets"), "267081091831038483"),
"ts": 1590977345595000,
"data": {
"name": "Earth",
"type": "TERRESTRIAL",
"color": "BLUE"
}
}
]
}
NOT filtering with Difference
The Difference
function
compares the first index that you provide with the rest of the indexes,
and returns the results that exist only in the first index.
"Hey Fauna, get me the planets that are TERRESTRIAL but are not BLUE nor RED":
Map(
Paginate(
Difference(
Match(Index("all_Planets_by_type"), "TERRESTRIAL"),
Match(Index("all_Planets_by_color"), "BLUE"),
Match(Index("all_Planets_by_color"), "RED")
)
),
Lambda("planetRef", Get(Var("planetRef")))
)
{
"data": [
{
"ref": Ref(Collection("Planets"), "267081079730471443"),
"ts": 1590977548370000,
"data": {
"name": "Mercury",
"type": "TERRESTRIAL",
"color": "GREY"
}
},
{
"ref": Ref(Collection("Planets"), "267081085891904019"),
"ts": 1590977561660000,
"data": {
"name": "Venus",
"type": "TERRESTRIAL",
"color": "GREY"
}
}
]
}
Index bindings
With index bindings, it’s possible to create pre-computed values based on some document data, using pretty much any FQL expression.
These values are calculated beforehand, which makes retrieving these values super efficient as the operation consumes little CPU. The down side is that these computed values consume storage space. Before deciding to use a binding at scale, you should consider whether the performance boost is worth the storage cost for your use case.
Let’s see a couple examples on how to use index bindings.
Remember our spaceship, from the previous part of this tutorial?
{
"name": "Voyager",
"pilot": Ref(Collection("Pilots"), "266350546751848978"),
"type": "Rocket",
"fuelType": "Plasma",
"actualFuelTons": 7,
"maxFuelTons": 10,
"maxCargoTons": 25,
"maxPassengers": 5,
"maxRangeLightyears": 10,
"position": {
"x": 2234,
"y": 3453,
"z": 9805
},
"code": "VOYAGER"
}
So here comes our boss, the fleet admiral. He has 100 ships in the dock that need refueling and wants to know which ships could be filled faster so that he can fill them first and empty the dock as fast as possible.
Easy, right? To do that, we’d only need to sort our ships by
pendingFuelTons
.
But pendingFuelTons
is not in the ship documents! We’re doomed!
Don’t panic my friend, we have the perfect tool to solve this problem.
Index bindings allow you to create computed values dynamically based on
the data of the document. In this case, we could just calculate the
value pendingFuelTons
by subtracting actualFuelTons
from
maxFuelTons
.
So let’s create our index:
CreateIndex({
name: "all_Spaceships_by_pendingFuelTons",
source: {
collection: Collection("Spaceships"),
fields: {
pendingFuelTons: Query(
Lambda("shipDoc",
Subtract(
Select(["data","maxFuelTons"], Var("shipDoc")),
Select(["data","actualFuelTons"], Var("shipDoc"))
)
)
)
}
},
values: [
{ binding: "pendingFuelTons"},
{ field: ["data", "name"]}
]
})
The only new FQL function we’re using here is
Subtract
, which simply subtracts
the second number from the first.
So let’s query our new index:
Paginate(Match(Index("all_Spaceships_by_pendingFuelTons")))
{
"data": [
[
3,
"Explorer IV"
],
[
3,
"Voyager"
],
[
10,
"Navigator"
],
[
18,
"Destroyer"
]
// etc...
]
}
As you can see, the results are sorted first by the new computed value
pendingFuelTons
and then by the ship name.
Cool!
Filtering by the first letter
Let’s create another example. What if we wanted to get all of the planets
that started with the letter M
? Our planet documents do not have a
firstLetter
property, but we can solve this with bindings too.
We can create a new index with a binding for the first letter of the
name, and add a terms
object to be able to filter the documents by
firstLetter
:
CreateIndex({
name: "all_Planets_by_firstLetter",
source: {
collection: Collection("Planets"),
fields: {
firstLetter: Query(
Lambda("planetDoc",
SubString(Select(["data", "name"], Var("planetDoc")), 0, 1)
)
)
}
},
terms: [
{ binding: "firstLetter"}
]
})
As you can see in the terms
object, the value that we want to use for
filtering is an index binding instead of a document field.
Great, so let’s query the index as usual and pass the letter M
:
Map(
Paginate(Match(Index("all_Planets_by_firstLetter"), "M")),
Lambda("planetDoc", Get(Var("planetDoc")))
)
{
"data": [
{
"ref": Ref(Collection("Planets"), "267081079730471443"),
"ts": 1590977548370000,
"data": {
"name": "Mercury",
"type": "TERRESTRIAL",
"color": "GREY"
}
},
{
"ref": Ref(Collection("Planets"), "267081096484618771"),
"ts": 1590977464930000,
"data": {
"name": "Mars",
"type": "TERRESTRIAL",
"color": "RED"
}
}
]
}
Easy, right?
These bindings are very powerful. We can access all the FQL commands available to produce computed values.
Filtering by any letter
As a final example, let’s see how we could check if an array produced by a binding includes a search term.
The NGram function is currently undocumented. We hope to
officially support it in a future release. You can
check
more details here.
|
CreateIndex({
name: "filter_Spaceships_by_letter",
source: {
collection: Collection("Spaceships"),
fields: {
nameLetters: Query(
Lambda("shipDoc",
NGram(Select(["data","name"], Var("shipDoc")),1,1)
)
)
}
},
terms: [
{ binding: "nameLetters"}
]
})
And query it:
Map(
Paginate(Match(Index("filter_Spaceships_by_letter"), "V")),
Lambda("shipRef", Let({
shipDoc: Get(Var("shipRef"))
},{
name: Select(["data","name"], Var("shipDoc"))
}))
)
{
data: [
{
name: "Voyager"
},
{
name: "Explorer IV"
}
]
}
This works because the NGram
function produces an array of letters
which can be queried by the index.
NGram("FaunaDB",1,1)
["F", "a", "u", "n", "a", "D", "B"]
Or:
NGram("FaunaDB",2,3)
["Fa", "Fau", "au", "aun", "un", "una", "na", "naD", "aD", "aDB", "DB"]
You can create all sorts of binding values. For example, you could
extract the day of the week from a timestamp using the
DayOfWeek
function to get all
of the events that happened on a Friday.
Binding and unique constraints
If you’re wondering, yes, you can use unique constraints over bindings too.
Imagine we wanted to have key cards with ids for accessing our ships. We know that pilots have a history of forgetting their key card ids, so these ids should be memorable and obvious. What if we create them based on the ships' names? And, since key cards would only be available for a single ship, these ids should be unique.
CreateIndex({
name: "all_Keycards",
source: {
collection: Collection("Spaceships"),
fields: {
keyCardId: Query(
Lambda("shipDoc",
UpperCase(
ReplaceStr(Select(["data", "name"], Var("shipDoc")), " ", "_")
)
)
)
}
},
values: [
{ binding: "keyCardId"}
],
unique: true
})
If we query this index, the results should make sense:
Paginate(Match(Index("all_Keycards")))
{
data: [
"DESTROYER",
"EXPLORER_IV",
"LE_SUPER_SPACESHIP",
"NAVIGATOR",
"ROCINANTE",
"VOYAGER"
]
}
If we now try to create a new ship by using a name we’ve already used,
we get an error. The all_Keycards
index prevents two key cards from
having the same keyCardId
, even if we have no unique constraints on
the names of the spaceships themselves:
Create(
Collection("Spaceships"),
{
data: {
name: "Le Super Spaceship"
}
}
)
error: instance not unique
document is not unique.
position: ["create"]
Conclusion
So that’s it for today. Hopefully you learned something valuable!
In part 3 of the tutorial, we continue our space adventure by learning how to model data.
Is this article helpful?
Tell Fauna how the article can be improved:
Visit Fauna's forums
or email docs@fauna.com
Thank you for your feedback!