Combination field filtering
Complex queries that involve multiple conditions and multiple fields incur
index overhead. You can avoid this overhead if you know you’re working with a
moderately-sized dataset by using the FQL Filter
to compute and filter
the results of an initial, larger dataset.
Although using indexes is recommended as the default query strategy, not all queries benefit from indexes. Fields with low cardinality can impact overall performance. For example, a field that represents a boolean value has low cardinality.
This tutorial shows how to improve the performance of an indexed query on
low cardinality fields by using the Filter
function. It then extends the
example, using the Range
function, to cover the more complex case of a
field that holds a value within a range. Next, a more real-world problem is
solved for by combining index and function solutions.
Conventional approaches using indexes
Using a shopping site as an illustrative example, where the site provides filtering on the Price, Type, Brand, and Customer Rating of product items, shows that a single index that contains the four fields as terms is inefficient.
The default solution is to find where the results intersect by querying each field separately and combining the results using four indexes, one for each field, and different query statements for each pair of fields solved for. As examples, a filter that contains only values for Type and Brand has the following query:
Intersection(
Match(Index("products_by_type"), "some-type"),
Match(Index("products_by_brand"), "some-brand")
)
and a filter that contains values for Type and Customer Rating has the following query:
Intersection(
Match(Index("products_by_type"), "some-type"),
Union(
Map(
[3, 4, 5],
Lambda(
"stars"
Match(Index("products_by_rating"), Var("stars"))
)
)
)
)
Continuing the pattern, the number of possible query statements becomes 24, or 16, even if only four distinct queries, one for each field, are needed. The main issues are:
-
Four indexes are needed. If there are many fields to filter on, your index overhead grows exponentially.
-
Functions such as
Intersection
,Union
,Difference
, andJoin
are computationally intensive. Combining them compounds the cost.
The Filter
function
If the dataset is small to moderate size, you can simplify the query by using
the Filter
function with the comparison operators, as shown in the
following example:
Paginate(
Filter(
Documents(Collection("Products")),
Lambda(
"x",
Let(
{
p: Get(Var("x"))
},
And(
GTE(Select(["data", "price"], Var("p")), 100, null),
LTE(Select(["data", "price"], Var("p")), 200, null),
Equals(Select(["data", "brand"], Var("p")), "some-brand", null),
Equals(Select(["data", "type"], Var("p")), "some-type", null),
GTE(Select(["data", "customerRating"], Var("p")), "3", null)
)
)
)
)
)
The Filter function returns only those elements for which the lambda function
returns true. For example, computing the Customer Rating filter becomes
more efficient because you can use the GTE
comparison function.
Incorporating the Paginate
snippet, you can replace all 16 queries
in the index example with a single, reusable, user-defined function (UDF) that
handles all filter field combinations:
CreateFunction({
name: "my-filter",
body: Query(
Lambda(
"filterParams",
Paginate(
Filter(
Documents(Collection("Products")),
Lambda("x",
Let(
{ p: Get(Var("x")) },
And(
If(
ContainsField("minPrice", Var("filterParams")),
GTE(
Select(["data", "price"], Var("p")),
Select(["minPrice"], Var("filterParams"))
),
true
),
If(
ContainsField("maxPrice", Var("filterParams")),
LTE(
Select(["data", "price"], Var("p")),
Select(["maxPrice"], Var("filterParams"))
),
true
),
If(
ContainsField("brand", Var("filterParams")),
Equals(
Select(["data", "brand"], Var("p")),
Select(["brand"], Var("filterParams"))
),
true
),
If(
ContainsField("type", Var("filterParams")),
Equals(
Select(["data", "type"], Var("p")),
Select(["type"], Var("filterParams"))
),
true
),
If(
ContainsField("customerRating", Var("filterParams")),
GTE(
Select(["data", "customerRating"], Var("p")),
Select(["customerRating"], Var("filterParams"))
),
true
)
)
)
)
)
)
)
)
})
You can call your my-filter
UDF repeatedly with variable filter conditions,
such as by Type and Brand:
Call("my-filter", {
type: "some-type",
brand: "some-brand"
})
Or, query by Type and Customer Rating:
Call("my-filter", {
type: "some-type",
customerRating: 3
})
Combining indexes with a filter
Practically, the optimum solution for a specific query pattern often relies on a combination of approaches. Continuing with the same example but assuming a very large dataset, such as a large product catalog, paginating through the entire collection might be inefficient but you can incorporate indexes to reduce the dataset size.
An effective index is one whose terms have high cardinality. In this example, Brand might be a good index if there are many brands. Low cardinality Type and Customer Rating filter options are poor choices for indexes.
Create the index first, then use it in your new query:
CreateFunction({
name: "my-filter",
body: Query(
Lambda(
"filterParams",
Paginate(
Filter(
If(
ContainsField("brand", Var("filterParams")),
Match(
Index("products_by_brand"),
Select(["brand"], Var("filterParams"))
),
Documents(Collection("Products"))
),
Lambda(
"x",
Let(
{ p: Get(Var("x")) },
And(
If(
ContainsField("minPrice", Var("filterParams")),
GTE(
Select(["data", "price"], Var("p")),
Select(["minPrice"], Var("filterParams"))
),
true
),
If(
ContainsField("maxPrice", Var("filterParams")),
LTE(
Select(["data", "price"], Var("p")),
Select(["maxPrice"], Var("filterParams"))
),
true
),
If(
ContainsField("type", Var("filterParams")),
Equals(
Select(["data", "type"], Var("p")),
Select(["type"], Var("filterParams"))
),
true
),
If(
ContainsField("customerRating", Var("filterParams")),
GTE(
Select(["data", "customerRating"], Var("p")),
Select(["customerRating"], Var("filterParams"))
),
true
)
)
)
)
)
)
)
)
})
When the query runs, the following snippet evaluates the index and matches on
Brand, returning a smaller dataset before calling the Filter
function:
If(
ContainsField("brand", Var("filterParams")),
Match(
Index("products_by_brand"),
Select(["brand"], Var("filterParams"))
),
Documents(Collection("Products"))
),
Possible issues with this solution include:
-
If the search wants all brands, you still need to do a full scan of the
Products
collection. -
A UX change might be needed to prevent runaway queries. In the example, you might need to make the search Brand field mandatory to keep searches performant.
Using the Range
function
Site filter options for querying large datasets commonly include fields that
hold a value with a range. Examples are date fields and minimum and maximum
price fields. The following example adds a Price field to the previous
example and leverages the FQL Range
function in the solution.
To compute the range on price, you need an index that declares price as a value:
CreateIndex({
name: "products_sorted_by_price"
source: Collection("Products"),
values: [
{ field: ["data", "price"] },
{ field: ["ref"] }
]
})
You can change the UDF to include the Range
function as follows:
CreateFunction({
name: "my-filter"
body: Query(
Lambda(
"filterParams",
Paginate(
Filter(
Range(
If(
ContainsField("brand", Var("filterParams")),
Match(
Index("products_by_brand_sorted_by_price"),
Select(["brand"], Var("filterParams"))
),
Match(Index("products_sorted_by_price")),
),
Select(["minPrice"], Var("filterParams")),
Select(["maxPrice"], Var("filterParams"))
),
Lambda(
["price", "x"],
Let(
{ p: Get(Var("x")) },
And(
If(
ContainsField("type", Var("filterParams")),
Equals(
Select(["data", "type"], Var("p")),
Select(["type"], Var("filterParams"))
),
true
),
If(
ContainsField("customerRating", Var("filterParams")),
GTE(
Select(["data", "customerRating"], Var("p")),
Select(["customerRating"], Var("filterParams"))
),
true
)
)
)
)
)
)
)
)
})
In this example, Price is a mandatory filter and you can see that the
Range
function is incorporated in the query whether or not you’re
filtering by Brand. When all Brands are included, a range scan on Price is
applied to the Products
collection, which is more efficient than scanning the
entire collection.
Summary
Consider the effectiveness of the indexes and weigh them against their incurred overhead:
-
Does an index produce low or high cardinality?
-
How many fields are filterable?
-
Is the dataset large or small?
Efficient queries combine:
-
An index on a field with high cardinality.
-
Filtering on other values that have low cardinality.
-
The
Range
function if you know the field holds a range of values on which to force a mandatory filter.
Is this article helpful?
Tell Fauna how the article can be improved:
Visit Fauna's forums
or email docs@fauna.com
Thank you for your feedback!