Duplicate values
Problem
Your documents may have duplicate values and you want to find out which values have been duplicated.
Solution
Create an index where the values
definition includes the field that
may contain duplicate values. In this case, we want to find out where
the elements
field in a "spells" document has been duplicated:
Then we can run a query that produces an object keyed on the element
names whose values are Arrays of the associated document
References:
ObjectV(air: Arr(RefV(id = "181388642046968320", collection = RefV(id = "spells", collection = RefV(id = "collections"))), RefV(id = "181388642581742080", collection = RefV(id = "spells", collection = RefV(id = "collections")))),earth: Arr(RefV(id = "181388642088911360", collection = RefV(id = "spells", collection = RefV(id = "collections")))),fire: Arr(RefV(id = "181388642046968320", collection = RefV(id = "spells", collection = RefV(id = "collections"))), RefV(id = "181388642071085568", collection = RefV(id = "spells", collection = RefV(id = "collections")))),water: Arr(RefV(id = "181388642071085568", collection = RefV(id = "spells", collection = RefV(id = "collections"))), RefV(id = "181388642088911360", collection = RefV(id = "spells", collection = RefV(id = "collections")))))
map[air:[{181388642046968320 0xc000180570 0xc000180570 <nil>} {181388642581742080 0xc000180720 0xc000180720 <nil>}] earth:[{181388642088911360 0xc0001808d0 0xc0001808d0 <nil>}] fire:[{181388642046968320 0xc000180a80 0xc000180a80 <nil>} {181388642071085568 0xc000180c30 0xc000180c30 <nil>}] water:[{181388642071085568 0xc000180de0 0xc000180de0 <nil>} {181388642088911360 0xc000180f90 0xc000180f90 <nil>}]]
{air: [ref(id = "181388642046968320", collection = ref(id = "spells", collection = ref(id = "collections"))), ref(id = "181388642581742080", collection = ref(id = "spells", collection = ref(id = "collections")))], earth: [ref(id = "181388642088911360", collection = ref(id = "spells", collection = ref(id = "collections")))], fire: [ref(id = "181388642046968320", collection = ref(id = "spells", collection = ref(id = "collections"))), ref(id = "181388642071085568", collection = ref(id = "spells", collection = ref(id = "collections")))], water: [ref(id = "181388642071085568", collection = ref(id = "spells", collection = ref(id = "collections"))), ref(id = "181388642088911360", collection = ref(id = "spells", collection = ref(id = "collections")))]}
{
air: [
Ref(Collection("spells"), "181388642046968320"),
Ref(Collection("spells"), "181388642581742080")
],
earth: [ Ref(Collection("spells"), "181388642088911360") ],
fire: [
Ref(Collection("spells"), "181388642046968320"),
Ref(Collection("spells"), "181388642071085568")
],
water: [
Ref(Collection("spells"), "181388642071085568"),
Ref(Collection("spells"), "181388642088911360")
]
}
{'air': [Ref(id=181388642046968320, collection=Ref(id=spells, collection=Ref(id=collections))), Ref(id=181388642581742080, collection=Ref(id=spells, collection=Ref(id=collections)))], 'earth': [Ref(id=181388642088911360, collection=Ref(id=spells, collection=Ref(id=collections)))], 'fire': [Ref(id=181388642046968320, collection=Ref(id=spells, collection=Ref(id=collections))), Ref(id=181388642071085568, collection=Ref(id=spells, collection=Ref(id=collections)))], 'water': [Ref(id=181388642071085568, collection=Ref(id=spells, collection=Ref(id=collections))), Ref(id=181388642088911360, collection=Ref(id=spells, collection=Ref(id=collections)))]}
{
air: [
Ref(Collection("spells"), "181388642046968320"),
Ref(Collection("spells"), "181388642581742080")
],
earth: [ Ref(Collection("spells"), "181388642088911360") ],
fire: [
Ref(Collection("spells"), "181388642046968320"),
Ref(Collection("spells"), "181388642071085568")
],
water: [
Ref(Collection("spells"), "181388642071085568"),
Ref(Collection("spells"), "181388642088911360")
]
}
Discussion
For the "spells" documents, the element
field is expressed as an
Array. When an array is indexed, an index entry for each item in
the array is created. That makes it easy to discover every element
value: we can just use Match(Index("spell_elements"))
and every
element name (and its associated Reference) is returned.
The index we created does not specify any terms
so that we can find
all values that may have been duplicated. You can define terms
if you
need to find duplicate values in a subset of your documents. If you do
define terms
, be sure to update the Match
expression to specify
the term(s) you want to search for.
The query may look complex, but the Reduce
function does all the
hard work. It iterates over all of the items in the set returned by the
Match
expression. At the beginning, the accumulator is an empty
object. As Reduce
works its way through the set, the element
values are treated as field names in the accumulator, and the value is
an array of document Reference for the associated documents.
In the Lambda
function, we use Let
to locate the
element
value and the document Reference. If the element
already
exists in the accumulator (acc
), we Append
the Reference to
the already-existing list. Otherwise, we create a new array item that
can be merged into the accumulator.
The mergeit
array is structured so that we can use ToObject
to
create an Object that has a computed field name and an array
value.
The Merge
expression takes advantage of the fact that when the
second object contains a field that exists in the first object, the
second object’s field value is used. This means that the accumulator
evolves as expected during processing.
Be aware that with many documents and/or with high cardinality fields, the query could surpass a limit that causes the query to fail.
Is this article helpful?
Tell Fauna how the article can be improved:
Visit Fauna's forums
or email docs@fauna.com
Thank you for your feedback!