26 Feb: MongoDB aggregation using Scala (2/2)

This is a follow-up to the first part of this post.

In this second part, we will look at one more aggregation example and discuss some of the design choices.

Let’s assume we have a collection “projectTracker” filled with documents that look like:

[
    { "userId" : "uid1", "fname" : "Amine", "lname" : "Ferchichi", "project" : "AGCOD" },
    { "userId" : "uid3", "fname" : "Peter", "lname" : "Coons", "project" : "CM" },
    { "userId" : "uid1", "fname" : "Amine", "lname" : "Ferchichi", "project" : "MeC" },
    { "userId" : "uid3", "fname" : "Peter", "lname" : "Coons", "project" : "MeC" },
    { "userId" : "uid4", "fname" : "John", "lname" : "Smith", "project" : "MeC" },
    { "userId" : "uid2", "fname" : "Fred", "lname" : "Bohan", "project" : "CM" },
    { "userId" : "uid1", "fname" : "Amine", "lname" : "Ferchichi", "project" : "PF" },
    { "userId" : "uid3", "fname" : "Peter", "lname" : "Coons", "project" : "PF" },
    { "userId" : "uid2", "fname" : "Fred", "lname" : "Bohan", "project" : "RecEng" }
]

Now let’s try to get the projects that both users “uid1” and “uid3” worked on. But first, let’s break down the problem:

1) Filter the documents to keep only the ones that belong to “uid1” OR “uid3”.

2) Group the remaining documents by project, and push the userIds of each group into a temporary array.

3) Filter out the groups whose temporary array does not contain both “uid1” and “uid3”.
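
Before turning to the MongoDB version, the three steps above can be sketched on a plain in-memory Scala collection. This is only an illustration of the logic, not how the aggregation framework executes it; the Entry case class and the SharedProjects object are hypothetical names introduced here, and the sample data is a trimmed-down copy of the documents above (userId and project only).

```scala
object SharedProjects {
  // Hypothetical in-memory stand-in for a "projectTracker" document
  final case class Entry(userId: String, project: String)

  val entries: List[Entry] = List(
    Entry("uid1", "AGCOD"), Entry("uid3", "CM"), Entry("uid1", "MeC"),
    Entry("uid3", "MeC"), Entry("uid4", "MeC"), Entry("uid2", "CM"),
    Entry("uid1", "PF"), Entry("uid3", "PF"), Entry("uid2", "RecEng")
  )

  val wanted: Set[String] = Set("uid1", "uid3")

  // step 1: keep only the entries of the users we care about
  val filtered: List[Entry] = entries.filter(e => wanted.contains(e.userId))

  // step 2: group by project and collect the userIds of each group
  val usersByProject: Map[String, Set[String]] =
    filtered.groupBy(_.project).map { case (project, es) => project -> es.map(_.userId).toSet }

  // step 3: keep only the projects whose group contains every wanted user
  val shared: List[String] =
    usersByProject.collect { case (project, users) if wanted.subsetOf(users) => project }.toList.sorted
}
```

With the sample data, SharedProjects.shared evaluates to List("MeC", "PF"): those are the only projects on which both “uid1” and “uid3” appear.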

import com.mongodb.casbah.Imports._

// step 1: 1st filtering, keep only the documents of "uid1" or "uid3"
val firstMatchStatement = MongoDBObject("$match" -> MongoDBObject("userId" -> MongoDBObject("$in" -> MongoDBList("uid1", "uid3"))))
// step 2: grouping by project, pushing each userId into "tempUsers"
val groupStatement = MongoDBObject("$group" -> MongoDBObject(
  "_id" -> MongoDBObject("project" -> "$project"),
  "tempUsers" -> MongoDBObject("$push" -> MongoDBObject("userId" -> "$userId"))))
// step 3: 2nd filtering, keep only the groups that contain both users
val secondMatchStatement = MongoDBObject("$match" -> MongoDBObject("$and" -> MongoDBList(
  MongoDBObject("tempUsers.userId" -> "uid1"),
  MongoDBObject("tempUsers.userId" -> "uid3"))))
val pipeline = MongoDBList(firstMatchStatement, groupStatement, secondMatchStatement)
// db is assumed to be an already opened database handle
val result = db.command(MongoDBObject("aggregate" -> "projectTracker", "pipeline" -> pipeline)).get("result")

Now, you might be wondering why the pipeline takes a DBList as its argument, as opposed to a plain JSON/DBObject as the MongoDB documentation shows. In this case we have two $match clauses. A DBObject is basically a key-value map, so the second $match would simply override the first one; hence the use of a DBList.
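
The overriding behaviour is easy to see with plain Scala collections. The sketch below is only an analogy: it uses a standard Map and List rather than Casbah's MongoDBObject and MongoDBList, and the StageContainers object and the placeholder stage descriptions are hypothetical.

```scala
object StageContainers {
  // A key-value map keeps one value per key: the later "$match" replaces the earlier one
  val asMap: Map[String, String] = Map(
    "$match" -> "first filter",
    "$group" -> "grouping",
    "$match" -> "second filter"
  )

  // A list keeps every stage, in order, even when the operator name repeats
  val asList: List[(String, String)] = List(
    "$match" -> "first filter",
    "$group" -> "grouping",
    "$match" -> "second filter"
  )
}
```

Here asMap ends up with only two entries and its "$match" value is "second filter", while asList still holds all three stages in the order they were written.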
