Ana Sayfa / MongoDB / MongoDB Aggregation

MongoDB Aggregation

Parameter Details

pipeline array(A sequence of data aggregation operations or stages)
options document(optional, available only if pipeline present as an array)

Aggregations operations process data records and return computed results. Aggregation operations group valuesfrom multiple documents together, and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.

Count

How do you get the number of Debit and Credit transactions? One way to do it is by using count() function as below.

> db.transactions.count({cr_dr : "D"});

or

> db.transactions.find({cr_dr : "D"}).length();

But what if you do not know the possible values of cr_dr upfront. Here Aggregation framework comes to play. See the below Aggregate query.

> db.transactions.aggregate(
 [
 {
 $group : {
 _id : '$cr_dr', // group by type of transaction
 // Add 1 for each document to the count for this type of transaction
 count : {$sum : 1}
 }
 }
 ]
 );

And the result is

{
 "_id" : "C",
 "count" : 3
}
{
 "_id" : "D",
 "count" : 5
}

Sum

How to get the summation of amount? See the below aggregate query.

> db.transactions.aggregate(
[
 {
 $group : {
 _id : '$cr_dr',
 count : {$sum : 1}, //counts the number
 totalAmount : {$sum : '$amount'} //sums the amount
 }
 }
 ]
);

And the result is

{
 "_id" : "C",
 "count" : 3.0,
 "totalAmount" : 120.0
}
{
 "_id" : "D",
 "count" : 5.0,
 "totalAmount" : 410.0
}

Another version that sums amount and fee.

> db.transactions.aggregate(
 [
 {
 $group : {
 _id : '$cr_dr',
 count : {$sum : 1},
 totalAmount : {$sum : { $sum : ['$amount', '$fee']}}
 }
 }
 ]
 );

And the result is

{
 "_id" : "C",
 "count" : 3.0,
 "totalAmount" : 128.0
} 
{
 "_id" : "D",
 "count" : 5.0,
 "totalAmount" : 422.0
}

Average

How to get the average amount of debit and credit transactions?

> db.transactions.aggregate(
 [
 {
 $group : {
_id : '$cr_dr', // group by type of transaction (debit or credit)
 count : {$sum : 1}, // number of transaction for each type
 totalAmount : {$sum : { $sum : ['$amount', '$fee']}}, // sum
 averageAmount : {$avg : { $sum : ['$amount', '$fee']}} // average
 }
 }
 ]
)

The result is

{
 "_id" : "C", // Amounts for credit transactions
 "count" : 3.0,
 "totalAmount" : 128.0,
 "averageAmount" : 40.0
}
{
 "_id" : "D", // Amounts for debit transactions
 "count" : 5.0,
 "totalAmount" : 422.0,
 "averageAmount" : 82.0

Operations with arrays

When you want to work with the data entries in arrays you first need to unwind the array. The unwind operation creates a document for each entry in the array. When you have lot’s of documents with large arrays you will see an explosion in number of documents.

{ "_id" : 1, "item" : "myItem1", sizes: [ "S", "M", "L"] }
{ "_id" : 2, "item" : "myItem2", sizes: [ "XS", "M", "XL"] }
 
db.inventory.aggregate( [ { $unwind : "$sizes" }] )

An important notice is that when a document doesn’t contain the array it will be lost. From mongo 3.2 and up there are is an unwind option “preserveNullAndEmptyArrays” added. This option makes sure the document is preserved when the array is missing.

{ "_id" : 1, "item" : "myItem1", sizes: [ "S", "M", "L"] }
{ "_id" : 2, "item" : "myItem2", sizes: [ "XS", "M", "XL"] }
{ "_id" : 3, "item" : "myItem3" }

db.inventory.aggregate( [ { $unwind : { path: "$sizes", includeArrayIndex: "arrayIndex" } }] )

Aggregate query examples useful for work and learning

Aggregation is used to perform complex data search operations in the mongo query which can’t be done in normal “find” query.

Create some dummy data:

db.employees.insert({"name":"Adma","dept":"Admin","languages":["german","french","english","hindi"]
,"age":30, "totalExp":10});
db.employees.insert({"name":"Anna","dept":"Admin","languages":["english","hindi"],"age":35,
"totalExp":11});
db.employees.insert({"name":"Bob","dept":"Facilities","languages":["english","hindi"],"age":36,
"totalExp":14});
db.employees.insert({"name":"Cathy","dept":"Facilities","languages":["hindi"],"age":31,
"totalExp":4});
db.employees.insert({"name":"Mike","dept":"HR","languages":["english", "hindi",
"spanish"],"age":26, "totalExp":3});
db.employees.insert({"name":"Jenny","dept":"HR","languages":["english", "hindi",
"spanish"],"age":25, "totalExp":3});

Examples by topic:

1.Match: Used to match documents (like SQL where clause)

db.employees.aggregate([{$match:{dept:"Admin"}}]);
Output:
{ "_id" : ObjectId("54982fac2e9b4b54ec384a0d"), "name" : "Adma", "dept" : "Admin", "languages" : [
"german", "french", "english", "hindi" ], "age" : 30, "totalExp" : 10 }
{ "_id" : ObjectId("54982fc92e9b4b54ec384a0e"), "name" : "Anna", "dept" : "Admin", "languages" : [
"english", "hindi" ], "age" : 35, "totalExp" : 11 }

2.Project: Used to populate specific field’s value(s)

Project: Used to populate specific field’s value(s)

db.employees.aggregate([{$match:{dept:"Admin"}}, {$project:{"name":1, "dept":1}}]);
Output:
{ "_id" : ObjectId("54982fac2e9b4b54ec384a0d"), "name" : "Adma", "dept" : "Admin" }
{ "_id" : ObjectId("54982fc92e9b4b54ec384a0e"), "name" : "Anna", "dept" : "Admin" }

db.employees.aggregate({$project: {'_id':0, 'name': 1}})
Output:
{ "name" : "Adma" }
{ "name" : "Anna" }
{ "name" : "Bob" }
{ "name" : "Cathy" }
{ "name" : "Mike" }
{ "name" : "Jenny" }

3.Group: $group is used to group documents by specific field, here documents are grouped by “dept” field’s value. Another useful feature is that you can group by null, it means all documents will be aggregated into one.

db.employees.aggregate([{$group:{"_id":"$dept"}}]); 
 
{ "_id" : "HR" } 
 
{ "_id" : "Facilities" } 
 
{ "_id" : "Admin" }
db.employees.aggregate([{$group:{"_id":null, "totalAge":{$sum:"$age"}}}]);
Output:
{ "_id" : null, "noOfEmployee" : 183 }

4.Sum: $sum is used to count or sum the values inside a group.

db.employees.aggregate([{$group:{"_id":"$dept", "noOfDept":{$sum:1}}}]);
Output:
{ "_id" : "HR", "noOfDept" : 2 }
{ "_id" : "Facilities", "noOfDept" : 2 }
{ "_id" : "Admin", "noOfDept" : 2 }

5.Average: Calculates average of specific field’s value per group.

db.employees.aggregate([{$group:{"_id":"$dept", "noOfEmployee":{$sum:1},
"avgExp":{$avg:"$totalExp"}}}]);
Output:
{ "_id" : "HR", "noOfEmployee" : 2, "totalExp" : 3 }
{ "_id" : "Facilities", "noOfEmployee" : 2, "totalExp" : 9 }
{ "_id" : "Admin", "noOfEmployee" : 2, "totalExp" : 10.5 }

6.Minimum: Finds minimum value of a field in each group.

db.employees.aggregate([{$group:{"_id":"$dept", "noOfEmployee":{$sum:1},
"minExp":{$min:"$totalExp"}}}]);
Output:
{ "_id" : "HR", "noOfEmployee" : 2, "totalExp" : 3 }
{ "_id" : "Facilities", "noOfEmployee" : 2, "totalExp" : 4 }
{ "_id" : "Admin", "noOfEmployee" : 2, "totalExp" : 10 }

7.Maximum: Finds maximum value of a field in each group.

db.employees.aggregate([{$group:{"_id":"$dept", "noOfEmployee":{$sum:1},
"maxExp":{$max:"$totalExp"}}}]);
Output:
{ "_id" : "HR", "noOfEmployee" : 2, "totalExp" : 3 }
{ "_id" : "Facilities", "noOfEmployee" : 2, "totalExp" : 14 }
{ "_id" : "Admin", "noOfEmployee" : 2, "totalExp" : 11 }

8.Getting specific field’s value from first and last document of each group: Works well when doucument result is sorted.

db.employees.aggregate([{$group:{"_id":"$age", "lasts":{$last:"$name"},
"firsts":{$first:"$name"}}}]);
Output:
{ "_id" : 25, "lasts" : "Jenny", "firsts" : "Jenny" }
{ "_id" : 26, "lasts" : "Mike", "firsts" : "Mike" }
{ "_id" : 35, "lasts" : "Cathy", "firsts" : "Anna" }
{ "_id" : 30, "lasts" : "Adma", "firsts" : "Adma" }

9.Minumum with maximum:

db.employees.aggregate([{$group:{"_id":"$dept", "noOfEmployee":{$sum:1},
"maxExp":{$max:"$totalExp"}, "minExp":{$min: "$totalExp"}}}]);
Output:
{ "_id" : "HR", "noOfEmployee" : 2, "maxExp" : 3, "minExp" : 3 }
{ "_id" : "Facilities", "noOfEmployee" : 2, "maxExp" : 14, "minExp" : 4 }
{ "_id" : "Admin", "noOfEmployee" : 2, "maxExp" : 11, "minExp" : 10 }

10.Push and addToSet: Push adds a field’s value form each document in group to an array used to project data in array format, addToSet is simlar to push but it omits duplicate values.

db.employees.aggregate([{$group:{"_id":"dept", "arrPush":{$push:"$age"}, "arrSet":
{$addToSet:"$age"}}}]);
Output:
{ "_id" : "dept", "arrPush" : [ 30, 35, 35, 35, 26, 25 ], "arrSet" : [ 25, 26, 35, 30 ] }

11.Unwind: Used to create multiple in-memory documents for each value in the specified array type field, then we can do further aggregation based on those values.

db.employees.aggregate([{$match:{"name":"Adma"}}, {$unwind:"$languages"}]);
Output:
{ "_id" : ObjectId("54982fac2e9b4b54ec384a0d"), "name" : "Adma", "dept" : "HR", "languages" :
"german", "age" : 30, "totalExp" : 10 }
{ "_id" : ObjectId("54982fac2e9b4b54ec384a0d"), "name" : "Adma", "dept" : "HR", "languages" :
"french", "age" : 30, "totalExp" : 10 }
{ "_id" : ObjectId("54982fac2e9b4b54ec384a0d"), "name" : "Adma", "dept" : "HR", "languages" :
"english", "age" : 30, "totalExp" : 10 }
{ "_id" : ObjectId("54982fac2e9b4b54ec384a0d"), "name" : "Adma", "dept" : "HR", "languages" :
"hindi", "age" : 30, "totalExp" : 10 }
12. Sorting:
db.employees.aggregate([{$match:{dept:"Admin"}}, {$project:{"name":1, "dept":1}}, {$sort: {name:
1}}]);
Output:
{ "_id" : ObjectId("57ff3e553dedf0228d4862ac"), "name" : "Adma", "dept" : "Admin" }
{ "_id" : ObjectId("57ff3e5e3dedf0228d4862ad"), "name" : "Anna", "dept" : "Admin" }

db.employees.aggregate([{$match:{dept:"Admin"}}, {$project:{"name":1, "dept":1}}, {$sort: {name:
-1}}]);
Output:
{ "_id" : ObjectId("57ff3e5e3dedf0228d4862ad"), "name" : "Anna", "dept" : "Admin" }
{ "_id" : ObjectId("57ff3e553dedf0228d4862ac"), "name" : "Adma", "dept" : "Admin" }

13. Skip:

db.employees.aggregate([{$match:{dept:"Admin"}}, {$project:{"name":1, "dept":1}}, {$sort: {name:
-1}}, {$skip:1}]);
Output:
{ "_id" : ObjectId("57ff3e553dedf0228d4862ac"), "name" : "Adma", "dept" : "Admin" }

14. Limit

db.employees.aggregate([{$match:{dept:"Admin"}}, {$project:{"name":1, "dept":1}}, {$sort: {name:
-1}}, {$limit:1}]); 
Output: 
 
{ "_id" : ObjectId("57ff3e5e3dedf0228d4862ad"), "name" : "Anna", "dept" : "Admin" } 

15.Comparison operator in projection

db.employees.aggregate([{$match:{dept:"Admin"}}, {$project:{"name":1, "dept":1, age: {$gt: ["$age",
30]}}}]);
Output:
{ "_id" : ObjectId("57ff3e553dedf0228d4862ac"), "name" : "Adma", "dept" : "Admin", "age" : false }
{ "_id" : ObjectId("57ff3e5e3dedf0228d4862ad"), "name" : "Anna", "dept" : "Admin", "age" : true }

16. Comparison operator in match:

db.employees.aggregate([{$match:{dept:"Admin", age: {$gt:30}}}, {$project:{"name":1, "dept":1}}]);
Output: 
{ "_id" : ObjectId("57ff3e5e3dedf0228d4862ad"), "name" : "Anna", "dept" : "Admin" } 

List of comparison operators: $cmp, $eq, $gt, $gte, $lt, $lte, and $ne

17.Boolean aggregation opertor in projection:

db.employees.aggregate([{$match:{dept:"Admin"}}, {$project:{"name":1, "dept":1, age: { $and: [ {
$gt: [ "$age", 30 ] }, { $lt: [ "$age", 36 ] } ] }}}]); 
 
Output:
{ "_id" : ObjectId("57ff3e553dedf0228d4862ac"), "name" : "Adma", "dept" : "Admin", "age" : false }
 
{ "_id" : ObjectId("57ff3e5e3dedf0228d4862ad"), "name" : "Anna", "dept" : "Admin", "age" : true }

18.Boolean aggregation opertor in match:

db.employees.aggregate([{$match:{dept:"Admin", $and: [{age: { $gt: 30 }}, {age: {$lt: 36 }} ] }},
{$project:{"name":1, "dept":1, age: { $and: [ { $gt: [ "$age", 30 ] }, { $lt: [ "$age", 36 ] } ]
}}}]); 
Output:
{ "_id" : ObjectId("57ff3e5e3dedf0228d4862ad"), "name" : "Anna", "dept" : "Admin", "age" : true } 

List of boolean aggregation opertors: $and, $or, and $not.

Match

How to write a query to get all departments where average age of employees making less than or $70000 is greather than or equal to 35?

In order to that we need to write a query to match employees that have a salary that is less than or equal to $70000. Then add the aggregate stage to group the employees by the department. Then add an accumulator with a field named e.g. average_age to find the average age per department using the $avg accumulator and below the existing $match and $group aggregates add another $match aggregate so that we’re only retrieving results with an average_age that is greather than or equal to 35

db.employees.aggregate([
 {"$match": {"salary": {"$lte": 70000}}},
 {"$group": {"_id": "$dept",
 "average_age": {"$avg": "$age"}
 }
 },
 {"$match": {"average_age": {"$gte": 35}}}
])

The result is:

{
 "_id": "IT",
 "average_age": 31
}
{
 "_id": "Customer Service",
 "average_age": 34.5
}
{
"_id": "Finance",
 "average_age": 32.5
}

Get sample data

To get random data from certain collection refer to $sample aggregation.

db.emplyees.aggregate({ $sample: { size:1 } })

where size stands for number of items to select.

Remove docs that have a duplicate field in a collection (dedupe)

Note that the allowDiskUse: true option is optional but will help mitigate out of memory issues as this aggregation can be a memory intensive operation if your collection size is large – so i recommend to always use it.

var duplicates = [];
db.transactions.aggregate([
{ $group: {
 _id: { cr_dr: "$cr_dr"},
 dups: { "$addToSet": "$_id" },
 count: { "$sum": 1 }
}
},
{ $match: {
 count: { "$gt": 1 } 
}}
],allowDiskUse: true}
) 
.result 
.forEach(function(doc) {
 doc.dups.shift(); 
 doc.dups.forEach( function(dupId){
 duplicates.push(dupId);
 }
) 
})
// printjson(duplicates); 
// Remove all duplicates in one go 
db.transactions.remove({_id:{$in:duplicates}})

Left Outer Join with aggregation ( $Lookup)

let col_1 = db.collection('col_1');
let col_2 = db.collection('col_2'); 
col_1 .aggregate([
 { $match: { "_id": 1 } },
 {
 $lookup: {
 from: "col_2",
 localField: "id",
 foreignField: "id",
 as: "new_document"
 }
}
],function (err, result){
 res.send(result);
});

This feature was newly released in the mongodb version 3.2 , which gives the user a stage to join one collection with the matching attributes from another collection

Server Aggregation

Meteor.publish("someAggregation", function (args) {
 var sub = this;
 // This works for Meteor 0.6.5
 var db = MongoInternals.defaultRemoteCollectionDriver().mongo.db;
 // Your arguments to Mongo's aggregation. Make these however you want.
 var pipeline = [
 { $match: doSomethingWith(args) },
 { $group: {
 _id: whatWeAreGroupingWith(args),
 count: { $sum: 1 }
 }}
 ];
 db.collection("server_collection_name").aggregate( 
 pipeline,
 // Need to wrap the callback so it gets called in a Fiber.
 Meteor.bindEnvironment(
 function(err, result) {
 // Add each of the results to the subscription.
 _.each(result, function(e) {
 // Generate a random disposable id for aggregated documents
 sub.added("client_collection_name", Random.id(), {
 key: e._id.somethingOfInterest, 
 count: e.count
 });
 });
 sub.ready();
 },
 function(error) {
 Meteor._debug( "Error doing aggregation: " + error);
 }
 )
 );
});

Aggregation in a Server Method

Another way of doing aggregations is by using the Mongo.Collection#rawCollection()
This can only be run on the Server.
Here is an example you can use in Meteor 1.3 and higher:

Meteor.methods({
'aggregateUsers'(someId) {
 const collection = MyCollection.rawCollection()
 const aggregate = Meteor.wrapAsync(collection.aggregate, collection)
 const match = { age: { $gte: 25 } }
 const group = { _id:'$age', totalUsers: { $sum: 1 } }
 const results = aggregate([
 { $match: match },
 { $group: group }
 ])
 return results
 }
})

Java and Spring example

This is an example code to create and execute the aggregate query in MongoDB using Spring Data.

try {
 MongoClient mongo = new MongoClient();
 DB db = mongo.getDB("so");
 DBCollection coll = db.getCollection("employees");
 //Equivalent to $match
 DBObject matchFields = new BasicDBObject();
 matchFields.put("dept", "Admin");
 DBObject match = new BasicDBObject("$match", matchFields);
 //Equivalent to $project
 DBObject projectFields = new BasicDBObject();
 projectFields.put("_id", 1);
 projectFields.put("name", 1);
 projectFields.put("dept", 1);
 projectFields.put("totalExp", 1);
 projectFields.put("age", 1);
 projectFields.put("languages", 1);
 DBObject project = new BasicDBObject("$project", projectFields);
 //Equivalent to $group
 DBObject groupFields = new BasicDBObject("_id", "$dept");
 groupFields.put("ageSet", new BasicDBObject("$addToSet", "$age"));
 DBObject employeeDocProjection = new BasicDBObject("$addToSet", new
BasicDBObject("totalExp", "$totalExp").append("age", "$age").append("languages",
"$languages").append("dept", "$dept").append("name", "$name"));
 groupFields.put("docs", employeeDocProjection);
 DBObject group = new BasicDBObject("$group", groupFields);
 //Sort results by age
 DBObject sort = new BasicDBObject("$sort", new BasicDBObject("age", 1));
 List<DBObject> aggregationList = new ArrayList<>();
 aggregationList.add(match);
 aggregationList.add(project);
 aggregationList.add(group);
 aggregationList.add(sort);
 AggregationOutput output = coll.aggregate(aggregationList);
 for (DBObject result : output.results()) {
 BasicDBList employeeList = (BasicDBList) result.get("docs");
BasicDBObject employeeDoc = (BasicDBObject) employeeList.get(0);
 String name = employeeDoc.get("name").toString();
 System.out.println(name);
 }
 }catch (Exception ex){
 ex.printStackTrace();
 }

See the “resultSet” value in JSON format to understand the output format:

[{
 "_id": "Admin",
 "ageSet": [35.0, 30.0],
 "docs": [{
 "totalExp": 11.0,
 "age": 35.0,
 "languages": ["english", "hindi"],
 "dept": "Admin",
 "name": "Anna"
 }, {
 "totalExp": 10.0,
 "age": 30.0,
 "languages": ["german", "french", "english", "hindi"],
 "dept": "Admin",
 "name": "Adma"
 }]
}]

The “resultSet” contains one entry for each group, “ageSet” contains the list of age of each employee of that group, “_id” contains the value of the field that is being used for grouping and “docs” contains data of each employee of that group that can be used in our own code and UI

Bunada Göz Atın

Python Bitwise Operators

Bitwise operations alter binary strings at the bit level. These operations are incredibly basic and …

Bir cevap yazın

E-posta hesabınız yayımlanmayacak. Gerekli alanlar * ile işaretlenmişlerdir