Aggregation framework mainly used for ETL majorly aggregation pipeline. Aggregations operations process data records and return computed results. There are 3 types for doing aggregation in mongoDB.
Single Purpose Aggregation
Aggregation Pipeline
Map Reduce Functions
Single Purpose Aggregation-
There is very simple function like getting no of records from collection or getting distinct records by removing duplicates.
db.mycollection.count()
db.mycollection.distinct()
Aggregation Pipeline
I aggregation pipelining there will be phases. Series of actions will be takes place on input and output will be produced. It is very similar to Map- Reduce concept in Hadoop. There will be different ETL phases as below –
MATCH
GROUP
UNWIND
PROJECT
SKIP
LIMIT
Most of the time when we have to fetch the data we compare syntax with the SQL.
So, similar commands are available in MongoDB.
WHERE - $match
GROUP BY- $group
HAVING - $match
SELECT - $project
ORDER BY - $sort
LIMIT - $limit
Map Reduce Functions
MongoDB also provides map-reduce operations to perform aggregation. Sometime Finalize function also used in map –reduce. I am not going into details of this type as we can do all the things by aggregation pipeline which we can achieve with these functions.
Let us take few examples to get into little deeper to understand these concepts.
Get the project total by year
db.project.aggregate([{$group:{_id:"$year","total by year":{$sum:1}}}])
- aggregate by year
2. db.project.aggregate([{$group:{_id:"$company","total by company":{$sum:1}}}])
- aggregate by year
3. db.project.aggregate([{$group:{"_id": {"firm": "$company","year":"$year"}, "total by company":{$sum:1}}}])
- aggregate by year/company
4. db.project.aggregate([
{ "$match": {"year": 2010, "company": "ABC"} },
{ "$unwind": "$Project" },
{ "$match": { "$or": [{"Project.Domain": /Telecom/},{"Project.Domain": /Retail/}] } },
{ "$unwind": "$Project.Client" },
{
"$group": {
"_id": {
"company": "$company",
"year": "$year",
"domain": "$Project.Domain"
},
"headcount": { "$sum": "$Project.Client.size" }
}
}
])
-- aggregate by year and company all the project sizes
5. db.trial_test.aggregate([{$group:{_id:"$status","count":{$sum:1}}}])
- aggregate by status
6. db.flights.aggregate([{$group:{_id:"$Year","total airtime":{$sum:"$AirTime"}}}])
- the ID has been described
7. db.flights.aggregate([{$group:{_id:{"Carrier":"$UniqueCarrier"},"total airtime":{$sum:"$AirTime"}}}])
Match
=====
8. db.flights.aggregate([{$group:{_id:{"Carrier":"$UniqueCarrier"},"total airtime":{$sum:"$AirTime"}}},{$match: {"_id.Carrier":"DL"}}])
Projection
==========
9. db.flights.aggregate([{$group:{_id:{"Carrier":"$UniqueCarrier"},"total airtime":{$sum:"$AirTime"}}},{$project:{_id:0, "Flight":
{$toLower:"$_id.Carrier"},"Air Time":{$multiply:["$total airtime",60]}}},{$out:"FlightPlan"}])