M101J: MongoDB for Java Developers Final: Question 1

Step 1:
Download the Enron email dataset.

Step 2:
Extract the archive and, from a command prompt, type:
mongorestore --host --port 27017 messages.bson

Step 3:
Check the imported data:
1) db.messages.find().count() should return 120,477 documents after the restore (run it in the enron database, for example after use enron).

2) db.messages.find({"headers.From":"", "headers.To": ""}).count() should return 1.
These checks confirm that you have the correct data to work on.

Then run the query for the question itself:
db.messages.find({"headers.From":"", "headers.To": ""}).count()

The answer is 3.
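To make the matching rules concrete, here is a minimal plain-Java sketch of what that count does (no MongoDB driver involved). It assumes, as in the course's Enron dump, that headers.To is an array of addresses; in MongoDB an equality condition on an array field matches when any element equals the value. The Message record and the example.com addresses are hypothetical.

```java
import java.util.List;

// Hypothetical, pared-down stand-in for an Enron message: only the two header
// fields the query reads. We assume headers.To is an array of addresses.
record Message(String from, List<String> to) {}

class EnronCount {
    // Mirrors db.messages.find({"headers.From": from, "headers.To": to}).count():
    // From must equal the sender, and since To is (assumed to be) an array, a
    // document matches when ANY element of that array equals the recipient.
    static long countFromTo(List<Message> messages, String from, String to) {
        return messages.stream()
                .filter(m -> m.from().equals(from))
                .filter(m -> m.to().contains(to))
                .count();
    }

    public static void main(String[] args) {
        List<Message> sample = List.of(
                new Message("a@example.com", List.of("b@example.com")),
                new Message("a@example.com", List.of("c@example.com", "b@example.com")),
                new Message("c@example.com", List.of("b@example.com")));
        System.out.println(countFromTo(sample, "a@example.com", "b@example.com")); // prints 2
    }
}
```

The second sample message is counted even though the recipient is not the first address in its To list, which is exactly the array behaviour the shell query relies on.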

M101J: MongoDB for Java Developers Homework 5.4

Answer is 298015

M101J: MongoDB for Java Developers Homework 5.3

Answer is 1

M101J: MongoDB for Java Developers Homework 5.2

db.zips.aggregate([
    { $group: { "_id": { "state": "$state", "city": "$city" }, "pop": { $sum: "$pop" } } },
    { $match: { "_id.state": { $in: [ "CA", "NY" ] }, "pop": { $gt: 25000 } } },
    { $group: { "_id": null, "pop": { $avg: "$pop" } } }
])

Answer is 44805
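The sketch below replays the three stages of the 5.2 pipeline in plain Java on a few made-up zip documents, to show why the first $group keys on both state and city. The Zip record and the sample numbers are hypothetical; only the stage logic mirrors the query.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal zip document: only the fields the pipeline reads.
record Zip(String city, String state, long pop) {}

class ZipsAverage {
    // Plain-Java version of the three stages in the 5.2 pipeline.
    static double averageCityPop(List<Zip> zips, Set<String> states, long minPop) {
        // Stage 1 ($group by state+city, $sum pop): a city spans many zip codes,
        // and the same city name can occur in several states, hence the pair key.
        Map<List<String>, Long> cityPop = new HashMap<>();
        for (Zip z : zips) {
            cityPop.merge(List.of(z.state(), z.city()), z.pop(), Long::sum);
        }
        // Stage 2 ($match): keep cities in the given states above the threshold.
        // Stage 3 ($group with _id null, $avg): average what is left.
        return cityPop.entrySet().stream()
                .filter(e -> states.contains(e.getKey().get(0)))
                .filter(e -> e.getValue() > minPop)
                .mapToLong(Map.Entry::getValue)
                .average()
                .orElse(Double.NaN); // MongoDB would return no document; NaN marks "no cities"
    }

    public static void main(String[] args) {
        List<Zip> sample = List.of(
                new Zip("NEW YORK", "NY", 20000),
                new Zip("NEW YORK", "NY", 15000),
                new Zip("LOS ANGELES", "CA", 30000),
                new Zip("FRESNO", "CA", 10000),
                new Zip("BOSTON", "MA", 50000));
        System.out.println(averageCityPop(sample, Set.of("CA", "NY"), 25000)); // prints 32500.0
    }
}
```

Note that neither NEW YORK zip is above 25000 on its own; the city only qualifies after the first $group sums its zips, which is why the $match must come after that stage.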

M101J: MongoDB for Java Developers Homework 5.1

Question: Finding the most frequent author of comments on your blog.


You need to use the web shell to find the most frequent author of comments.

1) Understand the structure of the posts collection. A sample document:

    {
        "_id" : ObjectId("540d427e132c1f13547188cc"),
        "body" : "empty_post",
        "permalink" : "cxzdzjkztkqraoqlgcru",
        "author" : "machine",
        "title" : "US Constitution",
        "tags" : [
            "saudi arabia",
            ...
        ],
        "comments" : [
            {
                "body" : "empty_comment",
                "email" : "",
                "author" : "Kayce Kenyon"
            },
            ...
        ]
    }


2) To count comments per author, first unwind the comments array so that each comment becomes its own document:

        { $unwind: "$comments" }


3) Then group the comments by author and count them with $sum:

        { $group: {
            "_id": "$comments.author",
            "num_comments": { $sum: 1 }
        } }

4) Sort from the highest to the lowest number of comments (-1 means descending):

        { $sort: { "num_comments": -1 } }

5) We only need the most frequent author, so limit the output to a single document:

        { $limit: 1 }

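So the final query is (the leading $project stage is optional; it simply drops every field except comments before the unwind):

db.posts.aggregate([
    { $project: { "_id": 0, "comments": 1 } },
    { $unwind: "$comments" },
    { $group: { "_id": "$comments.author", "num_comments": { $sum: 1 } } },
    { $sort: { "num_comments": -1 } },
    { $limit: 1 }
])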
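Here is a plain-Java sketch of the same unwind, group, sort, limit sequence, which may help when checking the result by hand. The Post and Comment records and the author names are hypothetical stand-ins, not the course's blog classes.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical minimal shapes for a blog post and its embedded comments.
record Comment(String author, String body) {}
record Post(String title, List<Comment> comments) {}

class TopCommenter {
    // Plain-Java analogue of unwind -> group -> sort descending -> limit 1.
    static Optional<Map.Entry<String, Long>> mostFrequentAuthor(List<Post> posts) {
        return posts.stream()
                // $unwind: one element per comment; posts with no comments vanish
                .flatMap(p -> p.comments().stream())
                // $group by comments.author with {$sum: 1}
                .collect(Collectors.groupingBy(Comment::author, Collectors.counting()))
                .entrySet().stream()
                // $sort {num_comments: -1} + $limit 1 is the same as taking the max;
                // ties are broken arbitrarily, just as with equal sort keys in MongoDB
                .max(Map.Entry.comparingByValue());
    }

    public static void main(String[] args) {
        List<Post> sample = List.of(
                new Post("p1", List.of(new Comment("alice", "hi"), new Comment("bob", "yo"))),
                new Post("p2", List.of(new Comment("alice", "again"))),
                new Post("p3", List.of()));
        mostFrequentAuthor(sample).ifPresent(top ->
                System.out.println(top.getKey() + ": " + top.getValue())); // prints alice: 2
    }
}
```

Taking the maximum directly avoids sorting every author when only the top one is needed; the pipeline's sort plus limit expresses the same intent.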


M101J: MongoDB for Java Developers Homework 4.4

Step 1:
Download the handout

Step 2:
Import the sysprofile data:
mongoimport -d m101 -c profile < sysprofile.json

Step 3:
Inspect the data, or write a query, to find the maximum latency in milliseconds.

To do this, filter the documents and take the maximum of the "millis" field.

Answer was: 15820
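The filter-then-max idea can be sketched in plain Java as follows. ProfileEntry is a hypothetical, trimmed-down record; which field you filter on depends on the question, so the namespace check in main is only an assumption. In the shell, sorting on millis descending and limiting to one document gives the same maximum.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.function.Predicate;

// Hypothetical, pared-down view of a profiler entry: the namespace ("ns") the
// operation ran against and how long it took in milliseconds ("millis").
record ProfileEntry(String ns, long millis) {}

class SlowestOp {
    // Keep the entries that pass the filter, then take the largest "millis".
    // Empty result -> OptionalLong.empty(), like a query that matches nothing.
    static OptionalLong maxMillis(List<ProfileEntry> entries, Predicate<ProfileEntry> filter) {
        return entries.stream()
                .filter(filter)
                .mapToLong(ProfileEntry::millis)
                .max();
    }

    public static void main(String[] args) {
        List<ProfileEntry> sample = List.of(
                new ProfileEntry("db1.coll", 120),
                new ProfileEntry("db1.coll", 950),
                new ProfileEntry("db2.other", 4000));
        // The filter condition is an assumption: here, one hypothetical namespace.
        System.out.println(maxMillis(sample, e -> e.ns().equals("db1.coll")).getAsLong()); // prints 950
    }
}
```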

M101J: MongoDB for Java Developers Homework 4.3

Step 1: Download the handout.

Step 2:
Problem statement: make the blog faster by adding indexes to the posts collection.

Before that, you need to import posts.json.

Steps to import:
From a terminal window, go to the directory where you uncompressed the homework and run:
mongoimport -d blog -c posts < posts.json

Then, from the mongo shell, switch to the blog database:
>use blog

Now we need to improve performance for:
1) The blog home page
2) The page that displays blog posts by tag (http://localhost:8082/tag/whatever)
3) The page that displays a blog entry by permalink (http://localhost:8082/post/permalink)

1) For the home page query:
DBCursor cursor = postsCollection.find().sort(new BasicDBObject().append("date", -1)).limit(limit);

add an index on date (descending, to match the sort):
db.posts.ensureIndex({ date: -1 })

2) For the tag page query:
BasicDBObject query = new BasicDBObject("tags", tag);
System.out.println("/tag query: " + query.toString());

DBCursor cursor = postsCollection.find(query).sort(new BasicDBObject().append("date", -1)).limit(10);

add an index on tags:
db.posts.ensureIndex({ tags: 1 })

3) For the permalink page query:
DBObject post = postsCollection.findOne(new BasicDBObject("permalink", permalink));

add a unique index on permalink:
db.posts.ensureIndex({ permalink: 1 }, { unique: true })

Then submit your answer using MongoProc.
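The tags field is an array, so the index on tags is what MongoDB calls a multikey index: it holds one key for each array element. As a rough, hypothetical analogy (plain Java, not how MongoDB actually stores indexes), the sketch below builds a sorted map from each tag to the posts carrying it, which shows why looking up one tag no longer needs a scan of every post. TaggedPost, buildTagIndex and postsWithTag are illustrative names only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical minimal post: its permalink plus its array of tags.
record TaggedPost(String permalink, List<String> tags) {}

class TagIndexSketch {
    // Rough analogy for an index on an array field: the sorted map holds one
    // entry per tag value, each pointing back at the posts that carry it.
    static SortedMap<String, List<String>> buildTagIndex(List<TaggedPost> posts) {
        SortedMap<String, List<String>> index = new TreeMap<>();
        for (TaggedPost p : posts) {
            // De-duplicate so a post is listed once per tag even if the tag repeats.
            for (String tag : new LinkedHashSet<>(p.tags())) {
                index.computeIfAbsent(tag, k -> new ArrayList<>()).add(p.permalink());
            }
        }
        return index;
    }

    // A lookup by one tag touches a single map entry instead of scanning every post.
    static List<String> postsWithTag(SortedMap<String, List<String>> index, String tag) {
        return index.getOrDefault(tag, List.of());
    }

    public static void main(String[] args) {
        List<TaggedPost> posts = List.of(
                new TaggedPost("p1", List.of("mongodb", "java")),
                new TaggedPost("p2", List.of("java")),
                new TaggedPost("p3", List.of()));
        System.out.println(buildTagIndex(posts)); // prints {java=[p1, p2], mongodb=[p1]}
    }
}
```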