Monday, November 10, 2014

MongoDB CRUD performance - Insert Operation

How do you insert a document in MongoDB?
  1. Simple insert - one document at a time.
  2. Bulk insert - multiple documents at a time.
Simple Insert:
db.collection.insert(<bson document>);

eg: db.collection.insert({"key":1000});

Java Example:
collection.insert(new BasicDBObject("key", 1000));

Bulk Insert:
db.collection.insert([<bson document 1>,
<bson document 2>
]);

eg:

db.collection.insert([{"key":1000},{"key":2000}]);

Java Example:
collection.insert(new BasicDBObject("key", 1000),new BasicDBObject("key",2000));


MongoDB insert Performance using Java:

Code:
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

public class MongoInsertPerformance {
    // Number of documents inserted in each run
    static int size = 10000;

    public static void main(String[] args) throws UnknownHostException {

        MongoClient mongoClient = new MongoClient("127.0.0.1");
        DB db = mongoClient.getDB("mydb");
        DBCollection collection = db.getCollection("test");

        // Simple insert: one insert call per document
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < size; i++) {
            collection.insert(new BasicDBObject("" + i, "" + i));
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Total Time taken in simple insert is "
                + (endTime - startTime));

        // Bulk insert: build the list first, then pass it to a single insert call
        collection.drop();
        collection = db.getCollection("test");
        startTime = System.currentTimeMillis();
        List<DBObject> bulkInsertList = new ArrayList<DBObject>();
        for (int i = 0; i < size; i++) {
            bulkInsertList.add(new BasicDBObject("" + i, "" + i));
        }
        collection.insert(bulkInsertList);
        endTime = System.currentTimeMillis();
        System.out.println("Total Time taken in bulk insert is "
                + (endTime - startTime));

        mongoClient.close();
    }

}


Output with 10K documents (times in milliseconds):
Total Time taken in simple insert is 2019
Total Time taken in bulk insert is 192


Output with 100K documents (times in milliseconds):
Total Time taken in simple insert is 13661
Total Time taken in bulk insert is 1197
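
To put the two runs side by side, here is a minimal sketch that computes how many times faster the bulk insert was, using only the timings reported above. The class and method names are made up for illustration, and the ratios only describe this one machine and these two runs.

```java
public class InsertSpeedup {
    // Ratio of simple-insert time to bulk-insert time (both in milliseconds).
    public static double speedup(long simpleMillis, long bulkMillis) {
        return (double) simpleMillis / bulkMillis;
    }

    public static void main(String[] args) {
        // Timings copied from the 10K and 100K runs reported in this post.
        System.out.println("10K speedup:  " + speedup(2019, 192));
        System.out.println("100K speedup: " + speedup(13661, 1197));
    }
}
```

In both runs the bulk insert comes out roughly 10 to 11 times faster than inserting one document at a time.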









Friday, September 19, 2014

M101J: MongoDB for Java Developers Final: Question 1

Step 1:
Download the Enron email dataset, enron.zip.

Step 2:
Extract enron.zip and, from the command prompt, run
mongorestore --host 192.168.50.4 --port 27017  messages.bson

Step 3:
Check the data that has been imported:
1) db.messages.find().count() should return 120,477 documents after the restore.

2) db.messages.find({"headers.From":"andrew.fastow@enron.com", "headers.To": "john.lavorato@enron.com"}).count() should return 1.
These checks confirm that you have the correct data to work on.

Solution:
Run the query below:
db.messages.find({"headers.From":"andrew.fastow@enron.com", "headers.To": "jeff.skilling@enron.com"}).count()

The answer is 3.




Sunday, September 7, 2014

M101J: MongoDB for Java Developers Homework 5.4


Answer is 298015

M101J: MongoDB for Java Developers Homework 5.3


Answer is 1

M101J: MongoDB for Java Developers Homework 5.2



Query:
db.zips.aggregate([ { $group:{ "_id":{ "state":"$state", "city":"$city" }, "pop":{ $sum:"$pop" } } }, { $match:{ "_id.state":{ $in:[ "CA", "NY" ] }, "pop":{ $gt:25000 } } }, { $group:{ "_id":null, "pop":{ $avg:"$pop" } } } ])


Answer is 44805

M101J: MongoDB for Java Developers Homework 5.1


Question: Find the most frequent author of comments on your blog.

Solution:

Use the web shell to find the most frequent comment author.

Step 1:

Understand the structure of the posts collection:

{
    "_id" : ObjectId("540d427e132c1f13547188cc"),
    "body" : "empty_post",
    "permalink" : "cxzdzjkztkqraoqlgcru",
    "author" : "machine",
    "title" : "US Constitution",
    "tags" : [
        "january",
        "mine",
        "modem",
        "literature",
        "saudi arabia",
        "rate",
        "package",
        "respect",
        "bike",
        "cheetah"
    ],
    "comments" : [
        {
            "body" : "empty_comment",
            "email" : "eAYtQPfz@kVZCJnev.com",
            "author" : "Kayce Kenyon"
        },.........

 

2) We need to count comments, so first unwind the comments array:

 {
        $unwind: "$comments"
    }

 

3) Then group the comments by author and count them with $sum:

{
$group: {
"_id": "$comments.author",
"num_comments": {
$sum: 1
}
}
}

 4) Then sort from the highest to the lowest number of comments (descending, so -1):

{
        $sort: {
            "num_comments": -1
        }
    }


5) Note: this is a large data set, so limit the result to one row:
{$limit: 1}

So the final query is

db.posts.aggregate([
{
$project: {
"_id": 0,
"comments": 1
}
},
{
$unwind: "$comments"
},
{
$group: {
"_id": "$comments.author",
"num_comments": {
$sum: 1
}
}
},
{
$sort: {
"num_comments": -1
}
},
{
$limit: 1
}
]
);
The answer I got is Gisela Levin.
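
For readers who think more easily in Java than in pipeline stages, here is a minimal in-memory sketch of the same unwind, group, sort and limit steps. It does not talk to MongoDB; the class name, method name and sample authors are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TopCommenter {
    // Each inner list holds the comment authors of one post.
    public static String mostFrequentAuthor(List<List<String>> commentAuthorsPerPost) {
        Map<String, Long> counts = commentAuthorsPerPost.stream()
                // Like $unwind: one element per comment instead of one per post.
                .flatMap(List::stream)
                // Like $group with $sum: 1, keyed by author.
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                // Like a descending $sort followed by $limit: 1.
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the course data set.
        List<List<String>> posts = Arrays.asList(
                Arrays.asList("alice", "bob"),
                Arrays.asList("bob", "carol", "bob"));
        System.out.println(mostFrequentAuthor(posts)); // prints "bob"
    }
}
```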

 

Monday, September 1, 2014

M101J: MongoDB for Java Developers Homework 4.4


Step 1:
Download the handout.

Step 2:
Import the sysprofile data using
mongoimport -d m101 -c profile < sysprofile.json

Step 3:
Look through the data, or write a query, to find the maximum latency in milliseconds.

To do that, filter the documents and take the maximum of the "millis" key.


The answer was: 15820
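
The filter-then-max idea from Step 3 can be sketched in plain Java as below. This is only an in-memory illustration: the post does not say which field the filter applies to, so filtering on the namespace field "ns" is an assumption, and the class, the namespaces and the latency values are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class MaxLatency {
    // Minimal stand-in for one profile document; only the fields used here.
    public static final class ProfileEntry {
        final String ns;
        final int millis;

        public ProfileEntry(String ns, int millis) {
            this.ns = ns;
            this.millis = millis;
        }
    }

    // Largest "millis" among entries whose namespace matches; -1 if none match.
    public static int maxMillis(List<ProfileEntry> entries, String ns) {
        return entries.stream()
                .filter(e -> e.ns.equals(ns))
                .mapToInt(e -> e.millis)
                .max()
                .orElse(-1);
    }

    public static void main(String[] args) {
        // Hypothetical entries, not taken from sysprofile.json.
        List<ProfileEntry> entries = Arrays.asList(
                new ProfileEntry("school.students", 120),
                new ProfileEntry("school.students", 450),
                new ProfileEntry("blog.posts", 900));
        System.out.println(maxMillis(entries, "school.students")); // prints 450
    }
}
```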

M101J: MongoDB for Java Developers Homework 4.3


Step 1: Download the handout.

Step 2:
Problem statement: make the blog fast by adding indexes to the posts collection.

Before that, you need to import posts.json.

Steps to import:
From the mongo shell:
>use blog
>db.posts.drop()
Then, from a terminal window, go to the directory where you uncompressed the homework
and run the command below to import.
mongoimport -d blog -c posts < posts.json


Now we need to improve performance for
1) The blog's home page.
2) The page that displays blog posts by tag (http://localhost:8082/tag/whatever)
3) The page that displays a blog entry by permalink (http://localhost:8082/post/permalink)


1) For the query
DBCursor cursor = postsCollection.find().sort(new BasicDBObject().append("date", -1)).limit(limit);

you can add
db.posts.ensureIndex({ date: -1})

2) For
DBObject post = postsCollection.findOne(new BasicDBObject("permalink", permalink));

you can add a unique index on permalink:
db.posts.ensureIndex({ permalink: 1}, {unique: true})

3) For
BasicDBObject query = new BasicDBObject("tags", tag);
System.out.println("/tag query: " + query.toString());
DBCursor cursor = postsCollection.find(query).sort(new BasicDBObject().append("date", -1)).limit(10);

you can add an index on tags:
db.posts.ensureIndex({ tags: 1})

Then submit your answer in MongoProc.

Thursday, August 28, 2014

M101J: MongoDB for Java Developers Homework 4.2

Question:
Suppose you have a collection called tweets whose documents contain information about the created_at time of the tweet and the user's followers_count at the time they issued the tweet. What can you infer from the following explain output?
db.tweets.find({"user.followers_count":{$gt:1000}}).sort({"created_at" : 1 })
.limit(10).skip(5000).explain()
{
        "cursor" : "BtreeCursor created_at_-1 reverse",
        "isMultiKey" : false,
        "n" : 10,
        "nscannedObjects" : 46462,
        "nscanned" : 46462,
        "nscannedObjectsAllPlans" : 49763,
        "nscannedAllPlans" : 49763,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 205,
        "indexBounds" : {
                "created_at" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        },
        "server" : "localhost.localdomain:27017"
}
 
Options:
 
1) This query performs a collection scan.
False. The cursor is a BtreeCursor on created_at, so the query walks an index rather than scanning the collection.
 
2) The query uses an index to determine the order in which to return result documents.
True. It uses the created_at index, read in reverse, to order the results.
 
3) The query uses an index to determine which documents match.
False. No index is used for matching; the index is used only for ordering.
 
4) The query returns 46462 documents.
False. The query has a limit of 10, so it returns only 10 results.
 
5) The query visits 46462 documents.
True. nscannedObjects shows that it scanned 46462 documents.
 
6) The query is a "covered index query".
False. indexOnly is false, so this is not a covered index query.
 
So the answers for week 4 - Homework 4.2 are options 2 and 5.
 
 
  
   
  

M101J: MongoDB for Java Developers Homework 4.1


This homework is about finding which queries can use an index to produce their results.

You can list the indexes with this query:


db.system.indexes.find()

Document:
{
  "v" : 1,
  "key" : {
   "sku" : 1
  },
                "unique" : true,
  "ns" : "store.products",
  "name" : "sku_1"
 }
 
The key field indicates that there is an index on sku in the products collection, and the index name is sku_1. (If the index had been created with {sku:-1}, its name would be sku_-1.)
 
db.collectionname.getIndexes() lists the indexes of an individual collection.


Now let's complete the homework.
Which of the following queries can utilize an index? Check all that apply.
1) db.products.find({'brand':"GE"})
There is no index on brand alone. brand appears only as the second key of a compound index with category, and this query uses only brand, so it cannot utilize an index.
 
2) db.products.find({'brand':"GE"}).sort({price:1})
brand cannot use an index, but price has a single-field index, so this query can use that index for the sort.
 
3) db.products.find({$and:[{price:{$gt:30}},{price:{$lt:50}}]}).sort({brand:1})
price is indexed, so this query uses an index for the price range.
 
4) db.products.find({brand:'GE'}).sort({category:1, brand:-1}).explain()
category and brand are covered by a compound index, but that index was created as {category:1, brand:1} while the query sorts by {category:1, brand:-1}. The directions do not match, so MongoDB performs a full scan for the sort and does not use the index. If the index had been created as {category:1, brand:-1}, this query would use it.
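
The direction rule used for query 4 can be sketched as a small Java check. This is a simplified model written for illustration, not MongoDB's query planner: it assumes a sort can use a compound index only when the sort keys are a leading prefix of the index keys and the directions either all match or are all reversed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SortIndexCheck {
    // True when the sort keys are a leading prefix of the index keys and every
    // direction equals the index direction, or every direction is its opposite.
    public static boolean sortCanUseIndex(LinkedHashMap<String, Integer> index,
                                          LinkedHashMap<String, Integer> sort) {
        if (sort.isEmpty() || sort.size() > index.size()) {
            return false;
        }
        List<Map.Entry<String, Integer>> idx = new ArrayList<>(index.entrySet());
        List<Map.Entry<String, Integer>> srt = new ArrayList<>(sort.entrySet());
        boolean allSame = true;
        boolean allReversed = true;
        for (int i = 0; i < srt.size(); i++) {
            if (!idx.get(i).getKey().equals(srt.get(i).getKey())) {
                return false; // key order differs from the index
            }
            int indexDir = idx.get(i).getValue();
            int sortDir = srt.get(i).getValue();
            if (indexDir != sortDir) allSame = false;
            if (indexDir != -sortDir) allReversed = false;
        }
        return allSame || allReversed;
    }

    // Helper that builds an ordered two-key spec such as {category:1, brand:1}.
    public static LinkedHashMap<String, Integer> spec(String k1, int d1, String k2, int d2) {
        LinkedHashMap<String, Integer> m = new LinkedHashMap<>();
        m.put(k1, d1);
        m.put(k2, d2);
        return m;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> index = spec("category", 1, "brand", 1);
        System.out.println(sortCanUseIndex(index, spec("category", 1, "brand", -1)));  // prints false
        System.out.println(sortCanUseIndex(index, spec("category", 1, "brand", 1)));   // prints true
        System.out.println(sortCanUseIndex(index, spec("category", -1, "brand", -1))); // prints true
    }
}
```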
 
So your homework 4.1 answer is queries 2 and 3.
 
 


  
    
 

M101J: MongoDB for Java Developers - Week 4


MongoDB week 4 is about performance. You will learn:
1) What an index is in MongoDB and how to create one.
2) How to identify slow queries.
3) An overview of sharding.



Wednesday, August 20, 2014

M101J: MongoDB for Java Developers homework 3.3

Homework 3.3:
Problem: you should be able to insert a comment in a blog post.

Solution:
You don't have to do anything. The changes were made in homework 3.2, so click the Turn In button in MongoProc and submit.

This completes your homework for week 3.

M101J: MongoDB for Java Developers Homework 3.2

Step 1:
Download the handout.

Step 2:
Problem:
We need to enhance the project so that entries can be inserted into the posts collection. The changes we need to make are marked as XXX in BlogPostDAO.

Solution:
Open BlogPostDAO.java.
 
1) Go to findByPermalink method
add below line at XXX
 
post = postsCollection.findOne(new BasicDBObject("permalink", permalink));
 
 2) Go to findByDateDescending method
add below line at XXX
 
DBCursor cursor = postsCollection.find()
 .sort(new BasicDBObject().append("date", -1)).limit(limit); 
 
    //DBCursor cursor1 = postsCollection.find();
        posts = new LinkedList<DBObject>();
        for (DBObject value : cursor) {
            posts.add(value);
        }
        System.out.println("Value from the DB" + posts);

3) Go to addPost method
add below Line at XXX
 
Date now = new Date();
  List comments = new ArrayList<Object>();
  post.append("author", username).append("title", title)
    .append("body", body).append("permalink", permalink)
    .append("tags", tags).append("date", now)
    .append("comments", comments);
  postsCollection.insert(post);
4) Go to addPostComment method
add below Line at XXX
  BasicDBObject comment = new BasicDBObject().append("author", name)
    .append("body", body);
  if (email != null) {
   comment.append("email", email);
  }
  postsCollection.update(new BasicDBObject("permalink", permalink),
    new BasicDBObject("$push", new BasicDBObject("comments",
      comment)), true, false);



Those are the only changes required.

Now recompile using mvn compile exec:java -Dexec.mainClass=course.BlogController
or run the run.sh file.

Log in at http://localhost:8082/login

You should be able to insert a post in the blog.
 
 
MongoProc 

1) Go to MongoProc and log in with your course ID.
 
2) Go to your homework and click Test.

3) If the feedback says the test passed, everything is correct. Click Turn In
and your score will be submitted.







Feedback welcome. Enjoy your homework.

 
 

 


Tuesday, August 19, 2014

M101J: MongoDB for Java Developers Homework 3.1

Step 1:
Download students.json from the download handout.

Step 2:
Import it using mongoimport:
mongoimport -d school -c students < students.json

Step 3:
Write a program in the language of your choice that removes the lowest homework score for each student.

Step 4:
Cross check your data

Check 1:
Go to mongo shell
type
> use school
> db.students.count();
200


Check 2:
> db.students.find({_id:100}).pretty();
{
        "_id" : 100,
        "name" : "Demarcus Audette",
        "scores" : [
                {
                        "type" : "exam",
                        "score" : 47.42608580155614
                },
                {
                        "type" : "quiz",
                        "score" : 44.83416623719906
                },
                {
                        "type" : "homework",
                        "score" : 19.85604968544429
                },
                {
                        "type" : "homework",
                        "score" : 39.01726616178844
                }
        ]
}


Check 3:
Aggregation Query for student with highest average

db.students.aggregate([{'$unwind':'$scores'},{'$group':{'_id':'$_id','average':{$avg:'$scores.score'}}},{'$sort':{'average':-1}},{'$limit':1}])
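
The core of Step 3, dropping the lowest homework score from one student's scores array, can be sketched in plain Java as below. This is an in-memory illustration only: it does not read from or write to MongoDB, the class and method names are made up, and the sample input is the scores array shown for student 100 in Check 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DropLowestHomework {
    // Minimal stand-in for one element of the "scores" array.
    public static final class Score {
        final String type;
        final double score;

        public Score(String type, double score) {
            this.type = type;
            this.score = score;
        }
    }

    // Returns a copy of the scores without the lowest-scoring "homework" entry.
    public static List<Score> dropLowestHomework(List<Score> scores) {
        Score lowest = null;
        for (Score s : scores) {
            if (s.type.equals("homework") && (lowest == null || s.score < lowest.score)) {
                lowest = s;
            }
        }
        List<Score> result = new ArrayList<>(scores);
        result.remove(lowest); // removes that one object; no-op when there is no homework
        return result;
    }

    public static void main(String[] args) {
        List<Score> scores = Arrays.asList(
                new Score("exam", 47.42608580155614),
                new Score("quiz", 44.83416623719906),
                new Score("homework", 19.85604968544429),
                new Score("homework", 39.01726616178844));
        for (Score s : dropLowestHomework(scores)) {
            System.out.println(s.type + " " + s.score);
        }
    }
}
```

In a real solution the shortened array would then be written back to the student's document, which this sketch leaves out.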


I will do the next homework tomorrow.




Monday, August 18, 2014

M101J: MongoDB for Java Developers Homework 2.3

Homework 2.3 (MongoProc)

1) Step 1:
You need to set up MongoProc.
Please read the instructions for MongoProc setup.

2) Step 2:
Download hw2.3.zip from the download handout link.

3) Step 3:
Start the script
run.sh
and visit http://localhost:8082 

4) Problem:
When you create a user, you are redirected to the welcome page,
but after you log out you cannot log in again.
The goal of this homework is to complete the code in UserDAO.java.

Wherever there is XXX in the code, we will add a Java statement.

5) Solution:

open UserDAO.java

in addUser method

add following code in first // XXX WORK HERE
BasicDBObject user = new BasicDBObject("_id", username).append(
"password", passwordHash);

then add in 2nd // XXX WORK HERE
user.append("email", email);

then add in 3rd //XXX work here
usersCollection.insert(user); 

then in validateLogin method
add in 1st //XXX work here
user = usersCollection.findOne(new BasicDBObject("_id", username));


That's it. You have changed the code; now re-run run.sh.

You should now be able to log in again. If that works,

open MongoProc.

Click on Homework 2.3 (MongoProc), then click Test. You will get feedback.

If it is successful, click Turn In. Your homework is submitted.
Enjoy :)

Any feedback is welcome.


 

 

 

 

Thursday, August 14, 2014

M101J: MongoDB for Java Developers : Homework 2.2

Homework 2.2
Write a program in the language of your choice that will remove the grade of type "homework" with the lowest score for each student from the dataset that you imported in HW 2.1. Since each document is one grade, it should remove one document per student.

Solution:
The question says you need to write a program that removes, for each student, the document of type "homework" with the lowest score.

I will use Java to solve this homework.

You are also given a hint:
Hint/spoiler: If you select homework grade-documents, sort by student and then by score, you can iterate through and find the lowest score for each student by noticing a change in student id. As you notice that change of student_id, remove the document.


First, let us understand what we have to remove.
1) The hint says to find documents with {"type":"homework"}, then sort by student_id in ascending order and then by score in ascending order: {"student_id":1, "score":1}.

So the final query is
db.grades.find({"type":"homework"}).sort({"student_id":1, "score":1})

This query returns each student's homework documents together, lowest score first.



The question says to remove one document per student, the one with the lowest score, and the hint tells us to track student_id. With the results sorted this way, the document to remove is the first one for each student.

Now let's write a Java program:

HomeWork.java

import java.net.UnknownHostException;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;

public class HomeWork {
    public static void main(String[] args) throws UnknownHostException {
        MongoClient client = new MongoClient("127.0.0.1");
        DB db = client.getDB("students");
        DBCollection collection = db.getCollection("grades");
        BasicDBObject searchQuery = new BasicDBObject();
        BasicDBObject sortQuery = new BasicDBObject();
        searchQuery.put("type", "homework");
        sortQuery.put("student_id", 1);
        sortQuery.put("score", 1);
        DBCursor cursor = collection.find(searchQuery).sort(sortQuery);
        System.out.println(cursor.count());
        BasicDBObject tempDocument = null;
        int i = 0;
        while (cursor.hasNext()) {
            BasicDBObject document = (BasicDBObject) cursor.next();
            System.out.println(document);
            if (tempDocument != null) {
                if (!document.get("student_id").equals(
                        tempDocument.get("student_id"))) {
                    collection.remove(document);
                    i++;
                }
            } else {
                collection.remove(document);
                i++;

            }
            tempDocument = document;

        }
        System.out.println(i);
    }
}


This removes the first document, the lowest homework score, for every student.






You can cross-check with:

1) db.grades.count() should return 600
2) db.grades.find().sort({'score':-1}).skip(100).limit(1)
{ "_id" : ObjectId("50906d7fa3c412bb040eb709"), "student_id" : 100, "type" : "homework", "score" : 88.50425479139126 } 
3)db.grades.find({},{'student_id':1, 'type':1, 'score':1, '_id':0}).sort({'student_id':1, 'score':1, }).limit(5)
{ "student_id" : 0, "type" : "quiz", "score" : 31.95004496742112 }
{ "student_id" : 0, "type" : "exam", "score" : 54.6535436362647 }
{ "student_id" : 0, "type" : "homework", "score" : 63.98402553675503 }
{ "student_id" : 1, "type" : "homework", "score" : 44.31667452616328 }
{ "student_id" : 1, "type" : "exam", "score" : 74.20010837299897 }
 
 
If all 3 give you the correct result, then
 
execute the query below:
db.grades.aggregate({'$group':{'_id':'$student_id', 'average':{$avg:'$score'}}}, {'$sort':{'average':-1}}, {'$limit':1})
 
You will get the result below:
{
        "result" : [
                {
                        "_id" : 54,
                        "average" : 96.19488173037341
                }
        ],
        "ok" : 1
}
 
 
 
So the answer for homework 2.2 is 54.