# MongoDB queries

We import PyMongo, the Python driver for working with MongoDB.

The MongoDB documentation can be found here: https://docs.mongodb.com.

In [70]:
import pymongo

Then, we connect to the local MongoDB server (the default for ``MongoClient()``) and select the 'businesses' collection inside the database of the same name.

In [71]:
mdb = pymongo.MongoClient()['businesses']
businesses = mdb['businesses']

Using ```find_one()``` we get a single document from the collection that satisfies the specified query criteria. Notice that the query argument is empty here, so every document matches.

If multiple documents satisfy the query, this method returns the first document according to the natural order, which reflects the order of documents on disk.

In [72]:
example = businesses.find_one()

In [73]:
example

{'_id': ObjectId('5edf46601bca82ff74bb96ef'),
 'id': 'FYWN1wneV18bWNgQjJ2GNg',
 'city': 'Ahwatukee',
 'country': 'AZ',
 'stars': 4,
 'is_active': True,
 'categories': ['Dentists',
  'Health & Medical',
  'General Dentistry',
  'Oral Surgeons',
  'Orthodontists',
  'Cosmetic Dentists'],
 'reviews': [{'reviewer_id': 'MWxrPSz87wx559-Rg3YL-Q',
   'reviewer_name': 'Jessica',
   'yelp_since': datetime.datetime(2012, 1, 9, 0, 0),
   'review_date': datetime.datetime(2013, 3, 12, 0, 0),
   'review_score': 1},
  {'reviewer_id': 'O412lFp-8M8VpRwdzl0S0A',
   'reviewer_name': 'Christina',
   'yelp_since': datetime.datetime(2014, 4, 21, 0, 0),
   'review_date': datetime.datetime(2014, 4, 21, 0, 0),
   'review_score': 5},
  {'reviewer_id': '4ZR5wwn8NST5aqCa3ueTFg',
   'reviewer_name': 'F.',
   'yelp_since': datetime.datetime(2012, 11, 19, 0, 0),
   'review_date': datetime.datetime(2014, 6, 28, 0, 0),
   'review_score': 1},
  {'reviewer_id': '9G8nsgZ-7o3an9HN4N08Hw',
   'reviewer_name': 'Lisa',
   'ye

### Q1: businesses in Arizona (AZ) with 5 stars

In this first example, we want to find all the businesses in Arizona (stored as "AZ" in the ``country`` field) whose ``stars`` rating is 5.

We will use ```find()```, which selects documents in a collection and returns a cursor to the selected documents.

We will also set the query and projection arguments. The first specifies the selection filter using query operators; the second specifies which fields to return from the documents that match the filter.

Recall that: 
- A projection can explicitly include several fields by setting them to 1.
- You can remove the ```_id``` field from the results by setting it to 0 in the projection.
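
To make the two projection rules concrete, here is a minimal plain-Python sketch of what an inclusion projection does to a flat document. The ``project`` helper and the sample document are hypothetical and only illustrate the behaviour; they are not part of PyMongo, and nested fields are not handled.

```python
def project(doc, projection):
    """Emulate a simple inclusion projection on a flat document.

    Illustrative helper only (not a PyMongo API).
    """
    # _id is returned by default unless the projection sets it to 0.
    keep_id = projection.get("_id", 1) != 0
    # Fields explicitly included by setting them to 1.
    wanted = {f for f, flag in projection.items() if f != "_id" and flag == 1}
    # Keep the document's own field order, as in the printed query results.
    return {
        field: value
        for field, value in doc.items()
        if (field == "_id" and keep_id) or field in wanted
    }


# Made-up document with the same flat fields as the businesses collection.
sample = {"_id": 1, "id": "abc", "city": "Phoenix", "country": "AZ",
          "stars": 5, "is_active": True}

projected = project(sample, {"_id": 0, "country": 1, "city": 1, "stars": 1})
print(projected)  # {'city': 'Phoenix', 'country': 'AZ', 'stars': 5}
```

Leaving ``_id`` out of the projection keeps it in the result, which is why the queries in this notebook set it to 0 explicitly.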

In [93]:
q = {"country" : "AZ", "stars": 5}
p = {'_id' : 0, 'country' : 1, 'city' : 1, 'stars' : 1}
cursor = businesses.find(q, p)
for record in cursor.limit(20):
    print(record)

{'city': 'Chandler', 'country': 'AZ', 'stars': 5}
{'city': 'Goodyear', 'country': 'AZ', 'stars': 5}
{'city': 'Phoenix', 'country': 'AZ', 'stars': 5}
{'city': 'Cave Creek', 'country': 'AZ', 'stars': 5}
{'city': 'Phoenix', 'country': 'AZ', 'stars': 5}
{'city': 'Mesa', 'country': 'AZ', 'stars': 5}
{'city': 'Phoenix', 'country': 'AZ', 'stars': 5}
{'city': 'Peoria', 'country': 'AZ', 'stars': 5}
{'city': 'Phoenix', 'country': 'AZ', 'stars': 5}


### Q2: businesses that have at least five reviews

In this query we want to find the businesses that have at least five reviews. We can address the query using the aggregation framework. The MongoDB aggregation framework is modeled on the concept of data processing pipelines: documents enter a multi-stage pipeline that transforms them into a final aggregated result. Using the aggregation framework we will:
- ``unwind`` on the reviews field. Recall that the unwind stage deconstructs an array field from the input documents and outputs one document per element. Each output document is the input document with the value of the array field replaced by that element.
- ``group`` on the _id field. The stage groups input documents by the specified _id expression. While grouping, we count the documents that share the same _id by adding 1 for each of them, which gives the number of reviews per business.
- ``match`` on the computed number of reviews. The command filters the documents to pass only the documents that match the specified condition(s) to the next pipeline stage.
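
The three stages can be mimicked in plain Python to see what each one produces. This is only a sketch on made-up documents (the ``docs`` list is an assumption, not data from the collection), and it emulates the stages rather than calling MongoDB.

```python
from collections import Counter

# Made-up documents: only _id and the reviews array matter here.
docs = [
    {"_id": "a", "reviews": [{"review_score": 5}] * 6},
    {"_id": "b", "reviews": [{"review_score": 1}] * 5},
    {"_id": "c", "reviews": [{"review_score": 3}] * 2},
    {"_id": "d", "reviews": []},
]

# $unwind: one output document per array element, with the array replaced
# by that element. Documents with an empty array produce no output, which
# is also the default behaviour of $unwind.
unwound = [{**doc, "reviews": review} for doc in docs for review in doc["reviews"]]

# $group with {"$sum": 1}: count the unwound documents sharing each _id.
counts = Counter(doc["_id"] for doc in unwound)

# $match: "at least five" corresponds to >= ($gte); > ($gt) would drop
# a business with exactly five reviews.
at_least_five = {key: n for key, n in counts.items() if n >= 5}
more_than_five = {key: n for key, n in counts.items() if n > 5}

print(at_least_five)   # {'a': 6, 'b': 5}
print(more_than_five)  # {'a': 6}
```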

In [99]:
cursor = businesses.aggregate([
    {"$unwind": "$reviews"},  # one document per review
    {"$group": {"_id": "$_id", "Number of reviews": {"$sum": 1}}},  # count reviews per business
    {"$match": {"Number of reviews": {"$gte": 5}}},  # "at least five" means >= 5
])
for record in cursor:
    print(record)

{'_id': ObjectId('5edf470b1bca82ff74bb9742'), 'Number of reviews': 24}
{'_id': ObjectId('5edf47151bca82ff74bb9748'), 'Number of reviews': 16}
{'_id': ObjectId('5edf476d1bca82ff74bb977c'), 'Number of reviews': 7}
{'_id': ObjectId('5edf46f81bca82ff74bb9738'), 'Number of reviews': 21}
{'_id': ObjectId('5edf47561bca82ff74bb976e'), 'Number of reviews': 7}
{'_id': ObjectId('5edf46841bca82ff74bb96fe'), 'Number of reviews': 12}
{'_id': ObjectId('5edf46601bca82ff74bb96ef'), 'Number of reviews': 23}
{'_id': ObjectId('5edf46c91bca82ff74bb971f'), 'Number of reviews': 25}
{'_id': ObjectId('5edf47001bca82ff74bb973c'), 'Number of reviews': 17}
{'_id': ObjectId('5edf46681bca82ff74bb96f3'), 'Number of reviews': 116}
{'_id': ObjectId('5edf47401bca82ff74bb9761'), 'Number of reviews': 26}
{'_id': ObjectId('5edf47881bca82ff74bb978c'), 'Number of reviews': 12}
{'_id': ObjectId('5edf47b81bca82ff74bb97a9'), 'Number of reviews': 8}
{'_id': ObjectId('5edf46c11bca82ff74bb971b'), 'Number of reviews': 25}
{'_id': 

### Q3: find the businesses that are categorized as "Restaurants" and "Arts & Entertainment" (both categories)

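The ``$in`` operator selects the documents where the value of a field equals any value in the specified array. When the field is itself an array, as ``categories`` is here, a document matches if at least one of its elements equals one of the specified values.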

``$and`` performs a logical AND operation on an array of one or more expressions (e.g. ``<expression1>``, ``<expression2>``, etc.) and selects the documents that satisfy all the expressions in the array.

In [105]:
q = {"$and": [{ "categories": { "$in": [ 'Restaurants' ]}},
              { "categories": { "$in": [ 'Arts & Entertainment']}}
             ]
    }

p = {"_id": 0, "id": 1, "categories": 1}

cursor = businesses.find(q, p)

for record in cursor:
    print(record)

{'id': '1_3nOM7s9WqnJWTNu2-i8Q', 'categories': ['Restaurants', 'French', 'Gastropubs', 'Festivals', 'Arts & Entertainment']}
{'id': 'n7V4cD-KqqE3OXk0irJTyA', 'categories': ['American (New)', 'Arcades', 'Restaurants', 'Arts & Entertainment', 'Gastropubs']}
{'id': 'M3uV9Y3EDSpy9d4YwyNSAQ', 'categories': ['Arts & Entertainment', 'Restaurants', 'Ramen', 'Japanese', 'Bars', 'Nightlife', 'Music Venues']}
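
Since each ``$in`` above lists a single value, the same filter could arguably be written more compactly with the ``$all`` operator, which requires the array to contain every listed value. The sketch below states that filter as ``q_all`` and checks the equivalence in plain Python on made-up documents; ``has_any`` and ``has_all`` are hypothetical stand-ins for ``$in`` and ``$all`` on an array field, not PyMongo functions.

```python
# Equivalent filter using $all (a standard MongoDB query operator).
q_all = {"categories": {"$all": ["Restaurants", "Arts & Entertainment"]}}


def has_any(doc, field, candidates):
    """Stand-in for $in on an array field: at least one value is present."""
    return bool(set(candidates) & set(doc.get(field, [])))


def has_all(doc, field, required):
    """Stand-in for $all: every required value is present."""
    return set(required) <= set(doc.get(field, []))


# Made-up documents for illustration.
samples = [
    {"id": "x", "categories": ["Restaurants", "French", "Arts & Entertainment"]},
    {"id": "y", "categories": ["Restaurants", "Ramen"]},
    {"id": "z", "categories": ["Arts & Entertainment", "Museums"]},
]

# $and of two single-value $in clauses.
via_and = [d["id"] for d in samples
           if has_any(d, "categories", ["Restaurants"])
           and has_any(d, "categories", ["Arts & Entertainment"])]

# The $all form.
via_all = [d["id"] for d in samples
           if has_all(d, "categories", q_all["categories"]["$all"])]

# A single $in with both values would mean "either category", not "both".
any_of = [d["id"] for d in samples
          if has_any(d, "categories", ["Restaurants", "Arts & Entertainment"])]

print(via_and, via_all, any_of)  # ['x'] ['x'] ['x', 'y', 'z']
```

With PyMongo, ``businesses.find(q_all, p)`` would be the corresponding call.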


### Q4: find the number of restaurants in Nevada

In this query, we want to find the total number of restaurants in Nevada (``country`` equal to "NV").

A first possible query is:

In [77]:
cursor = businesses.aggregate([
    {"$match": {"country": "NV", "categories": {"$in": ["Restaurants"]}}},
    {"$group": {"_id": {"country": "$country"}, "Number of restaurants": {"$sum": 1}}},
])

for record in cursor:
    print(record)

{'_id': {'country': 'NV'}, 'Number of restaurants': 8}


An alternative query follows:

In [109]:
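# Cursor.count() is deprecated (and removed in PyMongo 4);
# Collection.count_documents() is its replacement.
businesses.count_documents({
    "country": "NV",
    "categories": {"$in": ["Restaurants"]}
})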


8

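A third option unwinds the ``categories`` array first. It yields the same count as long as no business lists "Restaurants" more than once: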

In [111]:
cursor = businesses.aggregate([
    {"$match": {"country": "NV"}},
    {"$unwind": "$categories"},
    {"$match": {"categories": "Restaurants"}},
    {"$group": {"_id": {"country": "$country"}, "Number of restaurants": {"$sum": 1}}},
])

for record in cursor:
    print(record)

{'_id': {'country': 'NV'}, 'Number of restaurants': 8}


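### Q5: find the average stars per business

We first unwind the ``reviews`` array and project the business ``id`` together with each review's ``reviewer_id`` and ``review_score`` (renamed to ``stars``). The first ten records are printed as a check.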

In [113]:
p = {'$project': {'_id': 0, 'id' : 1, 'reviewer_id' : '$reviews.reviewer_id', 'stars': '$reviews.review_score'}}

In [114]:
u = {'$unwind': '$reviews'}

In [117]:
l = {'$limit': 10}
cursor = businesses.aggregate([u, p, l])
for record in cursor:
    print(record)

{'id': 'FYWN1wneV18bWNgQjJ2GNg', 'reviewer_id': 'MWxrPSz87wx559-Rg3YL-Q', 'stars': 1}
{'id': 'FYWN1wneV18bWNgQjJ2GNg', 'reviewer_id': 'O412lFp-8M8VpRwdzl0S0A', 'stars': 5}
{'id': 'FYWN1wneV18bWNgQjJ2GNg', 'reviewer_id': '4ZR5wwn8NST5aqCa3ueTFg', 'stars': 1}
{'id': 'FYWN1wneV18bWNgQjJ2GNg', 'reviewer_id': '9G8nsgZ-7o3an9HN4N08Hw', 'stars': 1}
{'id': 'FYWN1wneV18bWNgQjJ2GNg', 'reviewer_id': 'KdXsniAzdczdlxi31ftbvA', 'stars': 1}
{'id': 'FYWN1wneV18bWNgQjJ2GNg', 'reviewer_id': 'ZeNWkf6fdzZWyat8gdGcqA', 'stars': 5}
{'id': 'FYWN1wneV18bWNgQjJ2GNg', 'reviewer_id': 'YGQv5HXLKu8X6K9yGgTQ7Q', 'stars': 1}
{'id': 'FYWN1wneV18bWNgQjJ2GNg', 'reviewer_id': '05QcvAw7bO4Lcm0bCNejiw', 'stars': 5}
{'id': 'FYWN1wneV18bWNgQjJ2GNg', 'reviewer_id': 'XGL7VDkeUyM5nKQspJBTNw', 'stars': 5}
{'id': 'FYWN1wneV18bWNgQjJ2GNg', 'reviewer_id': 'hf27xTME3EiCp6NL6VtWZQ', 'stars': 5}


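Now we can group the records by business ``id``, compute the average score with ``$avg`` and the number of reviews with ``$sum``, and sort by ascending score: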

In [88]:
g = {'$group': {'_id': '$id', 'score': {'$avg': '$stars'}, 'count': {'$sum': 1}}}
s = {'$sort': {'score': 1}}

In [89]:
cursor = businesses.aggregate([u, p, g, s])
for record in cursor:
    print(record)

{'_id': 'Kul8tFT48hZQJkeNK5jLBQ', 'score': 1.0, 'count': 10}
{'_id': 'rDMptJYWtnMhpQu_rRXHng', 'score': 1.0909090909090908, 'count': 11}
{'_id': 'oZBISmgb-GjOdj-ik8UfjA', 'score': 1.2857142857142858, 'count': 7}
{'_id': 'z0BQG6LJOmd8E7cNuMtH0A', 'score': 1.3333333333333333, 'count': 3}
{'_id': '7YIy1tXOor9VCwvaSjuBHg', 'score': 1.391304347826087, 'count': 46}
{'_id': 'OD2hnuuTJI9uotcKycxg1A', 'score': 1.4444444444444444, 'count': 9}
{'_id': 'F0fEKpTk7gAmuSFI0KW1eQ', 'score': 1.6666666666666667, 'count': 3}
{'_id': 'KQPW8lFf1y5BT2MxiSZ3QA', 'score': 1.6666666666666667, 'count': 18}
{'_id': 'NmZtoE3v8RdSJEczYbMT9g', 'score': 1.8, 'count': 5}
{'_id': 'GDOAg680Gmi6S1MlhU7B1g', 'score': 1.8, 'count': 5}
{'_id': 'A_Ij4SwFmlRbVtRnsdSzWA', 'score': 1.8, 'count': 5}
{'_id': 'aWnASLfWj1G6ptH4SR5RRA', 'score': 2.0, 'count': 3}
{'_id': 'Q1MzgzH263RgYX4TU4xQ2Q', 'score': 2.0, 'count': 4}
{'_id': 'c6Shr51XcbvAeXp6hb_Exg', 'score': 2.0, 'count': 18}
{'_id': '1nhf9BPXOBFBkbRkpsFaxA', 'score': 2.0, 'co
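
As a rough illustration of what the ``$group`` and ``$sort`` stages compute, here is a plain-Python sketch on made-up records shaped like the output of the unwind and project stages. The ``records`` list is an assumption for illustration, and the code emulates the stages rather than running them in MongoDB.

```python
from collections import defaultdict

# Made-up records, shaped like the unwound and projected documents.
records = [
    {"id": "b2", "stars": 4},
    {"id": "b2", "stars": 5},
    {"id": "b2", "stars": 3},
    {"id": "b1", "stars": 1},
    {"id": "b1", "stars": 5},
]

# Collect the review scores of each business (the $group key is '$id').
scores = defaultdict(list)
for record in records:
    scores[record["id"]].append(record["stars"])

# $avg becomes the mean of the scores; {"$sum": 1} becomes the group size.
grouped = [
    {"_id": business_id, "score": sum(values) / len(values), "count": len(values)}
    for business_id, values in scores.items()
]

# {"$sort": {"score": 1}} sorts ascending; -1 would sort descending.
grouped.sort(key=lambda group: group["score"])

print(grouped)
# [{'_id': 'b1', 'score': 3.0, 'count': 2}, {'_id': 'b2', 'score': 4.0, 'count': 3}]
```

Note that the score computed this way is the mean of the individual review scores, which need not coincide with the ``stars`` field stored on the business document.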

### Q6: find the countries where the number of restaurants is greater than 10

In [84]:
cursor = businesses.aggregate([
    {"$match": {"categories": {"$in": ["Restaurants"]}}},
    {"$group": {"_id": {"country": "$country"}, "Number of restaurants": {"$sum": 1}}},
    {"$match": {"Number of restaurants": {"$gt": 10}}},
])

for record in cursor:
    print(record)

{'_id': {'country': 'ON'}, 'Number of restaurants': 17}
{'_id': {'country': 'AZ'}, 'Number of restaurants': 12}
