B. Problem Description – MonashMRDB
MonR is Monash University’s restaurant management team that manages all restaurants on all Monash campuses. In recent times, due to the popularity of Mexican Food, MonR has decided to analyze Mexican Food related data. To maintain authenticity, MonR has acquired some sample data from about Mexican restaurants (or places) and their Customers (or users) from the originating place, Mexico.
The sample data, consists of: ● userProfile.json
Note: These data are raw data and do not follow any particular schema.
For the data management, MonR has hired your team of Advanced Database Experts to perform the following tasks.
The assignment is divided into FOUR main tasks:
C.1. Creating the Databases and Importing Data.
Since MonR has heard about both MongoDB and Cassandra, therefore, they wish to use a combination of both technologies to analyse the data. They prefer the profile information related data for both users and places to be stored in MongoDB, and the reviews related data to be stored in Cassandra.
MongoDB (MongoDB Compass and Mongo Shell):
- Using the Mongo Shell and appropriate Mongo Shell command(s), create a database called:
- Using the Mongo Shell and appropriate Mongo Shell command(s), create the following 2 collections:
- Using MongoDB Compass and appropriate MongoDB Data Types, import the data in
- json into the userProfiles collection.
- json into the placeProfiles collection.
- In MongoDB, we understand that there are two data modelling methods, which are embedding and referencing .
- In your own words, explain what you understand by embedding and referencing data models.
- Import the data in csv into your FIT5137A1MRDB database using appropriate data models.
- Include in your report, an explanation behind the selection of the data model.
Cassandra (Cassandra Shell):
- Create a keyspace called FIT5137A1_MRDB for the Cassandra database, with SimpleStrategy and replication factor of 1.
- Create the following column families using appropriate data types :
- Using the cassandra COPY command import the following data into the tables:
- csv into the user_ratings table.
- csv into the place_ratings table.
- For optimization purposes, you must not use any loops and/or additional variables (unless specified in the requirements) while performing this task.
- You must also take into account that more data may be added in the future.
- If required, you may create additional collection, data types, and/or tables. However, please ensure that you indicate those additional collection/data types/tables in your report.
❏ Task C.1.1. and C.1.2. MongoDB code and screenshots of results.
❏ Task C.1.3. screenshot from MongoDB Compass of at least one entire document after importing.
❏ Task C.1.4.
❏ C.1.4.a. and C.1.4.c. required explanations.
❏ C.1.4.b. screenshot(s) from MongoDB Compass of at least one entire document after importing both data files.
❏ Task C.1.5., Task C.1.6. and Task C.1.7. Cassandra commands and screenshots of results.
C.2. Modifying the Databases.
This task requires you to modify the appropriate database(s) for each question using Mongo
Shell and Cassandra Shell. As part of the task you are required to identify which database (MongoDB or Cassandra) is required to be modified.
- MonR has gained some new information about a trendy new place. Therefore, without creating any new fields, insert all of the information provided in Table 1.
|Table 1 : New place information|
|place name||Taco Jacks|
|street address||Carretera Central Sn|
|city||San Luis Potosi|
|smoking area||not permitted|
|accepted payment modes||any|
|opening hours||Mon;Tue;Wed;Thu;Fri; 09:00-20:00|
|Sat; Sun; 12:00-18:00|
- They have also realised that the user with user_id 1108, no longer prefers Fast_Food and also prefers to pay using debit_cards instead of cash. Therefore, without looking up the existing values or adding any new fields, update user 1108’s favorite cuisines and favorite payment methods.
- The management has realised that the user with user_id 1063 was an error. Therefore delete the user 1063 from the database.
- To be consistent with the changes made in Task C.2 (1), (2), and (3), the management has also requested to update the reviews information to reflect the changes made to the users 1108 and remove the user 1063’s reviews. They have looked up the data in the reviews table and have provided the information in table 2.
|Table 2 : User ratings information|
|user id||review id for reviews made by user|
|1063||137, 138, 139, 140, 141|
|1108||65, 66, 67, 68, 69, 70, 71, 72, 73, 74|
- It was also seen that user 1060 has reviewed Taco Jacks (ie. the new place with place id 70000), therefore using the information from table 3, insert the following data: (for this insert only you may look up the details of user 1060).
|Table 3 : User 1060’s rating for place 70000|
- Create the necessary modifications given in Task C.2. in the appropriate databases.
- MongoDB Compass must not be used for Task C.2.
- Ensure that data which is not required to be modified is not modified in any way.
❏ Mongo Shell and Cassandra Shell code and screenshots of results for all questions in Task C2. To show that your database modifications were successful you may create queries and display the before and after changes.
C.3. Querying the Management.
This task requires you to query the appropriate database(s) for each question. As part of the task you are required to identify which database (MongoDB or Cassandra) is required to be modified.
Note: if required you may create indexes however, you must provide appropriate reasoning where any index/indices are created.
- How many users are there in the database?
- How many places are there in the database?
- How many reviews were made in the database?
- How many reviews were created by places having public parking arrangements?
- Display the user id, rating of places and the user’s personality for all users who are interested in technology, thrifty protector types of workers, with blue as their favorite colour and are casual drinkers.
(Note that the number of rows returned should be greater than zero)
- What are the place ids and ratings for food for places serving only Pizzeria cuisine? (Note the result should include cuisine).
- Display all users who are students and prefer a medium budget restaurant.
- Display all users who like Bakery cuisines and combine your output with all places having Bakery cuisines.
- Display International restaurants that are open on Sunday.
- Display the average place rating, average food rating, and average service rating for puesto de tacos. Show the average values in decimal points.
- Display the average age according to each drinker level.
- For each user whose favourite cuisine is Family, display the place ID, the place rating, the food rating and the user’s budget.
- What are the top 3 most popular ambiences (friends/ family/ solitary) for a single when going to a Japanese restaurant?
- List the names of unique cuisines in the database.
- Display all of the restaurants and indicate using a separate field/column whether the restaurant includes mexican cuisines. For instance, you can display if the restaurant serves mexican cuisine then the result should show the restaurant name followed by “serves mexican food” in the next field/column, or if the restaurant does not serves mexican cuisine then the result should show the restaurant name followed by “doesn’t serves mexican food” in the next field/column.
You are required to provide five additional queries that you consider to be useful to the MonR management team’s operation.
- Create at least two indices including compound indexes for queries that are frequently used and Justify why you have chosen the fields to be the index.
- Provide the necessary appropriate read operation code for each query. For your answer you may follow where needed the following format similar to that shown in the tutorials.
- When creating each query you have to take into consideration the efficiency of the query operation.
❏ Mongo Shell Aggregation and Cassandra Shell code and screenshots of results for all questions in Task C.3.1. to C.3.15 .
❏ The explanation for the five additional queries .
C.4. Summary Reports.
As a summary you are required to provide a comparison between the three databases i.e.
Relational Database, Document-Oriented Database and Column-Oriented Database
- In your own words provide an explanation on how each type of database works.
- Provide a comparison in a tabular format with details on the main strengths and weaknesses of each database.
- Include a real-world example where each database is used.
MonR is happy with your work. However, they have to cut down their budget and they have decided to use only one database instead of two.
- Decide which database to use (either MongoDB or Cassandra).
- Provide a list of the steps (including an explanation and a flowchart) on how the merging process could be implemented.
- Provide a valid explanation on why you have selected one database over the other.
Note: You are NOT required to create the merged database, but your explanations must be valid for MonR.
❏ A report specifying the requirements, flow chart(s) and any references made from any sources. Note that this section does not require any code submissions.
C.5. Connecting to Drivers (Optional Bonus Section – up to 10 marks)
MonR has come to know that both MongoDB and Cassandra are NoSQL databases that can be incorporated into applications and be used as backend database stores. In order to incorporate MongoDB and Cassandra databases into applications (e.g. a Python application) we can use drivers. For details about how to use the MongoDB and Cassandra drivers you can refer to the steps given in the driver tutorials (under Week 8 Tutorial Resources).
Therefore, as a bonus requirement MonR has requested you to create a runnable driver script incorporating the code from Task C1, C2 and C3. You may separate the code into two driver script files, i.e. one for MongoDB and one for Cassandra.
- Provide in your report a list of steps you have taken to connect the MongoDB and Cassandra drivers
- Provide two driver script files (i.e. one for MongoDB and another for Cassandra)
Note: This is a bonus section and therefore the marks awarded in this section are additional to what receive for the Task C1, C2, C3 and C4. If you choose not to attempt this section then you can still receive full marks based on Task C1, C2, C3 and C4. However, to attain full marks for Task C5, both of your scripts must be runnable in the MacOS/Windows terminal/command prompt.
❏ List of steps to take in order to connect MongoDB and Cassandra drivers.
❏ A script for MongoDB driver and another script for Cassandra driver.