224P: Big Data Management (Fall 2024)
Lectures: TuTh, 11:00-12:20pm, SSL 228, Prof. Chen Li
Lab: W, 2:00-2:50pm, ICS 174, TA Jiadong Bai
Final schedule: Tuesday, Dec 10, 2024, 10:30 a.m. - 12:30 p.m. Overflow room: Social Ecology II (SE2) 1304. Please read the following post with instructions about the final: https://edstem.org/us/courses/67736/discussion/5812789 . We will finish all the grading (including the homework and final) by the end of Sunday, Dec. 15. You will use Monday, Dec. 16 to submit your regrade requests, after which we will finalize all the grades.
| | Weekly Office Hours | Email (use Ed first) |
| Instructor: Prof. Chen Li | Tuesdays 2 - 3 pm, DBH 2086; Dec. 9 (Monday) from 10 - noon | chenli@ics.uci.edu |
| TA: Jiadong Bai | Mondays 10 - 11 AM, ICS 458A | jiadongb@uci.edu |
| TA: Xiaozhen Liu | Fridays 10 - 11 AM, ICS 458A | xiaozl3@ics.uci.edu |
Course Overview
We will focus on big data systems, as well as relational and non-relational database technologies, including document (“NoSQL”) databases and emerging cloud data management solutions.
Lectures (subject to change)
| Lecture | Notes | Date | Topic |
| 01 | PPTX, PDF | Th 09/26/24 | Course overview, HW1, data history, big data overview |
| 02 | PPTX, PDF | Tu 10/01/24 | Big data overview; Relational DBMS and principles |
| 03 | PPTX, PDF | Th 10/03/24 | DBMS Principles (continued), Parallel DBMS (skipped) |
| 04 (Video) | PPTX, PDF | Tu 10/08/24 | NoSQL Column Family Stores, Apache Cassandra |
| 05 (Video) | PPTX, PDF | Th 10/10/24 | Apache Cassandra (2) |
| 06 | PPTX, PDF | Tu 10/15/24 | Key-value stores and consistencies |
| 07 | PPTX, PDF | Th 10/17/24 | Consistency in Cassandra |
| 08 | PPTX, PDF, lecture08-mongo-examples.ipynb | Tu 10/22/24 | JSON and MongoDB |
| 09 (Video) | PPTX, PDF, lecture09-mongo-examples.ipynb | Th 10/24/24 | MongoDB (2) |
| 10 (Video) | Ditto | Extra | MongoDB (3) |
| 11 | PPTX, PDF | Tu 10/29/24 | GraphDB and Neo4j |
| 12 | PPTX, PDF | Tu, 11/5/24 | Neo4j (2) |
| 13 | PPTX, PDF | Th, 11/7/24 | Neo4j (3) |
| 14 | PPTX, PDF | Tu, 11/12/24 | HDFS and MapReduce |
| 15 | PPTX, PDF, lecture15-SparkNotebook.ipynb, Data files | Th, 11/14/24 | Apache Spark (1) |
| 16 | Ditto (guest lecture by TA Xiaozhen Liu) | Tu, 11/19/24 | Spark (2) |
| 17 (Video) | PPTX, PDF | Th, 11/21/24 | Spark (3), Apache Flink |
| 18 | PPTX, PDF, lecture18-flink-notebook | Tu, 11/26/24 | Flink (2) |
| | | Th, 11/28/24 | Thanksgiving, no class |
| 19 | PPTX, PDF | Tu, 12/03/24 | Parallel DBMS (from Lecture 03), Search |
| 20 | Ditto | Th, 12/05/24 | Search (2), course review, wrap up. |
Discussion (Lab) Session
| Slide | Date | Topic |
| 01 | Wed 10/02/24 | PostgreSQL Practice |
| 02 | Wed 10/09/24 | Cassandra Concepts |
| 03 | Wed 10/16/24 | Key-Value Store & CAP |
| 04 | Wed 10/23/24 | JSON and MongoDB |
| 05 | Wed 10/30/24 | Neo4j |
| 06 | Wed 11/06/24 | Neo4j (2) |
| 07 | Wed 11/13/24 | Hadoop & MapReduce |
| 08 | Wed 11/20/24 | Spark |
| 09 (No slides) | Wed 11/27/24 | Flink |
| 10 | Wed 12/04/24 | Reviews |
Homework (subject to change)
Description of the domain use case for HW1-HW6: ZotMusic Vision.pdf
| HW | Deadline | # of days | Topic | Setup Info | Details | Solutions |
| 1 | Mon, 10/7/24, 11:45 pm | 12 | Relational DBMS (PostgreSQL) | PostgreSQL Setup Instructions.pdf | | |
| 2 | Sat, 10/19/24, 11:45 pm | 12 | Cassandra | | | |
| 3 | Thu, 10/31/24 | 12 | MongoDB | | | |
| 4 | Tue, 11/12/24 | 12 | Neo4j | | HW4 Details.pdf, HW #4 Template.docx | HW4 Solution.pdf |
| 5 | Extended deadline (Tue, 11/26/24) | 14 | Spark | | | HW5_solution.html |
| 6 | Sun, 12/08/24 | 12 | Flink | | | |
Online Discussion
We are using Ed Discussion for course discussion.
- Please use Ed properly. It is a place for students to exchange ideas. Think a question through before posting it, and don't post trivial or off-topic questions.
- To encourage students to participate actively in Ed discussions and provide high-quality answers, we will select the 2 students with the best Ed performance. These students will each get 2% extra credit toward their overall score.
Use Ed Instead of Email
Please email the staff only if your question is personal and confidential. Most questions can be asked on Ed. Make it public if you think it can benefit the entire class. If you want to avoid the class seeing it, make it private and visible to all the instructors so that the staff members can see it and give consistent answers.
Prerequisites
You should have taken CS 220P ("Databases and Data Management") or an equivalent course.
Grade Book
All homework should be submitted via Gradescope. Your grades will be returned through Gradescope (for regrades) and finally imported into Canvas.
Grading Breakdown
Homework: 56%
Lab attendance: 4%
Final: 40%
If you disagree with the grading of any graded project or exam, you can discuss it with us within one week after it is returned. After that, all the grades will be finalized.
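As a rough illustration of how the breakdown above combines into one number, here is a minimal Python sketch. It assumes each component has already been reduced to a percentage from 0 to 100; how individual homework assignments are averaged is not stated on this page, and the Ed extra credit is left out. The names `WEIGHTS` and `overall_score` are hypothetical, not part of any course tool.

```python
# Weights taken from the Grading Breakdown section.
WEIGHTS = {
    "homework": 0.56,
    "lab_attendance": 0.04,
    "final": 0.40,
}


def overall_score(component_percentages):
    """Return the weighted overall score on a 0-100 scale.

    Assumption: every value in component_percentages is a percentage
    (0-100) for the matching key in WEIGHTS.
    """
    return sum(
        WEIGHTS[name] * component_percentages[name] for name in WEIGHTS
    )


if __name__ == "__main__":
    example = {"homework": 90, "lab_attendance": 100, "final": 80}
    # 0.56 * 90 + 0.04 * 100 + 0.40 * 80 = 50.4 + 4 + 32
    print(round(overall_score(example), 2))
```

For the example values, the weighted sum is 50.4 + 4 + 32 = 86.4.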
Homework Late Policy
- The official due date for each assignment is listed here on this page; students are expected to turn the work in on or before that date.
- We will offer a 24-hour grace period for each assignment and accept submissions turned in within 24 hours of the due date, with a 10-point penalty. It's 10 points, not 10 percent. For example, if your late project got 87 points, your real score will be 87-10=77 points.
- Assignments submitted after the grace period will NOT be accepted, so always aim to be on time! Please don't even ask, as this is what the 24-hour grace period is intended for.
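The late policy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a submission exactly at the deadline counts as on time, a submission exactly 24 hours later still falls inside the grace period, and the penalized score is not clamped at zero (the page does not say either way). The names `apply_late_policy`, `GRACE_PERIOD`, and `LATE_PENALTY_POINTS` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(hours=24)
LATE_PENALTY_POINTS = 10  # points, not percent


def apply_late_policy(
    raw_score: float, due: datetime, submitted: datetime
) -> Optional[float]:
    """Return the recorded score, or None if the submission is not accepted."""
    if submitted <= due:
        # On or before the due date: no penalty.
        return raw_score
    if submitted <= due + GRACE_PERIOD:
        # Within the grace period: flat 10-point deduction.
        return raw_score - LATE_PENALTY_POINTS
    # After the grace period: not accepted.
    return None


if __name__ == "__main__":
    due = datetime(2024, 10, 7, 23, 45)  # HW1 deadline from the table
    print(apply_late_policy(87, due, datetime(2024, 10, 8, 12, 0)))
```

With the page's own example, a late submission scored at 87 points is recorded as 87 - 10 = 77 points.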
Policy on Academic Honesty
- All students will be expected to adhere to the UCI and ICS Academic Honesty policies (see https://conduct.uci.edu/students/academic-integrity/index.php for details). Any student found to be involved in cheating or aiding others in doing so will be academically prosecuted to the maximum extent possible: that means you will fail this course. Just say no to cheating!
- If you reuse another party's source code for certain generic tasks, make sure you explicitly credit its origin in a comment in your source code.