224P: Big Data Management (Fall 2024)
Lectures: TuTh, 11:00-12:20pm, SSL 228, Prof. Chen Li
Lab: W, 2:00-2:50pm, ICS 174, TA Jiadong Bai
Final schedule: Tuesday, Dec 10, 2024, 10:30 a.m. - 12:30 p.m. Overflow room: Social Ecology II (SE2) 1304. Please read the following post with instructions about the final: https://edstem.org/us/courses/67736/discussion/5812789 . We will finish all the grading (including the homework and final) by the end of Sunday, Dec. 15. You will use Monday, Dec. 16 to submit your regrade requests, after which we will finalize all the grades.
Staff | Weekly Office Hours | Email (use Ed first) |
Instructor: Prof. Chen Li | Tuesdays 2 - 3 pm, DBH 2086; Dec. 9 (Monday) 10 am - noon | chenli@ics.uci.edu |
TA: Jiadong Bai | Mondays 10 - 11 am, ICS 458A | jiadongb@uci.edu |
TA: Xiaozhen Liu | Fridays 10 - 11 am, ICS 458A | xiaozl3@ics.uci.edu |
Course Overview
We will focus on big data systems, as well as relational and non-relational database technologies, including document (“NoSQL”) databases and emerging cloud data management solutions.
Lectures (subject to change)
Lecture | Notes | Date | Topic |
01 | PPTX, PDF | Th 09/26/24 | Course overview, HW1, data history, big data overview |
02 | PPTX, PDF | Tu 10/01/24 | Big data overview; Relational DBMS and principles |
03 | PPTX, PDF | Th 10/03/24 | DBMS Principles (continued), Parallel DBMS (skipped) |
04 (Video) | PPTX, PDF | Tu 10/08/24 | NoSQL Column Family Stores, Apache Cassandra |
05 (Video) | PPTX, PDF | Th 10/10/24 | Apache Cassandra (2) |
06 | PPTX, PDF | Tu 10/15/24 | Key-value stores and consistencies |
07 | PPTX, PDF | Th 10/17/24 | Consistency in Cassandra |
08 | PPTX, PDF, lecture08-mongo-examples.ipynb | Tu 10/22/24 | JSON and MongoDB |
09 (Video) | PPTX, PDF, lecture09-mongo-examples.ipynb | Th 10/24/24 | MongoDB (2) |
10 (Video) | Ditto | Extra | MongoDB (3) |
11 | PPTX, PDF | Tu 10/29/24 | GraphDB and Neo4j |
12 | PPTX, PDF | Tu 11/05/24 | Neo4j (2) |
13 | PPTX, PDF | Th 11/07/24 | Neo4j (3) |
14 | PPTX, PDF | Tu 11/12/24 | HDFS and MapReduce |
15 | PPTX, PDF, lecture15-SparkNotebook.ipynb, Data files | Th 11/14/24 | Apache Spark (1) |
16 | Ditto (guest lecture by TA Xiaozhen Liu) | Tu 11/19/24 | Spark (2) |
17 (Video) | PPTX, PDF | Th 11/21/24 | Spark (3), Apache Flink |
18 | PPTX, PDF, lecture18-flink-notebook | Tu 11/26/24 | Flink (2) |
 | | Th 11/28/24 | Thanksgiving, no class |
19 | PPTX, PDF | Tu 12/03/24 | Parallel DBMS (from Lecture 03), Search |
20 | Ditto | Th 12/05/24 | Search (2), course review, wrap-up |
Discussion (Lab) Session
Slide | Date | Topic |
01 | Wed 10/02/24 | PostgreSQL Practice |
02 | Wed 10/09/24 | Cassandra Concepts |
03 | Wed 10/16/24 | Key-Value Store & CAP |
04 | Wed 10/23/24 | JSON and MongoDB |
05 | Wed 10/30/24 | Neo4j |
06 | Wed 11/06/24 | Neo4j (2) |
07 | Wed 11/13/24 | Hadoop & MapReduce |
08 | Wed 11/20/24 | Spark |
09 (no slides) | Wed 11/27/24 | Flink |
10 | Wed 12/04/24 | Reviews |
Homework (subject to change)
Description of the domain use case for HW1-HW6: ZotMusic Vision.pdf
Online Discussion
We are using Ed Discussion for course discussion.
- Please use Ed properly. It's a place for students to exchange ideas. Think a question through before posting it, and don't post trivial or off-topic questions.
- To encourage students to participate actively in Ed discussions and provide high-quality answers, we will select the 2 students with the best Ed performance. These students will get 2% extra credit toward their overall scores.
Use Ed Instead of Email
Please email the staff only if your question is personal and confidential. Most questions can be asked on Ed. Make your post public if you think it can benefit the entire class. If you don't want the class to see it, make it private and visible to all the instructors so that the staff members can see it and give consistent answers.
Prerequisites
You should have taken CS 220P ("Databases and Data Management") or an equivalent course.
Grade Book
All the homework should be submitted via Gradescope. Your grades will be returned through Gradescope (for regrades) and finally imported into Canvas.
Grading Breakdown
Homework: 56%
Lab attendance: 4%
Final: 40%
If you disagree with the grading of any graded project or exam, you can discuss it with us within one week after it is returned. After that, all the grades will be finalized.
Homework Late Policy
- The official due date for each assignment is listed on this page; students are expected to turn the work in on or before that date.
- We will offer a 24-hour grace period for each assignment and accept submissions turned in within 24 hours of the due date, with a 10-point penalty. It's 10 points, not 10 percent. For example, if your late project got 87 points, your real score will be 87-10=77 points.
- Assignments submitted after the grace period will NOT be accepted, so always aim to be on time! Please don't even ask, as this is what the 24-hour grace period is intended for.
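The late policy above can be sketched as a small function. This is only an illustration, not the staff's actual grading script: the name `apply_late_policy`, the choice to return `None` for a rejected submission, the floor at 0 points, and the due date in the usage line are all assumptions that the policy itself does not state.

```python
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(hours=24)   # from the policy: 24-hour grace period
LATE_PENALTY_POINTS = 10             # from the policy: 10 points, not 10 percent


def apply_late_policy(raw_score: float, due: datetime,
                      submitted: datetime) -> Optional[float]:
    """Return the recorded score, or None if the submission is not accepted."""
    if submitted <= due:
        # On time: no penalty.
        return raw_score
    if submitted <= due + GRACE_PERIOD:
        # Within the grace period: flat 10-point deduction.
        # Assumption: the recorded score is not allowed to go below 0.
        return max(raw_score - LATE_PENALTY_POINTS, 0)
    # After the grace period: not accepted.
    return None


# Hypothetical due date, used only for illustration.
due = datetime(2024, 10, 14, 23, 59)
print(apply_late_policy(87, due, due + timedelta(hours=3)))  # 77, as in the example above
```

A submission 25 hours after the due date yields `None` here, matching the rule that nothing is accepted once the grace period ends.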
Policy on Academic Honesty
- All students will be expected to adhere to the UCI and ICS Academic Honesty policies (see https://conduct.uci.edu/students/academic-integrity/index.php for details). Any student found to be involved in cheating or aiding others in doing so will be academically prosecuted to the maximum extent possible: that means you will fail this course. Just say no to cheating!
- If you reuse another party's source code for certain generic tasks, make sure you explicitly note its origin in a comment in your source code.