224P: Big Data Management (Fall 2024)

Lectures: TuTh, 11:00-12:20pm, SSL 228, Prof. Chen Li

Lab: W, 2:00-2:50pm, ICS 174, TA Jiadong Bai

Final schedule: Tuesday, Dec 10, 2024, 10:30 a.m. - 12:30 p.m. Overflow room: Social Ecology II (SE2) 1304. Please read the following post with instructions about the final: https://edstem.org/us/courses/67736/discussion/5812789 . We will finish all the grading (including the homework and final) by the end of Sunday, Dec. 15. You can submit regrade requests on Monday, Dec. 16, after which we will finalize all the grades.

Weekly Office Hours and Email (use Ed first)

Instructor: Prof. Chen Li

Tuesdays 2 - 3 pm, DBH 2086

Dec. 9 (Monday) from 10 am to noon

chenli@ics.uci.edu

TA: Jiadong Bai

Mondays 10 - 11 AM, ICS 458A

jiadongb@uci.edu

TA: Xiaozhen Liu

Fridays 10 - 11 AM, ICS 458A

xiaozl3@ics.uci.edu

Course Overview

We will focus on big data systems, as well as relational and non-relational database technologies, including document (“NoSQL”) databases and emerging cloud data management solutions.

Lectures (subject to change)

Lecture Notes  Date Topic
01 PPTX, PDF Th 09/26/24 Course overview, HW1, data history, big data overview
02 PPTX, PDF Tu 10/01/24 Big data overview; Relational DBMS and principles
03 PPTX, PDF Th 10/03/24 DBMS Principles (continued), Parallel DBMS (skipped)
04 (Video) PPTX, PDF Tu 10/08/24 NoSQL Column Family Stores, Apache Cassandra
05 (Video) PPTX, PDF Th 10/10/24 Apache Cassandra (2)
06 PPTX, PDF Tu 10/15/24 Key-value stores and consistencies
07 PPTX, PDF Th 10/17/24 Consistency in Cassandra
08 PPTX, PDF, lecture08-mongo-examples.ipynb Tu 10/22/24 JSON and MongoDB
09 (Video) PPTX, PDF, lecture09-mongo-examples.ipynb Th 10/24/24 MongoDB (2)
10 (Video) Ditto Extra MongoDB (3)
11 PPTX, PDF Tu 10/29/24 GraphDB and Neo4j
12 PPTX, PDF Tu 11/05/24 Neo4j (2)
13 PPTX, PDF Th 11/07/24 Neo4j (3)
14 PPTX, PDF Tu 11/12/24 HDFS and MapReduce
15 PPTX, PDF, lecture15-SparkNotebook.ipynb, Data files Th 11/14/24 Apache Spark (1)
16 Ditto (guest lecture by TA Xiaozhen Liu) Tu 11/19/24 Spark (2)
17 (Video) PPTX, PDF Th 11/21/24 Spark (3), Apache Flink
18 PPTX, PDF, lecture18-flink-notebook Tu 11/26/24 Flink (2)
Th 11/28/24 Thanksgiving, no class
19 PPTX, PDF Tu 12/03/24 Parallel DBMS (from Lecture 03), Search
20 Ditto Th 12/05/24 Search (2), course review, wrap up

Discussion (Lab) Sessions

Slide Date Topic
01 Wed 10/02/24 PostgreSQL Practice
02 Wed 10/09/24 Cassandra Concepts
03 Wed 10/16/24 Key-Value Store & CAP
04 Wed 10/23/24 JSON and MongoDB
05 Wed 10/30/24 Neo4j
06 Wed 11/06/24 Neo4j (2)
07 Wed 11/13/24 Hadoop & MapReduce
08 Wed 11/20/24 Spark
09 (No slides) Wed 11/27/24 Flink
10 Wed 12/04/24 Reviews

Homework (subject to change)

Description of the domain use case for HW1-HW6: ZotMusic Vision.pdf

HW Deadline # of days Topic Setup Info Details Solutions
1 Mon, 10/7/24, 11:45 pm 12 Relational DBMS (PostgreSQL) PostgreSQL Setup Instructions.pdf, HW1 Details.pdf, ZotMusicDDL.sql, CSV file, HW1-Template.sql, HW1_Solution.pdf
2 Sat, 10/19/24, 11:45 pm 12 Cassandra HW2 Setup.pdf, HoofersDB.sql, HW2 Details.pdf, HW2 Template.docx, HW2 Solution.pdf
3 Thur, 10/31/24 12 MongoDB HW3 Setup.pdf, zot-music-assignment3.zip, HW3_Helper.ipynb, HW3 Details.pdf, HW3_template.ipynb, HW3_solution.ipynb
4 Tue, 11/12/24 12 Neo4j HW4 Setup.pdf, zot-music-dataset-assignment4.zip, HW4 Details.pdf, HW #4 Template.docx, HW4 Solution.pdf
5 Extended deadline (Tuesday, 11/26/24) 14 Spark HW5 Setup.pdf, zot-music-dataset-assignment5-sample.zip, zot-music-dataset-assignment5-full.zip, HW5 Details.pdf, HW5_template.ipynb, HW5_solution.html
6 Sun, 12/08/24 12 Flink HW6 Setup.pdf, zot-music-dataset-hw6.zip, HW6 Details.pdf, HW6 template.ipynb, HW6_solution.ipynb

Online Discussion

We are using Ed Discussion for course discussion.

  • Please use Ed properly. It's a place for students to exchange ideas. Don't post trivial or off-topic questions without first thinking them through.
  • To encourage active participation and high-quality answers on Ed, we will select the 2 students with the best Ed performance. Each of them will get 2% extra credit toward the overall score.

Use Ed Instead of Email

Please email the staff only if your question is personal and confidential. Most questions can be asked on Ed. Make it public if you think it can benefit the entire class. If you want to avoid the class seeing it, make it private and visible to all the instructors so that the staff members can see it and give consistent answers.

Prerequisites

You should have taken CS 220P ("Databases and Data Management") or an equivalent course. 

Grade Book

All homework should be submitted via Gradescope. Your grades will be returned through Gradescope (for regrades) and finally imported into Canvas.

Grading Breakdown

Homework: 56%

Lab attendance: 4% 

Final: 40% 
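
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component is first expressed as a percentage from 0 to 100; the function name `overall_score` and the sample numbers are hypothetical, and the sketch leaves out the Ed extra credit and any details of how individual homework assignments are weighted against each other.

```python
# Weights taken from the grading breakdown above (in percent).
WEIGHTS = {"homework": 56, "lab": 4, "final": 40}


def overall_score(homework_pct, lab_pct, final_pct):
    """Combine component percentages (0-100) into an overall percentage.

    Assumption: each component has already been reduced to one percentage.
    """
    weighted_sum = (
        WEIGHTS["homework"] * homework_pct
        + WEIGHTS["lab"] * lab_pct
        + WEIGHTS["final"] * final_pct
    )
    return weighted_sum / 100


# Hypothetical student: 90% on homework, full lab attendance, 80% on the final.
print(overall_score(90, 100, 80))
```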

If you disagree with the grading of any graded project or exam, you can discuss it with us within one week after it is returned. After that, all the grades will be finalized.

 

Homework Late Policy

  • The official due date for each assignment is listed on this page; students are expected to turn the work in on or before that date.
  • We will offer a 24-hour grace period for each assignment and accept submissions turned in within 24 hours of the due date, with a 10-point penalty. It's 10 points, not 10 percent. For example, if your late project got 87 points, your real score will be 87-10=77 points.
  • Assignments submitted after the grace period will NOT be accepted, so always aim to be on time! Please don't even ask, as this is what the 24-hour grace period is intended for.
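
The late policy above can be sketched in a few lines of Python. This is only an illustration: the function name `late_adjusted_score` and the use of `None` for a rejected submission are assumptions, and the sketch does not clamp the result at zero because the policy does not say what happens in that case.

```python
GRACE_PERIOD_HOURS = 24   # grace period stated in the policy
LATE_PENALTY_POINTS = 10  # flat penalty in points, not percent


def late_adjusted_score(raw_points, hours_late):
    """Return the score after the late policy, or None if not accepted.

    hours_late is the time past the due date; zero or negative means on time.
    """
    if hours_late <= 0:
        return raw_points  # on time: no penalty
    if hours_late <= GRACE_PERIOD_HOURS:
        return raw_points - LATE_PENALTY_POINTS  # within the grace period
    return None  # beyond the grace period: submission is not accepted


# The example from the policy: a late project that earned 87 points.
print(late_adjusted_score(87, 5))  # 77
```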

Policy on Academic Honesty

  • All students will be expected to adhere to the UCI and ICS Academic Honesty policies (see https://conduct.uci.edu/students/academic-integrity/index.php for details). Any student found to be involved in cheating or aiding others in doing so will be academically prosecuted to the maximum extent possible: that means you will fail this course. Just say no to cheating!
  • If you reuse another party's source code for certain generic tasks, make sure you explicitly note its origin in a comment in your source code.