224P: Big Data Management (Fall 2024)

Lectures: TuTh, 11:00 am - 12:20 pm, SSL 228, Prof. Chen Li

Lab: W, 2:00 - 2:50 pm, ICS 174, TA Jiadong Bai

Final schedule: Tuesday, Dec 10, 2024, 10:30 am - 12:30 pm. Overflow room: Social Ecology II (SE2) 1304.

Weekly Office Hours and Email (use Ed first)

Instructor: Prof. Chen Li

Tuesdays 2 - 3 pm, DBH 2086

Except: Nov. 21, Thursday, 4:30 - 5:30 pm

chenli@ics.uci.edu

TA: Jiadong Bai

Mondays 10 - 11 AM, ICS 458A

jiadongb@uci.edu

TA: Xiaozhen Liu

Fridays 10 - 11 AM, ICS 458A

xiaozl3@ics.uci.edu

Course Overview

We will focus on big data systems, as well as relational and non-relational database technologies, including document (“NoSQL”) databases and emerging cloud data management solutions.

Lectures (subject to change)

Lecture | Notes | Date | Topic
01 | PPTX, PDF | Th 09/26/24 | Course overview, HW1, data history, big data overview
02 | PPTX, PDF | Tu 10/01/24 | Big data overview; Relational DBMS and principles
03 | PPTX, PDF | Th 10/03/24 | DBMS Principles (continued), Parallel DBMS (skipped)
04 | Video, PPTX, PDF | Tu 10/08/24 | NoSQL Column Family Stores, Apache Cassandra
05 | Video, PPTX, PDF | Th 10/10/24 | Apache Cassandra (2)
06 | PPTX, PDF | Tu 10/15/24 | Key-value stores and consistencies
07 | PPTX, PDF | Th 10/17/24 | Consistency in Cassandra
08 | PPTX, PDF, lecture08-mongo-examples.ipynb | Tu 10/22/24 | JSON and MongoDB
09 | Video, PPTX, PDF, lecture09-mongo-examples.ipynb | Th 10/24/24 | MongoDB (2)
10 | Video, ditto | Extra | MongoDB (3)
11 | PPTX, PDF | Tu 10/29/24 | GraphDB and Neo4j
12 | PPTX, PDF | Tu 11/05/24 | Neo4j (2)
13 | PPTX, PDF | Th 11/07/24 | Neo4j (3)
14 | PPTX, PDF | Tu 11/12/24 | HDFS and MapReduce
15 | PPTX, PDF, lecture15-SparkNotebook.ipynb, Data files | Th 11/14/24 | Apache Spark (1)
16 | Ditto (guest lecture by TA Xiaozhen Liu) | Tu 11/19/24 | Spark (2)
17 | PPTX, PDF | Th 11/21/24 | Spark (3), Apache Flink
18 | | Tu 11/26/24 |
 | | Th 11/28/24 | Thanksgiving, no class
19 | | Tu 12/03/24 |
20 | | Th 12/05/24 | Course wrap-up, review

Discussion (Lab) Session

Slide | Date | Topic
01 | Wed 10/02/24 | PostgreSQL Practice
02 | Wed 10/09/24 | Cassandra Concepts
03 | Wed 10/16/24 | Key-Value Store & CAP
04 | Wed 10/23/24 | JSON and MongoDB
05 | Wed 10/30/24 | Neo4j
06 | Wed 11/06/24 | Neo4j (2)

Homework (subject to change)

Description of the domain use case for HW1-HW6: ZotMusic Vision.pdf

HW | Deadline | # of days | Topic | Files (Setup Info, Details, Solutions)
1 | Mon, 10/7/24, 11:45 pm | 12 | Relational DBMS (PostgreSQL) | PostgreSQL Setup Instructions.pdf; HW1 Details.pdf; ZotMusicDDL.sql; CSV file; HW1-Template.sql; HW1_Solution.pdf
2 | Sat, 10/19/24, 11:45 pm | 12 | Cassandra | HW2 Setup.pdf; HoofersDB.sql; HW2 Details.pdf; HW2 Template.docx; HW2 Solution.pdf
3 | Thu, 10/31/24 | 12 | MongoDB | HW3 Setup.pdf; zot-music-assignment3.zip; HW3_Helper.ipynb; HW3 Details.pdf; HW3_template.ipynb; HW3_solution.ipynb
4 | Tue, 11/12/24 | 12 | Neo4j | HW4 Setup.pdf; zot-music-dataset-assignment4.zip; HW4 Details.pdf; HW #4 Template.docx
5 | Sun, 11/24/24 | 12 | Spark | HW5 Setup.pdf; zot-music-dataset-assignment5-sample.zip; zot-music-dataset-assignment5-full.zip; HW5 Details.pdf; HW5_template.ipynb
6 | | | Flink |

Online Discussion

We are using Ed Discussion for course discussion.

  • Please use Ed properly. It's a place for students to exchange ideas. Think a question through before posting it, and don't post trivial or off-topic questions.
  • To encourage students to participate actively in Ed discussions and provide high-quality answers, we will select the 2 students with the best Ed performance. These students will each get 2% extra credit toward their overall score.

Use Ed Instead of Email

Please email the staff only if your question is personal and confidential. Most questions can be asked on Ed. Make it public if you think it can benefit the entire class. If you want to avoid the class seeing it, make it private and visible to all the instructors so that the staff members can see it and give consistent answers.

Prerequisites

You should have taken CS 220P ("Databases and Data Management") or an equivalent course. 

Grade Book

All homework should be submitted via Gradescope. Your grades will be returned through Gradescope (which also handles regrades) and finally imported into Canvas.

Grading Breakdown

Homework: 56%

Lab attendance: 4% 

Final: 40% 
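
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component is expressed as a percentage from 0 to 100. The function name `overall_score` is hypothetical and not part of any course tooling. The Ed extra credit is not modeled.

```python
# Weights taken from the grading breakdown on this page.
WEIGHTS = {"homework": 0.56, "lab": 0.04, "final": 0.40}


def overall_score(homework_pct, lab_pct, final_pct):
    """Hypothetical helper: combine component percentages (0-100)
    into an overall percentage using the weights above."""
    return (
        WEIGHTS["homework"] * homework_pct
        + WEIGHTS["lab"] * lab_pct
        + WEIGHTS["final"] * final_pct
    )


# Example: 90% on homework, full lab attendance, 80% on the final.
example = overall_score(90, 100, 80)
print(round(example, 2))
```

With these sample inputs the result is about 86.4, since 0.56 × 90 + 0.04 × 100 + 0.40 × 80 = 50.4 + 4 + 32.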

If you disagree with the grading of any graded assignment or exam, you can discuss it with us within one week after it is returned. After that, all the grades will be finalized.

 

Homework Late Policy

  • The official due date for each assignment is listed here on this page; students are expected to turn the work in on or before that date.
  • We offer a 24-hour grace period for each assignment: submissions turned in within 24 hours after the due date are accepted with a 10-point penalty. That is 10 points, not 10 percent. For example, if your late assignment earns 87 points, your recorded score will be 87 - 10 = 77 points.
  • Assignments submitted after the grace period will NOT be accepted, so always aim to be on time! Please don't even ask, as this is what the 24-hour grace period is intended for.
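
The late policy above can be sketched in Python as follows. This is only an illustration. The function name `recorded_score` is hypothetical, and so is the boundary handling: a submission exactly at the deadline is treated as on time, and one exactly 24 hours later as still within the grace period. The page does not say whether a penalized score can drop below zero, so no floor is applied.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(hours=24)  # from the policy above
LATE_PENALTY_POINTS = 10            # points, not percent


def recorded_score(raw_points, due, submitted):
    """Hypothetical helper: return the recorded points,
    or None if the submission is too late to be accepted."""
    if submitted <= due:
        return raw_points  # on time: no penalty
    if submitted <= due + GRACE_PERIOD:
        return raw_points - LATE_PENALTY_POINTS  # within the grace period
    return None  # after the grace period: not accepted


# Example using the HW1 deadline listed on this page.
due = datetime(2024, 10, 7, 23, 45)
late_submission = due + timedelta(hours=9)
print(recorded_score(87, due, late_submission))
```

For the 87-point example from the policy, a submission nine hours late is recorded as 77 points, and anything more than 24 hours late returns `None`.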

Policy on Academic Honesty

  • All students will be expected to adhere to the UCI and ICS Academic Honesty policies (see https://conduct.uci.edu/students/academic-integrity/index.php for details). Any student found to be involved in cheating or aiding others in doing so will be academically prosecuted to the maximum extent possible: that means you will fail this course. Just say no to cheating!
  • If you reuse another party's source code for certain generic tasks, make sure you explicitly state its origin in a comment in your source code.