Syllabus
CS122C/CS222: Principles in Data Management
TuTh, 9:30-10:50, Steinhaus Hall 128
Course Personnel
Staff | Office Hours | Zoom | |
Instructor: Prof. Chen Li |
Tuesdays, 11 am - noon Office hour during the finals week: 3/19, Tuesday, 10 - 11 am, DBH 2086, in-person only |
https://uci.zoom.us/j/94788585384 Links to an external site. and DBH 2086 | chenli@ics.uci.edu |
Assistant: Yunyan Ding |
Mondays, 4 p.m - 5 p.m |
https://uci.zoom.us/j/2991875437 Links to an external site. ICS 458C |
yunyad1@uci.edu |
Course Description
This course provides an implementor's view of database management systems. It covers fundamental principles and implementation techniques, issues, and trade-offs related to database management systems. Topics covered include storage management, buffer management, record-oriented file systems, access methods, query processing, and query optimization. A significant portion of database systems research as well as industrial database and information system development deals with adapting the basic database techniques covered in this course to new advances in hardware and software technologies or to the requirements of different applications and data types.
Class attendance
In-person attendance is required. We will use the lectures to cover many important topics, including discussions about the projects (designs and code reviews). Missing a lecture will be significant loss for students.
Projects
The project is structured as a four-part, software development exercise:
Project | Due Date | Allocated Days | Topic | Proportions (60%) |
Project 1 | Sunday, Week 2 Jan 20 |
12 |
Getting Started, Paged File Manager (PFM) & Record-Based File Manager (RBFM) |
14% |
Project 2 | Sunday, Week 5 Feb 10 |
21 | 17% | |
Project 3 | Sunday, Week 8 Mar 3 |
21 | 17% | |
Project 4 | Sunday, Week 10 Mar 17 |
14 | 12% |
Please refer to the Assignments link for more detailed instructions.
Lectures
Date | Topic | Readings | Notes |
01/09/24 (Tu) | Course Overview, DBMS Overview, Storage, P1 | Ch. 1; Chamberlin et al 1981 | 01 Download 01 |
01/11/24 (Th) | P1: PagedFileManager and RecordBasedFileManager, | Ch. 9.1, 9.2, 9.6, 9.7 | 02 Download 02 |
01/16/24 (Tu) | Disks, Record/page formats | ||
01/18/24 (Th) |
deleteRecord(), updateRecord(), PAX, Column/Row stores, OLTP/OLAP |
Ch. 9.3, 9.4, 9.5 | 03 Download 03 |
01/23/24 (Tu) | Parquet, Heap Files, Buffer Manager, code reviews | Ch. 12.1 | |
01/25/24 (Th) | P2, Catalog | Ch. 8.4 | 04 Download 04 |
01/30/24 (Tu) | Schema versioning, File Organizations, Index Overview | Ch. 8.4, Ch. 10 | 05 Download 05 |
02/01/24 (Th) | Clustered index, B+ tree | Ch. 10.3-10.6, Graefe 2011 (Sec. 1-3) | 06 Download 06, 07 Download 07 |
02/06/24 (Tu) | B+ tree | Ditto | Ditto |
02/08/24 (Th) | Hash tables, index comparison | Ch. 11, Litwin 1980 | 08 Download 08 |
02/13/24 (Tu) | In-class midterm | ||
02/15/24 (Th) | P3 overview; Index performance analysis, Composite index | Ch. 8.4 - 8.6 | 09, Download 09,10 Download 10 |
02/20/24 (Tu) | Index-only plans, External Sort | Ch. 13, Ch. 14.1 - 14.2 | 11 Download 11 |
02/22/24 (Th) | Selection Operator, Projection, Duplicate Elimination, Join (Part 1) | Ch. 14.5, 14.6 | 12 (s19) Download 12 (s19) |
02/27/24 (Tu) | Join (Part 2) | Ch. 14.5, 14.6 | 13 (S20) Download 13 (S20) |
02/29/24 (Th) | Aggregation, set operators, | Ch. 15.2 - 15.4 | 14 (S10) Download 14 (S10) |
03/05/24 (Tu) | P4 overview, optimization overview | Ditto (S14) | |
03/07/24 (Th) | Cost Estimation, Histograms | Ch. 15.6 | 15 (S17) Download 15 (S17) |
03/12/24 (Tu) | System-R optimizer, Volcano optimizer | Selinger et al 1979 | 16 Download 16 |
03/14/24 (Th) | Parser, Other Indexes, Other Data Systems, Storage-compute decoupling (e.g., Snowflake, Databricks) | 17 Download 17 |
Working on projects as teams
For CS122C undergraduates
Students form a team of up to two (2) members to do the projects.
1) Both members should access the same github repository, and each team only needs one submission.
2) A student can always leave his/her team during the quarter with a notice at least two weeks prior to the next project's deadline. Note that it is not permitted to join a new team after splitting. After a split, each member should have the access to the codebase before the split.
3) For each project done by a student individually, you will get a 10% extra credit.
4) Please sign up and include both team members in the team signup spreadsheet Links to an external site. to form the team.
For CS222 graduate students
Students do the projects individually.
Online Discussion
We are using Links to an external site.Ed Discussion Links to an external site. as the forum for course discussion. Students are required to check all the posts frequently for important discussions and announcements.
Rather than emailing questions to any of us on the teaching staff, we'll ask you to post your questions on Ed Discussion.
- Please use the forum properly. It's a place for students to exchange ideas. Don't post "easy" or "random" questions without much thinking.
- To encourage students to actively participate in the forum discussions and provide high-quality answers, we will select two students with the best discussion performance. Each student will get 2% extra credits in the overall scores. We reserve the right not to choose any students to receive the credits.
- Please make this online discussion (at least monitoring the activity there) a part of your regular routine!
Prerequisites
Please don't attempt this class without the required background! Doing so has not proven to be a great idea for students historically. :-) You should have the following prerequisites:
- And one of the two courses below:
Github and GradeScope
- You are required to use Github to manage your source code.
- We are also using GradeScope for grading. Learn more about auto-grading with GradeScope AutoGrader here. Links to an external site.
- Each project will have a corresponding GradeScope assignment that will be released according to our assignment schedule. You can either submit by uploading your source code or by pulling the source code from your GitHub repo, and can submit arbitrary number of times before the deadline. After submission the AutoGrader will run automatically and after it finishes running you should be able to see if you can pass each test case immediately.
- Students are allowed to submit after the deadline of the project. But if you submit on GradeScope during Grace Period, we consider you using the Grace Period. So if you don't want to use Grace Period, please do not submit assignments on GradeScope during the 48 hours after the deadline.
Grading:
- Midterm Exam: 20%, 02/13/23 (Tuesday), in-class, in-person.
- Final Exam: 20%, 3/21/2024, Thursday, 8:00 am - 10:00 am, in-person (according to the University schedule).
- Programming Projects: 60%
This mixture of grading criteria for this course is intended to give equal "excelling opportunities" to both thinkers and ``doers.'' This is a hands-on project class, so expect to put a significant effort into your projects. The exam will be comprehensive with respect to the course material and will also ping you with respect to project knowledge.
For all the graded projects and exam, if you disagree with the grading, you can discuss with us within one week after they are returned. After that, all the grades will be finalized. For the final exam, the disputation deadline is subject to change upon University's grade submission timeline.
Textbooks
Required: Database Management Systems, 3rd edition, by R. Ramakrishnan and J. Gehrke, McGraw Hill, 2003.
Recommended: Readings in Database Systems, 4th edition, by J. Hellerstein and M. Stonebraker, MIT Press, 2005.
Other Readings
The following papers are mostly drawn from the Hellerstein and Stonebraker book or can be found via UCI's ACM Digital Library subscription (http://dl.acm.org Links to an external site.). You will be responsible both for the material covered in the lectures and the material covered in the readings listed in the Syllabus below.
- Abadi, D., Madden, S., and Hachem, N. 2008. Column-stores vs. Row-stores: How Different are They Really?. Proc. ACM SIGMOD Intl. Conf. on Management of Data (Vancouver, Canada, June 2008).
- Alsubaiee, S., et a., 2014. AsterixDB: a scalable, open source BDMS. Proc. VLDB Endow. 7, 14 (Oct. 2014), 1905-1916.
- Chamberlin, D., et al, 1981. A history and evaluation of System R. Commun. ACM 24, 10 (Oct. 1981), 632-646.
- Dean, J., and Ghemawat, S., 2004. MapReduce: Simplified data processing on large clusters. Proc. 6th USENIX Symp. on Op. Sys. Design and Impl. (OSDI'04) (San Francisco, CA, December 2004).
- DeWitt, D. and Gray, J. 1992. Parallel database systems: The future of high performance database systems. Commun. ACM 35, 6 (June 1992), 85-98.
- Graefe, G., 2011. Modern B-Tree Techniques. Foundations and Trends in Databases 3, 4 (Sept. 2011) 203-402. (Available from the UCI intranet at: http://dx.doi.org/10.1561/1900000028 Links to an external site.)
- Guttman, A. 1984. R-trees: A dynamic index structure for spatial searching. Proc. ACM SIGMOD Intl. Conf. on Management of Data (Boston, Massachusetts, June 1984).
- Litwin, W., 1980. Linear hashing: A new tool for file and table addressing. Proc. 6th Int'l. Conf. on Very Large Data Bases, Montreal, Oct. 1980.
- Nievergelt, J., Hinterberger, H., and Sevcik, K. C. 1984. The Grid File: An adaptable, symmetric multikey file structure. ACM Trans. Database Syst. 9, 1 (March 1984), 38-71. (optional)
- Selinger, P. G. et al, 1979. Access path selection in a relational database management system. Proc. ACM SIGMOD Int'l. Conf. on Management of Data (Boston, Massachusetts, May-June 1979).
- Shapiro, L. D. 1986. Join processing in database systems with large main memories. ACM Trans. Database Syst. 11, 3 (Aug. 1986), 239-264.
- Stonebraker, M. 1981. Operating system support for database management. Commun. ACM 24, 7 (July 1981), 412-418.
Programming Project
This course is intended to teach the principles involved in DBMS implementation, so it will include a significant programming component.
- The project will be aimed at giving you a sense of what goes on inside a DBMS, and what the issues and challenges are in building a system "now" that will be managing data of various user-defined shapes and sizes "later". You will try your hand at record-based file storage, indexing, and relational runtime system programming (a.k.a. query operators).
- The language for the project will be C++ - hopefully, you either have C++ experience or you have Java experience and can learn C++ quickly. Our assistant(s) will run the project, and they can provide you with additional suggestions for material to help you "spin up" on C++ programming, debugging, and tools in case you need to do a bit of remedial work on the side in order to tackle the project successfully. The course has the same requirements for both graduate students and undergraduate students.
Notice about Projects
- Students are allowed to add attributes, methods, and classes, as long as the public signatures provided in the codebase are kept and implemented. We will be testing your code by calling the methods that are declared in the signatures. See README.md in the project for more details.
- We have provided Gtest cases for testing and grading purposes. The test cases we used for grading might be different than the ones provided to you.
Project Late Policy
- The official due date for each assignment is listed above, and it is expected that students will turn most work in on or before that date.
- Grace Period
- We will offer a 48-hour "grace period" for each assignment, and will therefore accept solution submissions that are turned in within 48 hours of the due date (i.e., less than 2 days late). No need to ask - this time is yours in the event that you need it for some unforeseen reason.
- If you use it, your score will be deducted by 10 points. You are recommended NOT to use the grace period, as you will lose not only 10 points, but also valuable time from the next assignment's working period for every hour that you run late.
- Late assignments after the grace period will NOT be accepted, so do always aim to be on time! (Please don't even ask, as this is what the 48-hour grace period is intended for. :-))
C/C++ Resources
- A helpful reference for C/C++ libraries: http://www.cplusplus.com/reference/ Links to an external site.
- A tutorial on programming in C/C++ and guide on related resources: http://www.cprogramming.com/tutorial.html#c++tutorial Links to an external site.
- A tutorial on learning C++ for Java programmers: http://pages.cs.wisc.edu/~hasti/cs368/CppTutorial/index.html Links to an external site.
Cheating Policy
- Cheating is an area where the instructor for this course has absolutely no patience or sympathy whatsoever. You are all here to learn, and cheating defeats that purpose and is unfair to your fellow students. Students will be expected to adhere to the UCI and ICS Academic Honesty policies (and should see http://www.editor.uci.edu/catalogue/appx/appx.2.htm#academic and http://www.ics.uci.edu/ugrad/policies/index.php#academic_honesty to read their details).
- We run a software package to search for cheating cases in projects. Any student found to be cheating or aiding others in cheating will be academically prosecuted to the maximum extent possible — so you may fail the course with no option for a second chance. This policy will be strictly adhered to; no exceptions will be possible regardless of the impact it may have on your studies, your work plans, your visa status, or anything else. Don’t cheat and you won’t have issues. Just say no to cheating!
Comprehensive Exam Option
Comprehensive Exam Signup Sheet Links to an external site.
- Computer Science graduate students wishing to satisfy the MS Comprehensive Exam requirement in Database Systems via this course should notify the instructor and the Student Advising Office (SAO) of this intention via e-mail before the day of the final exam (alternatively, you can directly add your name to the sheet below).
- For such students, a weighted final exam performance will be used to determine your MS Comprehensive Exam grade (P or F) at the end of the course, and this grade will be reported to SAO.
- Students who do not pass successfully can try the exam again the next time this course is offered by taking the final exam (and midterm exam, if offered) in that offering of the course; hopefully, this will not be necessary for anyone. (Students who do this are advised to informally audit the next offering of the class; this is because the course emphasis and expected knowledge can vary a bit by term and by the instructor, and also because you'll probably need the review to pass.