Projects
Final projects are worth 40% of your overall course grade: 3% for determining your project team and area; 12% for a 2-3 page project proposal; 25% for a final technical report. Successful projects typically have the following key steps:
- Find an application whose structure seems promising for statistical machine learning. Ensure data is easily available. For example, you could consider data from past Kaggle challenges that has temporal, spatial, or other structure.
- Propose a few related probabilistic models and/or learning algorithms with varying degrees of complexity. Usually these models/algorithms are ones discussed in lecture, or related generalizations from the research literature.
- Implement learning and/or inference algorithms that are appropriate for your probabilistic model and data. You are encouraged to build on existing software packages, but should do more than simply run off-the-shelf code.
- Validate your approach via careful experiments. Application performance numbers are interesting, but you should also confirm the correctness of your learning algorithms, and inspect learned model parameters or structure.
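One common way to confirm the correctness of a gradient-based learning algorithm is to compare a hand-derived gradient against finite differences. The sketch below does this for logistic regression on random data. The model choice, the function names, and the synthetic data are illustrative assumptions, not course requirements.

```python
import math
import random


def neg_log_likelihood(w, X, y):
    """Negative conditional log-likelihood of logistic regression, labels in {0, 1}."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(w_j * x_j for w_j, x_j in zip(w, x_i))
        # log(1 + exp(z)) written in a numerically stable form
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y_i * z
    return total


def analytic_gradient(w, X, y):
    """Hand-derived gradient: sum_i (sigmoid(z_i) - y_i) * x_i."""
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        z = sum(w_j * x_j for w_j, x_j in zip(w, x_i))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, x_j in enumerate(x_i):
            grad[j] += (p - y_i) * x_j
    return grad


def numerical_gradient(f, w, eps=1e-6):
    """Central finite-difference approximation to the gradient of f at w."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        grad.append((f(w_plus) - f(w_minus)) / (2.0 * eps))
    return grad


# Synthetic inputs (assumed for illustration only).
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
y = [rng.randint(0, 1) for _ in range(50)]
w = [rng.gauss(0.0, 1.0) for _ in range(3)]

g_analytic = analytic_gradient(w, X, y)
g_numeric = numerical_gradient(lambda v: neg_log_likelihood(v, X, y), w)
max_abs_diff = max(abs(a - b) for a, b in zip(g_analytic, g_numeric))
print("max |analytic - numeric| =", max_abs_diff)
```

A large discrepancy between the two gradients usually points to a bug in the derivation or the implementation. The same style of check applies to other differentiable objectives.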
Examples of probabilistic models and methods that are promising for course projects include:
- Gaussian processes for regression or classification
- Bayesian optimization for tuning the hyperparameters of other machine learning methods
- Mixture models for data clustering
- Hidden Markov models (or state space models) for temporal data
- Topic models for document collections
- Variational autoencoders
- Application of learning algorithms from class, such as the EM algorithm or variational inference, to other probabilistic models
Broadly, you should try to apply concepts from the class to models and data that you find interesting.
Available Compute Resources
When choosing a project topic, you should ensure that you have adequate computing resources. Below is a list of computing resources that are either provided by UCI to students in ICS courses, or freely available online.
- OpenGPU Cluster @ ICS (Link)
  - Has a couple of Titan XP GPUs available for anyone with an ICS account to use
  - Beware: This cluster is heavily used, so it may take some time to get allocated a GPU
- Openlab Jupyterhub @ ICS (Link)
  - CPU cluster that runs Jupyter Notebooks
  - No GPUs are available on this cluster
- Google Colab (Link)
  - Run by Google and offers access to CPU and GPU nodes for free
  - Note: Colab notebooks have an inactivity timeout
- Saturn Cloud (Link)
  - Similar to Google Colab (offers limited CPU and GPU instances for free)
- Various other commercial platforms also offer free trials for their GPU instances.
Project Teams (May 22)
You need to identify your project team, as well as propose a general topic for your project. We strongly encourage teams with three, four, or (at most) five members. One way to form teams would be to use Ed Discussion.
If you would like to do a solo project, please email the instructor to request permission. Typically students with sufficient background for solo projects have graduate research experience.
Identify your project team by submitting a brief description of your project area on Gradescope. One team member should upload the description, and will then have the option to identify the other team members. (Project proposals, presentation slides, and reports will be submitted similarly.) Use the "View or edit group" option on Gradescope to be sure this is done correctly. Do not complete the assignment multiple times; only one team member should submit.
Project Proposals (May 30)
The project proposal should be about 2-3 pages long, including all references and figures. We encourage, but do not require, you to use the NeurIPS LaTeX style file (use the "preprint" option so authors are visible). Proposals must be uploaded as a single PDF file. Your proposal should contain the following information:
- A clear description of the problem or application you intend to address. Why is it worth studying?
- A discussion of related work, including references to at least three relevant research articles or technical reports. Which aspects of your project are novel?
- A figure illustrating a preliminary experiment with some data related to your project. This could be some sort of visualization of the raw data, or the results of running a simple (supervised or unsupervised) machine learning method.
- A description of the learning and/or inference challenges that you need to solve to apply statistical machine learning to your data. It is fine if you do not yet know what algorithms are appropriate, but discuss the challenges that need to be solved.
- An experimental evaluation protocol. How will you know that you've succeeded?
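As one possible shape for an evaluation protocol, the sketch below splits data into training and held-out sets, fits a model on the training portion only, and compares its average held-out log-likelihood to a fixed baseline. The single-Gaussian model, the standard-normal baseline, the 80/20 split, and the synthetic data are all assumptions made for illustration. A real project would substitute its own models and data.

```python
import math
import random


def gaussian_log_pdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_gaussian(train):
    """Maximum likelihood estimates of the mean and variance."""
    mean = sum(train) / len(train)
    var = sum((x - mean) ** 2 for x in train) / len(train)
    return mean, var


def mean_heldout_log_lik(test, mean, var):
    """Average log-likelihood per held-out observation."""
    return sum(gaussian_log_pdf(x, mean, var) for x in test) / len(test)


# Synthetic data standing in for a real dataset (assumed for illustration).
rng = random.Random(0)
data = [rng.gauss(5.0, 2.0) for _ in range(500)]
rng.shuffle(data)

split = int(0.8 * len(data))
train, test = data[:split], data[split:]

fitted_mean, fitted_var = fit_gaussian(train)
fitted_score = mean_heldout_log_lik(test, fitted_mean, fitted_var)
baseline_score = mean_heldout_log_lik(test, 0.0, 1.0)  # unfitted N(0, 1) baseline

print("fitted model  :", fitted_score)
print("fixed baseline:", baseline_score)
```

Stating the metric, the split, and the baseline in advance makes it easier to say afterwards whether the project succeeded.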
Project Reports (June 13)
The technical report should be about 4-6 pages long. Although the results need not be highly novel, the presentation and experiments should be of high quality. We encourage, but do not require, you to use the NeurIPS LaTeX style file (use the "preprint" option so authors are visible). Reports must be uploaded as a single PDF file. Your report should include:
- A clear description of the problem addressed, and summary of related work with appropriate references. Include a figure illustrating the model(s) used in your project.
- A mathematically precise description of the statistical models and learning algorithms that you consider. For parts of your project that are novel, include sufficient detail for a knowledgeable expert to reproduce your work. Where you adapt previous work, include detailed references.
- To help verify that your statistical learning algorithm is working properly, at least one plot showing the learning objective (conditional log-likelihood for a classification or regression method, a log-likelihood bound for a variational method, etc.) as a function of the number of learning iterations.
- Some sort of visualization of the learned model structure; summary performance numbers are not sufficient. For example, for many probabilistic models it is possible to plot the learned clusters or features or states, sample from the learned generative model, visualize results on low-dimensional toy data, show example predictions on test data, etc.
- A description of implementation details, including references for any code that was adapted and reused, a high-level summary of the functionality that your code implements, the programming language(s) you used, etc.
- Mandatory: A description of how each team member contributed to the project.
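To illustrate the learning-objective plot requested above, here is a minimal sketch that runs EM on a two-component, one-dimensional Gaussian mixture and records the log-likelihood at every iteration. The model, the initialization, the function names, and the synthetic data are assumptions chosen for brevity. For exact EM the recorded values should never decrease, which gives a simple correctness check.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, n_iters=30):
    """EM for a two-component 1D Gaussian mixture; returns parameters and log-likelihood history."""
    weights = [0.5, 0.5]
    means = [min(data), max(data)]  # crude initialization (an assumption)
    variances = [1.0, 1.0]
    history = []
    for _ in range(n_iters):
        # E-step: responsibilities, plus the log-likelihood under current parameters
        resp = []
        log_lik = 0.0
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = joint[0] + joint[1]
            log_lik += math.log(total)
            resp.append([j / total for j in joint])
        history.append(log_lik)
        # M-step: closed-form weighted maximum likelihood updates
        for k in range(2):
            n_k = sum(r[k] for r in resp)
            weights[k] = n_k / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / n_k
            variances[k] = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / n_k
    return weights, means, variances, history


# Synthetic data from two well-separated clusters (assumed for illustration).
rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(100)] + [rng.gauss(3.0, 1.0) for _ in range(100)]

weights, means, variances, history = em_gmm_1d(data)

# The objective should be non-decreasing, up to floating-point round-off.
is_monotone = all(b >= a - 1e-8 for a, b in zip(history, history[1:]))
print("log-likelihood non-decreasing:", is_monotone)
print("estimated means:", means)
```

Plotting `history` against the iteration index, for example with matplotlib's `plt.plot(history)` if matplotlib is installed, produces the kind of figure described in the list above. A decrease in the curve would suggest a bug in the E-step or M-step.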