Project Proposal
Due Date: May 15th
For this assignment, your team submits a single 1-page PDF. There are only a few simple requirements. Team membership cannot change after the deadline; the other details are not set in stone and can be revised in discussion with the instructors.
Team Details
The first thing you need to do is tell Canvas who is on your team. To do that, follow these steps:
- Log in to your Canvas account, and navigate to the BANA 290 page.
- Click “People” on the left navigation bar.
- Click the “Groups” tab. You should see a list of potential groups to join.
- If your team has not already claimed one, join one of the empty “Project X” groups. Otherwise, join the same group as your teammates. Warning: Do NOT use the “+GROUP” button to create a new group!
- Change the group name from “Project X” to your team name. To do this, navigate to your group’s homepage and click the “Edit Group” button. Note: You will only be able to do this if you are the first member of your team to join the group.
That's it! Your group is registered.
Task Details
In the report, include a few sentences about the project:
- Which dataset have you picked (list is given below)? Why do you find it interesting?
- What is the classification task you're interested in? (This may be obvious for some datasets.)
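As an illustration of what stating a classification task can look like, here is a minimal sketch. It assumes a hypothetical reviews dataset with star ratings and frames the task as predicting positive versus negative sentiment from the review text. The records, the thresholds, and the function names are assumptions made for this example, not part of any listed dataset.

```python
from collections import Counter

# Hypothetical records: (review text, star rating from 1 to 5).
reviews = [
    ("Great product, works as advertised", 5),
    ("Stopped working after a week", 1),
    ("Decent value for the price", 4),
    ("Not what I expected", 2),
    ("It is okay, nothing special", 3),
]


def rating_to_label(stars):
    """Map a star rating to a sentiment label.

    Assumed thresholds: 4-5 stars is positive, 1-2 stars is negative.
    Returns None for 3 stars, meaning the review is left out of the task.
    """
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return None


def build_examples(records):
    """Turn raw records into (text, label) pairs, dropping unlabeled ones."""
    examples = []
    for text, stars in records:
        label = rating_to_label(stars)
        if label is not None:
            examples.append((text, label))
    return examples


examples = build_examples(reviews)

# Checking class balance early helps when describing the task in the proposal.
label_counts = Counter(label for _, label in examples)
print(label_counts)
```

Writing the input, the label set, and the labeling rule down this explicitly is usually enough for the proposal; the actual choice of task is up to your team.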
Responsibilities
In a few sentences, tell us how you're planning to split up the work. If your team has 3 members, describe in a paragraph what "extra" work you are planning to include in your project (e.g. combining with unsupervised learning, using more sophisticated machine learning such as deep learning, or incorporating multiple datasets).
Datasets
Individual datasets:
- Amazon reviews: http://jmcauley.ucsd.edu/data/amazon/
- Yelp reviews: https://www.yelp.com/dataset
- Quora question deduplication: https://www.kaggle.com/c/quora-question-pairs/data
- Fake News Challenge: http://www.fakenewschallenge.org/
- Trump Twitter: http://www.trumptwitterarchive.com/archive
- Sentiment Analysis on Twitter Data: http://help.sentiment140.com/for-students/
- Toxic comments on Wikipedia: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data
- Wikipedia Data: http://nlp.cs.nyu.edu/wikipedia-data/
- Blog Authorship: http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm
Collections of datasets:
- NLTK dataset: http://www.nltk.org/nltk_data/
- Allen Institute for AI (AI2) Datasets: http://allenai.org/data.html
- Word-emotion lexicon: http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm
- Kaggle Datasets: https://www.kaggle.com/datasets
- UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets.html?area=&att=&format=&numAtt=&numIns=&sort=nameUp&task=&type=text&view=table