CSCI 385: Intro to Distributed Systems
Fall 2017

Course Assignments

Programming Assignments

Programming Assignment 1: Java Warm-up

Assigned: 8/31
Due: 9/8 via email by 10pm

This is a warm-up project to help familiarize you with Java (as well as the Ant build system). It is essentially something you might see at the end of a class on object-oriented design. The writeup can be found here: Assignment 1: Java Warm-up

Programming Assignment 2: Chat Server/Client

Assigned: 9/12
Due: 10/6 10/9 (see assignment for milestones and extra credit!)

This is our first major project, and is designed to help you practice applying some of the networking skills we have been discussing in class. The writeup can be found here: Assignment 2: Chat Server/Client. There are mandatory checkpoints at the end of the project description. You will be graded on the checkpoint content, so make sure your repo is up to date with the required content on the due date. I will be pulling content as of 11:59pm on the due date. We will not hit Threading for a while yet, so I have provided you with some starter Threading classes:, which is needed on both client and server side, as well as, which is designed for the client side, to provide an interactive console. Both of these classes will require you to add logic in order to function correctly.

Programming Assignment 3: Attacking AES with Threads

Assigned: 10/19
  Part 1: 10/24 by 9pm
  Part 2: 10/31 by 9pm
  Part 3: 11/7 by 9pm
  Part 4: 11/9 by 9pm (optional!)

This project is focused primarily on working with threads, and trying to tune them to solve a problem. The writeup for the assignment can be found here: Assignment 3: Attacking AES with Threads

The assignment is broken into three required challenges and one optional challenge - make sure that you have the required information in your GitHub repo on the date it is due. Also make sure to start early - some of your experiments will require hours (if not days) to run.

Feel free to move ahead of schedule and give yourself a break later on. There is no reason to not plow forwards with a new part while experiments are still running for previous parts.

NOTE: don't kill cs1. Make use of the UD lab machines (the power macs are monsters), as well as the pluto cluster (I believe pluto-1 through pluto-6 all have 4 cores and 12G RAM). Poorly tuned threading approaches will bring a machine to its knees.
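One common way to keep a shared machine responsive is to cap the number of worker threads at the number of cores the JVM reports, rather than spawning a thread per task. The following is only a minimal sketch of that idea: `BoundedWorkers`, `processRange`, and `runInPool` are hypothetical names, and the summing loop is a stand-in for whatever work the assignment writeup actually asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BoundedWorkers {

    // Hypothetical stand-in for real work: sums the integers in [start, end).
    static long processRange(long start, long end) {
        long sum = 0;
        for (long i = start; i < end; i++) {
            sum += i;
        }
        return sum;
    }

    // Splits [0, total) into one chunk per worker and runs the chunks on a
    // fixed-size pool, so at most `workers` threads are ever active.
    static long runInPool(long total, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            long chunk = (total + workers - 1) / workers; // ceiling division
            List<Future<Long>> futures = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final long start = Math.min(total, w * chunk);
                final long end = Math.min(total, start + chunk);
                futures.add(pool.submit(() -> processRange(start, end)));
            }
            long result = 0;
            for (Future<Long> f : futures) {
                result += f.get(); // blocks until that chunk is done
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("workers=" + cores + " result=" + runInPool(1_000_000L, cores));
    }
}
```

Treat the core count as a starting point for experiments, not a tuned answer: the best pool size depends on the workload and on who else is using the machine.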

Programming Assignment 4: Text Analysis/Generation with Hadoop

Assigned: 11/14
  Part 1: 11/20 (writeup: by email by 11:59pm; demo: before 3pm 11/20)
  Part 2: 11/29 (writeup: by email by 11:59pm; demo: before 1:30pm 11/29)
  Part 3: 12/8 (writeup: by email by 11:59pm; demo: before 3pm 12/8)
  Part 4: 12/13 (writeup: by email by 11:59pm; demo: before 3pm 12/13) (optional)

This assignment has three required parts and one optional part. For this assignment I will not be expecting you to upload things to the course GitHub repo; I will be looking for written reports and demos. You are encouraged to add your code for this assignment to a personal, public repo so you can share it as an example of your work! (You can get away with just storing your code in GitHub, but I would also recommend storing your modified conf files.)

Part I: Setting up Hadoop - this part is more annoying and time-consuming than anything else. Basic steps can be found here: 2.8.2 Hadoop Setup Notes. After successfully setting up your Hadoop cluster, load up your dataset (~1GB of written work from Project Gutenberg, remember?). For your writeup: include screenshots of the web portals for your cluster, making sure to note any difficulties you had as well as any attributes you needed to modify beyond what I had in the Setup Notes. Additionally, tell me about your dataset, how you decided what it should consist of, and exactly how big it is! For the demo: Schedule a time with me during office hours to show your working web portals!

Part II: Hello Map Reduce! - Now that you have a running Hadoop cluster, it's time to run your first Map Reduce program: WordCount! This is mostly a sanity check to make sure everything is working correctly. You will be building off of this later, so make sure to take some time and read through all the settings that you are using to make sure you understand them. You will be running WordCount on your dataset and analyzing the results. Make sure to modify your WordCount to handle bigrams (2 words together). For your writeup: show the top 10 and bottom 10 words from WordCount, as well as the top 10 and bottom 10 bigrams from your own code. Compare and contrast these lists - were there any commonalities? Did the results make sense to you? Did you see any interesting trends? For the demo: Schedule a time with me during office hours to show your bigram code running! BONUS: Go above and beyond: run the same experiments with someone else's dataset - what did you see? What if you combine the datasets and run it again? Modify your bigram code to handle arbitrary (user-specified) N-grams - rerun your experiments, what do you see as you modify N?
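To make "bigram" concrete, here is a small plain-Java sketch (no Hadoop dependency) of the pairing logic a bigram-aware map step would need. It is not the course's reference solution: `BigramSketch`, `tokenize`, and `countBigrams` are hypothetical names, and the tokenization rule (lowercase, split on anything that is not a letter or apostrophe) is an assumption you may well want to change. In a real mapper each bigram would be emitted with a count of 1 and summed in the reducer; the sketch sums locally just so it can run on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BigramSketch {

    // Assumed tokenization: lowercase, split on runs of characters that are
    // neither a-z nor an apostrophe, and drop empty tokens.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.toLowerCase().split("[^a-z']+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Counts each pair of adjacent tokens, keyed as "first second".
    static Map<String, Integer> countBigrams(String line) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> tokens = tokenize(line);
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String bigram = tokens.get(i) + " " + tokens.get(i + 1);
            counts.merge(bigram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countBigrams("The cat sat. The cat ran."));
    }
}
```

Note that this sketch pairs words across sentence boundaries ("sat the") and only within a single line of input; whether either behavior is what you want is a design decision worth discussing in your writeup.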

Part III: Bigram Analysis - For this part, you will need to develop a multi-stage Map Reduce program to help build up tables of probabilities. For any given word, what is the probability that any other word will appear after it? In addition to your Map Reduce program(s) to generate this information, you will need to develop an additional Java program that will pull this data from HDFS and load it into memory (I'm thinking a HashMap with an ArrayList may be your best bet). Your program should allow the user to query a word, and see what other words may appear after it, along with the probability of each second word. For the writeup: Make sure to discuss your design decisions for the Map Reduce program(s) - how did you break apart the work, etc. Go back to your WordCount results from the previous writeup - for each of the 10 highest/lowest words, what probabilities do you see for the following words? For the demo: Schedule a time with me during office hours to show off your Java program that will let us query words. BONUS: Go above and beyond! Just as with the last part, run the same experiments with someone else's dataset - were things noticeably different? What did you see? Try the same thing with both datasets - again, what differences did you see? Were you expecting them? Why/why not? Modify this assignment to work with your arbitrary N-gram code (up to and including the Java query program). What do you see as you modify N?
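As a rough illustration of the "HashMap with an ArrayList" idea, the sketch below builds an in-memory table from bigram counts and turns them into conditional probabilities, P(second | first) = count(first second) / sum of counts for all bigrams starting with first. Everything here is an assumption made for the example: the class and method names are hypothetical, the input is assumed to be lines of the form "first second<TAB>count" (key and value separated by a tab), and the lines are assumed to have already been read out of HDFS by some other code. It also normalizes in the Java program, whereas the assignment asks for Map Reduce stage(s) to build the probabilities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BigramTable {

    // One possible second word, with its raw count and conditional probability.
    static final class Follower {
        final String word;
        final long count;
        double probability;

        Follower(String word, long count) {
            this.word = word;
            this.count = count;
        }
    }

    private final Map<String, List<Follower>> table = new HashMap<>();

    // Assumed input format per line: "first second<TAB>count".
    static BigramTable fromLines(List<String> lines) {
        BigramTable t = new BigramTable();
        for (String line : lines) {
            String[] keyAndCount = line.split("\t");
            String[] words = keyAndCount[0].split(" ");
            long count = Long.parseLong(keyAndCount[1].trim());
            t.table.computeIfAbsent(words[0], k -> new ArrayList<>())
                   .add(new Follower(words[1], count));
        }
        // Normalize each first word's follower counts into probabilities.
        for (List<Follower> followers : t.table.values()) {
            long total = 0;
            for (Follower f : followers) {
                total += f.count;
            }
            for (Follower f : followers) {
                f.probability = (double) f.count / total;
            }
        }
        return t;
    }

    // Returns the followers of a word, or an empty list if the word is unknown.
    List<Follower> query(String word) {
        return table.getOrDefault(word, new ArrayList<>());
    }

    public static void main(String[] args) {
        BigramTable t = fromLines(Arrays.asList("the cat\t3", "the dog\t1", "cat sat\t2"));
        for (Follower f : t.query("the")) {
            System.out.println("the -> " + f.word + " : " + f.probability);
        }
    }
}
```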

Part IV: Generating Text! - Now that we have the query program up and running, it is time to work on text generation! (Woo!) Try to generate a few paragraphs: how did you decide when a paragraph was done? Do you see different things if you use your dataset as opposed to someone else's? What if you combine the datasets? If you have implemented N-gram capability, what happens as you increase/decrease the size of N? Have some fun with this, and submit a writeup describing what you tried, what you saw, and what conclusions you drew. Also, submit some of your program's "original" writing!
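One simple way to approach generation is a random walk over the bigram probabilities: start from a word, pick the next word at random in proportion to its probability, and repeat. The sketch below shows only that core loop under stated assumptions: `TextGenSketch`, `sample`, and `generate` are hypothetical names, the model is a plain nested map of word to follower weights, and the stopping rule (a word cap, or a word with no known followers) is a placeholder, since deciding when a paragraph is done is part of what you are being asked to figure out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class TextGenSketch {

    // Picks one follower with probability proportional to its weight.
    static String sample(Map<String, Double> followers, Random rng) {
        double total = 0;
        for (double w : followers.values()) {
            total += w;
        }
        double r = rng.nextDouble() * total;
        String last = null;
        for (Map.Entry<String, Double> e : followers.entrySet()) {
            last = e.getKey();
            r -= e.getValue();
            if (r < 0) {
                return last;
            }
        }
        return last; // guards against floating-point round-off
    }

    // Walks the model from `start`, emitting at most `maxWords` words in total.
    // Stops early at a word with no known followers (placeholder stopping rule).
    static String generate(Map<String, Map<String, Double>> model,
                           String start, int maxWords, Random rng) {
        StringBuilder out = new StringBuilder(start);
        String current = start;
        for (int i = 1; i < maxWords; i++) {
            Map<String, Double> followers = model.get(current);
            if (followers == null || followers.isEmpty()) {
                break;
            }
            current = sample(followers, rng);
            out.append(' ').append(current);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> model = new HashMap<>();
        Map<String, Double> afterThe = new HashMap<>();
        afterThe.put("cat", 0.75);
        afterThe.put("dog", 0.25);
        model.put("the", afterThe);
        Map<String, Double> afterCat = new HashMap<>();
        afterCat.put("sat", 1.0);
        model.put("cat", afterCat);
        System.out.println(generate(model, "the", 10, new Random()));
    }
}
```

Passing in the `Random` makes runs repeatable when you seed it, which helps when comparing output across datasets or values of N.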

Written Work

The template you should be using for your writeups can be found here: reviewStarter.tex. All reviews should be submitted via email by 5pm on the due date.

Reading 1: How to Read a Paper

Due: 9/5
Link: How to Read a Paper

Reading 2: Congestion Avoidance and Control

Due: 9/12
Link: Congestion Avoidance and Control

Reading 3: Byzantine Generals

Due: 9/26
Link: Byzantine Generals

Reading 4: Chord

Due: 10/10
Link: Chord

Reading 5: Lamport Clocks

Due: 10/19
Link: Time, Clocks, and the Ordering of Events in a Distributed System

Reading 6: GFS

Due: 10/31
Link: The Google File System

Reading 7: MapReduce

Due: 11/9
Link: MapReduce: Simplified Data Processing on Large Clusters

Reading 8: Big Table

Due: 11/21
Link: Bigtable: A Distributed Storage System for Structured Data

Reading 9: Chubby Locking Service

Due: 12/5
Link: The Chubby Lock Service for Loosely-coupled Distributed Systems


Exam 1 study guide

Make sure to get this in to me before our exam (9/28 @ 8am). I will take whatever work you have done and apply it as a bonus quiz (will be out of 0 points). Guide can be found here: CSCI 385 Fall 17 Exam 1 Study Guide.

Exam 2 study guide

Make sure to get this in to me before our exam (10/31 @ 8am). I will take whatever work you have done and apply it as a bonus quiz (will be out of 0 points). Guide can be found here: CSCI 385 Fall 17 Exam 2 Study Guide.