Study Group Map Reducer Code

Analyzing each forum thread gives us a list of the students who have posted in it: those who asked the question, answered it, or added a comment.
The forum CSV file is tab-delimited and contains the following fields:
"id"    "title"    "tagnames"    "author_id"    "body"    "node_type"    "parent_id"    "abs_parent_id"    "added_at"    "score"    "state_string"    "last_edited_id"    "last_activity_by_id"    "last_activity_at"    "active_revision_id"    "extra"    "extra_ref_id"    "extra_count"    "marked"
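The code further down reads only three of these columns by position. As a small sketch, the positions can be looked up from the column names instead of being hard-coded; `FIELDS` and `column_index` are illustrative names, not part of the original code.

```python
# Column names in the order listed above.
FIELDS = [
    "id", "title", "tagnames", "author_id", "body", "node_type",
    "parent_id", "abs_parent_id", "added_at", "score", "state_string",
    "last_edited_id", "last_activity_by_id", "last_activity_at",
    "active_revision_id", "extra", "extra_ref_id", "extra_count", "marked",
]

def column_index(name):
    """Return the zero-based position of a column in a forum row."""
    return FIELDS.index(name)

ID_COL = column_index("id")              # 0
AUTHOR_COL = column_index("author_id")   # 3
PARENT_COL = column_index("parent_id")   # 6
```

These match the indexes 0, 3 and 6 used by the mapper.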

Mapper result (node ID, student ID):
111    100000066
15084    100004819
2    100000005
3778    100000066
3778    100008254
66193    100007808
66193    100004467
66193    100071170
66193    100002460

Reducer result:
Question Node ID    Student IDs
111    [100000066]
15084    [100004819]
2    [100000005]
3778    [100000066, 100008254]
66193    [100007808, 100004467, 100071170, 100002460]

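The step from the mapper result to the reducer result is a sort followed by a group-by on the node ID. Here is a minimal sketch of that step using the sample pairs above. `group_by_node` is an illustrative name; in a real MapReduce run the framework does the sorting between the two phases.

```python
from itertools import groupby
from operator import itemgetter

# (node id, student id) pairs copied from the mapper result above.
mapper_pairs = [
    ("111", "100000066"),
    ("15084", "100004819"),
    ("2", "100000005"),
    ("3778", "100000066"),
    ("3778", "100008254"),
    ("66193", "100007808"),
    ("66193", "100004467"),
    ("66193", "100071170"),
    ("66193", "100002460"),
]

def group_by_node(pairs):
    """Sort pairs by node id, then collect the student ids under each node."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable: keeps order within a node
    return {
        node_id: [student for _, student in group]
        for node_id, group in groupby(ordered, key=itemgetter(0))
    }

groups = group_by_node(mapper_pairs)
for node_id, students in groups.items():
    print(node_id, students, sep="\t")
```

The keys are compared as strings here, just as they would be when read from text lines.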
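from __future__ import print_function  # same code runs on Python 2 and 3
import sys
import csv

def mapper():
    # Emit (node id, author id) for every post, plus (parent id, author id)
    # for posts that have a parent, so they are grouped under that parent.
    reader = csv.reader(sys.stdin, delimiter='\t')
    for line in reader:
        if len(line) < 7 or line[0] == "id":
            continue  # skip malformed rows and the header row
        node_id = line[0]
        author_id = line[3]
        parent_id = line[6]
        print(node_id, author_id, sep="\t")
        if parent_id != "\\N":  # a literal \N marks a missing parent
            print(parent_id, author_id, sep="\t")

def reducer():
    # Input must be sorted by node id so each node's lines are contiguous.
    oldNodeID = None
    students = []
    reader = csv.reader(sys.stdin, delimiter='\t')
    print("Question Node ID\tStudent IDs")
    for line in reader:
        if len(line) != 2:
            continue  # skip malformed lines
        newNodeID, newStudentID = line
        if oldNodeID and oldNodeID != newNodeID:
            # Key changed: flush the students collected for the previous node.
            print(oldNodeID, "[" + ", ".join(sorted(students)) + "]", sep="\t")
            students = []
        oldNodeID = newNodeID
        students.append(newStudentID)
    if oldNodeID:
        print(oldNodeID, "[" + ", ".join(sorted(students)) + "]", sep="\t")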
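
Both functions read from standard input and print to standard output, so they can be tried out without a cluster by feeding them a string. The following is a rough sketch that assumes Python 3. `run_with_stdin` and `count_lines` are hypothetical helpers, and `count_lines` only stands in for `mapper` or `reducer`.

```python
import io
import sys
from contextlib import redirect_stdout

def run_with_stdin(func, text):
    """Call func() with text as standard input and return what it printed."""
    saved_stdin = sys.stdin
    sys.stdin = io.StringIO(text)
    captured = io.StringIO()
    try:
        with redirect_stdout(captured):
            func()
    finally:
        sys.stdin = saved_stdin  # always restore the real stdin
    return captured.getvalue()

def count_lines():
    # Hypothetical stand-in for mapper() or reducer(): reads stdin, prints a result.
    print(sum(1 for _ in sys.stdin))

output = run_with_stdin(count_lines, "3778\t100000066\n3778\t100008254\n")
```

Passing `mapper` or `reducer` in place of `count_lines` returns their output as a string, which can then be sorted line by line before being handed to the reducer.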
