Add specification

2023-01-30 21:32:22 +11:00 · 2023-01-30 21:32:22 +11:00 · 354e9afea6
parent f7dc96d33f
commit 354e9afea6
2 changed files with 200 additions and 3 deletions
--- a/README.md
+++ b/README.md
@ -1,5 +1,7 @@
 # social-interest-predictor

-A small tool that predicts the interests of a user based on the friends of that user in a social network.
-
-Based on an assignment for COMP10001, a subject at The University of Melbourne.
+A small tool that predicts the interests of a user based on the friends of that
+user in a social network.
+
+Based on an assignment for COMP10001, a subject at The University of Melbourne.
+The assignment specification can be found [here](docs/SPECIFICATION.md).
--- a/docs/SPECIFICATION.md
+++ b/docs/SPECIFICATION.md
@ -0,0 +1,195 @@
+# Assignment Specification
+
+Below is the assignment specification, in full, slightly edited for context and
+appearence.
+
+---
+
+## Introduction
+
+This project is all about "social networks", and the power of social
+connections, both in terms of how impressively large a portion of the social
+network can be accessed from a small number of seed users and their friends or
+friends-of-friends, and how accurately the attributes of an individual can be
+predicted from (partial) attributes of their friends/friends-of-friends. A
+large part of the context for the project is in illustrating how it is that
+companies such as Cambridge Analytica are able to influence the world so
+impressively, from 1a small set of users of their products.
+
+Throughout the project, we will refer to individuals as "nodes" in the social
+network, and (mutual) friendship connections as "edges" connecting those nodes.
+
+## Part 1 - Friends
+
+Write a function get_friendly_dict() that calculates the degree-one friends of
+each individual in a social network. The function takes one argument:
+
+- friend_list, a list of reciproal friendship links between individuals.
+
+The function should return a dictionary of sets, containing the set of all
+"degree-one" (immediate) friends for each individual in the social network.
+Note that the specific order of the individuals in the dictionary, and also the
+ordering of the friends in each set does not matter.
+
+The structure of friend_list is as follows: each element is a 2-tuple of
+strings, representing a pairing of names of individuals in the social network
+who are friends. Note that as friendship links are reciprocal, the 2-tuple
+('kim', 'sandy') indicates that 'kim' is a friend of 'sandy', and also that
+'sandy' is a friend of 'kim'.
+
+Example function calls are:
+
+    >>> get_friendly_dict([('kim', 'sandy'), ('alex', 'sandy'), ('kim', 'alex'), ('kim', 'glenn')])
+
+    {'kim': {'glenn', 'sandy', 'alex'}, 'sandy': {'kim', 'alex'}, 'alex': {'sandy', 'kim'}, 'glenn': {'kim'}}
+
+    >>> get_friendly_dict([('kim', 'sandy'), ('sandy', 'alex'), ('alex', 'glenn'), ('glenn', 'kim')])
+
+    {'kim': {'glenn', 'sandy'}, 'sandy': {'kim', 'alex'}, 'alex': {'glenn', 'sandy'}, 'glenn': {'kim', 'alex'}}
+
+## Part 2 - Social Network Besties
+
+Write a function friend_besties() that calculates the "besties" (i.e.,
+degree-one friends) of a given individual in a social network. The function
+takes two arguments:
+
+- individual, an individual in the social network, in the form of a string ID;
+- bestie_dict, a dictionary of sets of friends of each individual in the
+  social network.
+
+The function should return a sorted list, made up of all "degree-one" friends
+for the individual. In the instance that the individual does not have any
+friends in the social network, the function should return an empty list.
+
+Example function calls are:
+
+    >>> friend_besties('kim', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})
+
+    ['alex', 'glenn', 'sandy']
+
+    >>> friend_besties('ali', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})
+
+    []
+
+## Part 3 - Social Network Second Besties
+
+Write a function friend_second_besties() that calculates the "second-besties"
+(i.e. degree-two friends) of a given individual in a social network. The
+function takes two arguments:
+
+- individual, an individual in the social network, in the form of a string ID;
+- bestie_dict, a dictionary of sets of friends of each individual in the
+  social network.
+
+The function should return a sorted list, made up of all "degree-two" friends
+for the individual. In the instance that the individual does not have any
+degree-two friends in the social network, the function should return an
+empty list.
+
+Example function calls are:
+
+    >>> friend_second_besties('glenn', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})
+
+    ['alex', 'sandy']
+
+    >>> friend_second_besties('kim', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})
+
+    []
+
+## Part 4 - Network Coverage
+
+Write a function besties_coverage() that computes the "coverage" of nodes
+within a social network that are connected via predefined relationships to a
+given list of individuals (i.e., the proportion of connected individuals, to the
+total size of the network, which is the total number of people in the social
+network). The function takes three arguments:
+
+- individuals, a list of individuals, each in the form of a string ID;
+- bestie_dict, a dictionary of sets of friends of each individual in the
+  social network;
+- relationship_list, a list of functions defining relationships in the
+  social network, selected from friend_besties and friend_second_besties.
+
+The function should return a float, corresponding to the proportion of the
+total number of individuals who are either a member of individuals or connected
+via one of the relationships in relationship_list.
+
+Example calls to the function are:
+
+    >>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [])
+
+    0.25
+
+    >>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_besties])
+
+    0.5
+
+    >>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_second_besties])
+
+    0.75
+
+    >>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_besties, friend_second_besties])
+
+    1.0
+
+## Part 5 - Social Network Attribute Prediction
+
+The final question is for bonus marks, and is deliberately quite a bit harder
+than the four basic questions (and the number of marks on offer is deliberately
+not commensurate with the amount of effort required — bonus marks aren't meant
+to be easy to get!). Only attempt this is you have completed the earlier
+questions, and are up for a challenge!
+
+The context for the bonus question is the prediction of attributes of a user
+based on the attributes of their social network, and the observation that a
+user's friends often have very similar interests and background to that user
+(what is formally called homophily).
+
+Write a function friendly_prediction() which takes four arguments:
+
+- unknown_user, a string indicating the identity of the user you are to predict
+  attributes for;
+- features, a set of features you are to predict attributes for;
+- bestie_dict, a dictionary of sets of the besties for each user in the
+  dataset, following the same format as the earlier questions in the project;
+- feat_dict, a dictionary containing the known attributes for each user in the
+  training data, across a range of features; note that there is no guarantee
+  that the attribute for a given feature will be known for every training user.
+
+Your function should return a dictionary of features (based on features), with
+a predicted list of values for each.
+
+Your function should make its predictions as follows:
+
+- First, identify the set of besties for the given user, and for each feature
+  of interest, determine the most-commonly attested attribute for that feature
+  among the besties. In the case of a tie, the prediction should be a sorted
+  list of attributes.
+- Second, for any features where no bestie has an attribute for that feature
+  (meaning no prediction was possible in the first step), repeat the process
+  using the second-besties, once again in the form of a sorted list
+  of attributes.
+- In the case that no bestie or second-bestie has that attribute, return an
+  empty list.
+
+Note that all attributes will take the form of strings, with the empty string
+representing the fact that the user explicitly has no value for that feature
+(e.g., if the user did not go to university, the value for university would be
+''), and the lack of an attribute for a given feature indicating that the
+attribute is unknown. Note further that even if the attribute for unknown_user
+is available in feat_dict, you should predict based on the attributes of
+besties and second besties.
+
+Example calls to the function are:
+
+    >>> friendly_prediction('glenn', {'favourite author', 'university'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})
+
+    {'university': ['Monash University', 'University of Melbourne'], 'favourite author': ['AA Milne']}
+
+    >>> friendly_prediction('kim', {'university'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})
+
+    {'university': ['', 'Monash University', 'University of Melbourne']}
+
+    >>> friendly_prediction('kim', {'birthplace'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})
+
+    {'birthplace': []}