Assignment Specification

Below is the assignment specification, in full, slightly edited for context and appearance.

Introduction

This project is all about "social networks" and the power of social connections: how large a portion of a social network can be reached from a small number of seed users and their friends or friends-of-friends, and how accurately the attributes of an individual can be predicted from the (partial) attributes of their friends and friends-of-friends. A large part of the context for the project is in illustrating how companies such as Cambridge Analytica are able to exert so much influence starting from a small set of users of their products.

Throughout the project, we will refer to individuals as "nodes" in the social network, and (mutual) friendship connections as "edges" connecting those nodes.

Part 1 - Friends

Write a function get_friendly_dict() that calculates the degree-one friends of each individual in a social network. The function takes one argument:

  • friend_list, a list of reciprocal friendship links between individuals.

The function should return a dictionary of sets, containing the set of all "degree-one" (immediate) friends for each individual in the social network. Note that neither the order of the individuals in the dictionary nor the order of the friends in each set matters.

The structure of friend_list is as follows: each element is a 2-tuple of strings, representing a pairing of names of individuals in the social network who are friends. Note that as friendship links are reciprocal, the 2-tuple ('kim', 'sandy') indicates that 'kim' is a friend of 'sandy', and also that 'sandy' is a friend of 'kim'.

Example function calls are:

>>> get_friendly_dict([('kim', 'sandy'), ('alex', 'sandy'), ('kim', 'alex'), ('kim', 'glenn')])

{'kim': {'glenn', 'sandy', 'alex'}, 'sandy': {'kim', 'alex'}, 'alex': {'sandy', 'kim'}, 'glenn': {'kim'}}

>>> get_friendly_dict([('kim', 'sandy'), ('sandy', 'alex'), ('alex', 'glenn'), ('glenn', 'kim')])

{'kim': {'glenn', 'sandy'}, 'sandy': {'kim', 'alex'}, 'alex': {'glenn', 'sandy'}, 'glenn': {'kim', 'alex'}}
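
One possible way to build this dictionary is sketched below. It is an illustration rather than a reference solution, and it assumes that every element of friend_list is a well-formed 2-tuple of strings.

```python
def get_friendly_dict(friend_list):
    """Map each individual to the set of their degree-one friends."""
    friends = {}
    for first, second in friend_list:
        # Friendship is reciprocal, so record the link in both directions.
        friends.setdefault(first, set()).add(second)
        friends.setdefault(second, set()).add(first)
    return friends
```

Because the values are sets, a link that appears more than once in friend_list is only recorded once.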

Part 2 - Social Network Besties

Write a function friend_besties() that calculates the "besties" (i.e., degree-one friends) of a given individual in a social network. The function takes two arguments:

  • individual, an individual in the social network, in the form of a string ID;
  • bestie_dict, a dictionary of sets of friends of each individual in the social network.

The function should return a sorted list, made up of all "degree-one" friends for the individual. If the individual does not have any friends in the social network, the function should return an empty list.

Example function calls are:

>>> friend_besties('kim', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})

['alex', 'glenn', 'sandy']

>>> friend_besties('ali', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})

[]
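
A minimal sketch of this lookup follows. It assumes that an individual who is missing from bestie_dict (such as 'ali' above) should be treated as having no friends, which matches the second example call.

```python
def friend_besties(individual, bestie_dict):
    """Return the sorted list of degree-one friends of `individual`."""
    # dict.get supplies an empty set for individuals not in the network.
    return sorted(bestie_dict.get(individual, set()))
```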

Part 3 - Social Network Second Besties

Write a function friend_second_besties() that calculates the "second-besties" (i.e. degree-two friends) of a given individual in a social network. The function takes two arguments:

  • individual, an individual in the social network, in the form of a string ID;
  • bestie_dict, a dictionary of sets of friends of each individual in the social network.

The function should return a sorted list, made up of all "degree-two" friends for the individual. If the individual does not have any degree-two friends in the social network, the function should return an empty list.

Example function calls are:

>>> friend_second_besties('glenn', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})

['alex', 'sandy']

>>> friend_second_besties('kim', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})

[]
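
The examples suggest that a degree-two friend is a friend of a bestie who is neither the individual themselves nor already one of their besties. The sketch below works under that reading, which is an interpretation of the examples rather than something the text states outright.

```python
def friend_second_besties(individual, bestie_dict):
    """Return the sorted list of degree-two friends of `individual`."""
    besties = bestie_dict.get(individual, set())
    # Collect everyone who is a friend of one of the besties.
    candidates = set()
    for bestie in besties:
        candidates |= bestie_dict.get(bestie, set())
    # Exclude the individual and anyone who is already a degree-one friend.
    return sorted(candidates - besties - {individual})
```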

Part 4 - Network Coverage

Write a function besties_coverage() that computes the "coverage" of nodes within a social network that are connected via predefined relationships to a given list of individuals (i.e., the number of connected individuals as a proportion of the total number of people in the social network). The function takes three arguments:

  • individuals, a list of individuals, each in the form of a string ID;
  • bestie_dict, a dictionary of sets of friends of each individual in the social network;
  • relationship_list, a list of functions defining relationships in the social network, selected from friend_besties and friend_second_besties.

The function should return a float, corresponding to the proportion of the total number of individuals in the network who are either members of individuals or connected to a member of individuals via one of the relationships in relationship_list.

Example calls to the function are:

>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [])

0.25

>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_besties])

0.5

>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_second_besties])

0.75

>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_besties, friend_second_besties])

1.0
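
The coverage calculation could be sketched as follows. The block restates compact versions of friend_besties and friend_second_besties so that it runs on its own. It assumes that the size of the network is the number of keys in bestie_dict, that individuals absent from bestie_dict do not count towards coverage, and that an empty network has a coverage of 0.0; none of these edge cases is fixed by the text above.

```python
def friend_besties(individual, bestie_dict):
    """Sorted degree-one friends (compact version, for self-containment)."""
    return sorted(bestie_dict.get(individual, set()))


def friend_second_besties(individual, bestie_dict):
    """Sorted degree-two friends (compact version, for self-containment)."""
    besties = bestie_dict.get(individual, set())
    candidates = set()
    for bestie in besties:
        candidates |= bestie_dict.get(bestie, set())
    return sorted(candidates - besties - {individual})


def besties_coverage(individuals, bestie_dict, relationship_list):
    """Proportion of the network reached from `individuals`."""
    if not bestie_dict:
        return 0.0  # assumption: an empty network has zero coverage
    covered = set(individuals)
    for person in individuals:
        for relationship in relationship_list:
            # Each relationship function returns a list of connected people.
            covered.update(relationship(person, bestie_dict))
    # Assumption: only people who are keys of bestie_dict are counted.
    covered &= set(bestie_dict)
    return len(covered) / len(bestie_dict)
```

Passing the relationship functions as values keeps besties_coverage independent of how each relationship is defined.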

Part 5 - Social Network Attribute Prediction

The final question is for bonus marks, and is deliberately quite a bit harder than the four basic questions (and the number of marks on offer is deliberately not commensurate with the amount of effort required; bonus marks aren't meant to be easy to get!). Only attempt this if you have completed the earlier questions, and are up for a challenge!

The context for the bonus question is the prediction of attributes of a user based on the attributes of their social network, and the observation that a user's friends often have very similar interests and background to that user (what is formally called homophily).

Write a function friendly_prediction() which takes four arguments:

  • unknown_user, a string indicating the identity of the user you are to predict attributes for;
  • features, a set of features you are to predict attributes for;
  • bestie_dict, a dictionary of sets of the besties for each user in the dataset, following the same format as the earlier questions in the project;
  • feat_dict, a dictionary containing the known attributes for each user in the training data, across a range of features; note that there is no guarantee that the attribute for a given feature will be known for every training user.

Your function should return a dictionary of features (based on features), with a predicted list of values for each.

Your function should make its predictions as follows:

  • First, identify the set of besties for the given user, and for each feature of interest, determine the most-commonly attested attribute for that feature among the besties. The prediction is always a list: a single winning attribute is returned as a one-element list, and in the case of a tie, the prediction should be a sorted list of the tied attributes.
  • Second, for any features where no bestie has an attribute for that feature (meaning no prediction was possible in the first step), repeat the process using the second-besties, once again in the form of a sorted list of attributes.
  • In the case that no bestie or second-bestie has that attribute, return an empty list.

Note that all attributes will take the form of strings, with the empty string representing the fact that the user explicitly has no value for that feature (e.g., if the user did not go to university, the value for university would be ''), and the lack of an attribute for a given feature indicating that the attribute is unknown. Note further that even if the attribute for unknown_user is available in feat_dict, you should predict based on the attributes of besties and second besties.

Example calls to the function are:

>>> friendly_prediction('glenn', {'favourite author', 'university'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})

{'university': ['Monash University', 'University of Melbourne'], 'favourite author': ['AA Milne']}

>>> friendly_prediction('kim', {'university'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})

{'university': ['', 'Monash University', 'University of Melbourne']}

>>> friendly_prediction('kim', {'birthplace'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})

{'birthplace': []}