comp10001-project01/project01

214 lines
10 KiB
Text

Project 1 is all about "social networks", and the power of social connections,
both in terms of how impressively large a portion of the social network can be
accessed from a small number of seed users and their friends or
friends-of-friends, and how accurately the attributes of an individual can be
predicted from (partial) attributes of their friends/friends-of-friends. A
large part of the context for the project is in illustrating how it is that
companies such as Cambridge Analytica are able to influence the world so
impressively, from a small set of users of their products.
Throughout the project, we will refer to individuals as "nodes" in the social
network, and (mutual) friendship connections as "edges" connecting those nodes.
See the lecture slides for more details.
--------------------------------------------------------------------------------
Part 1 - Friends
Write a function get_friendly_dict() that calculates the degree-one friends of
each individual in a social network. The function takes one argument:
- friend_list, a list of reciproal friendship links between individuals.
The function should return a dictionary of sets, containing the set of all
"degree-one" (= immediate) friends for each individual in the social network.
Note that the specific order of the individuals in the dictionary, and also the
ordering of the friends in each set does not matter.
The structure of friend_list is as follows: each element is a 2-tuple of
strings, representing a pairing of names of individuals in the social network
who are friends. Note that as friendship links are reciprocal, the 2-tuple
('kim', 'sandy'), e.g., indicates that 'kim' is a friend of 'sandy', and also
that 'sandy' is a friend of 'kim'.
Example function calls are:
>>> get_friendly_dict([('kim', 'sandy'), ('alex', 'sandy'),
('kim', 'alex'), ('kim', 'glenn')])
{'kim': {'glenn', 'sandy', 'alex'}, 'sandy': {'kim', 'alex'},
'alex': {'sandy', 'kim'}, 'glenn': {'kim'}}
>>> get_friendly_dict([('kim', 'sandy'), ('sandy', 'alex'),
('alex', 'glenn'), ('glenn', 'kim')])
{'kim': {'glenn', 'sandy'}, 'sandy': {'kim', 'alex'},
'alex': {'glenn', 'sandy'}, 'glenn': {'kim', 'alex'}}
--------------------------------------------------------------------------------
Part 2 - Social Network Besties
Write a function friend_besties() that calculates the "besties" (i.e.
degree-one friends) of a given individual in a social network. The function
takes two arguments:
- individual, an individual in the social network, in the form of a string ID
- bestie_dict, a dictionary of sets of friends of each individual in the
social network (as per the first question of the Project)
The function should return a sorted list, made up of all "degree-one" friends
for the individual. In the instance that the individual does not have any
friends in the social network, the function should return an empty list.
Example function calls are:
>>> friend_besties('kim', {'kim': {'sandy', 'alex', 'glenn'},
'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})
['alex', 'glenn', 'sandy']
>>> friend_besties('ali', {'kim': {'sandy', 'alex', 'glenn'},
'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})
[]
--------------------------------------------------------------------------------
Part 3 - Social Network Second Besties
Write a function friend_second_besties() that calculates the "second-besties"
(i.e. degree-two friends) of a given individual in a social network. The
function takes two arguments:
- individual, an individual in the social network, in the form of a string ID
- bestie_dict, a dictionary of sets of friends of each individual in the
social network (as per the first question of the Project)
The function should return a sorted list, made up of all "degree-two" friends
for the individual. In the instance that the individual does not have any
degree-two friends in the social network, the function should return an
empty list.
Example function calls are:
>>> friend_second_besties('glenn', {'kim': {'sandy', 'alex', 'glenn'},
'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})
['alex', 'sandy']
>>> friend_second_besties('kim', {'kim': {'sandy', 'alex', 'glenn'},
'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})
[]
--------------------------------------------------------------------------------
Part 4 - Network Coverage
Write a function besties_coverage() that computes the "coverage" of nodes
within a social network that are connected via predefined relationships to a
given list of individuals, i.e. the proportion of connected individuals, to the
total size of the network (= the number of people in the social network). The
function takes three arguments:
- individuals, a list of individuals, each in the form of a string ID
- bestie_dict, a dictionary of sets of friends of each individual in the
social network (as per the first question of the Project)
- relationship_list, a list of functions defining relationships in the
social network, selected from friend_besties and friend_second_besties.
The function should return a float, corresponding to the proportion of the
total number of individuals who are either a member of individuals or connected
via one of the relationships in relationship_list.
Example calls to the function are:
>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'},
'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}},
[])
0.25
>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'},
'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}},
[friend_besties])
0.5
>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'},
'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}},
[friend_second_besties])
0.75
>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'},
'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}},
[friend_besties, friend_second_besties])
1.0
--------------------------------------------------------------------------------
Part 5 - Social Network Attribute Prediction
The final question is for bonus marks, and is deliberately quite a bit harder
than the four basic questions (and the number of marks on offer is deliberately
not commensurate with the amount of effort required — bonus marks aren't meant
to be easy to get!). Only attempt this is you have completed the earlier
questions, and are up for a challenge!
The context for the bonus question is the prediction of attributes of a user
based on the attributes of their social network, and the observation that a
user's friends often have very similar interests and background to that user
(what is formally called homophily).
Write a function friendly_prediction() which takes four arguments:
- unknown_user, a string indicating the identity of the user you are to predict
attributes for
- features, a set of features you are to predict attributes for
- bestie_dict, a dictionary of sets of the besties for each user in the
dataset, following the same format as the earlier questions in the project
- feat_dict, a dictionary containing the known attributes for each user in the
training data, across a range of features; note that there is no guarantee
that the attribute for a given feature will be known for every training user
Your function should return a dictionary of features (based on features), with
a predicted list of values for each.
Your function should make its predictions as follows:
- first, identify the set of besties for the given user, and for each feature
of interest, determine the most-commonly attested attribute for that feature
among the besties; in the case of a tie, the prediction should be a sorted
list of attributes
- second, for any features where no bestie has an attribute for that feature
(meaning no prediction was possible in the first step), repeat the process
using the second-besties, once again in the form of a sorted list
of attributes
- in the case that no bestie or second-bestie has that attribute, return an
empty list.
Note that all attributes will take the form of strings, with the empty string
representing the fact that the user explicitly has no value for that feature
(e.g. if the user did not go to university, the value for university would be
''), and the lack of an attribute for a given feature indicating that the
attribute is unknown. Note further that even if the attribute for unknown_user
is available in feat_dict, you should predict based on the attributes of
besties and second besties.
Example calls to the function are:
>>> friendly_prediction('glenn', {'favourite author', 'university'},
{'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'},
'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn':
{'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy':
{'favourite author': 'JRR Tolkien',
"university": "University of Melbourne"}, 'alex': {'favourite author':
'AA Milne', 'university': 'Monash University'}})
{'university': ['Monash University', 'University of Melbourne'],
'favourite author': ['AA Milne']}
>>> friendly_prediction('kim', {'university'}, {'kim':
{'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex':
{'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''},
'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author':
'JRR Tolkien', "university": "University of Melbourne"}, 'alex':
{'favourite author': 'AA Milne', 'university': 'Monash University'}})
{'university': ['', 'Monash University', 'University of Melbourne']}
>>> friendly_prediction('kim', {'birthplace'}, {'kim':
{'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex':
{'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''},
'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author':
'JRR Tolkien', "university": "University of Melbourne"}, 'alex':
{'favourite author': 'AA Milne', 'university': 'Monash University'}})
{'birthplace': []}