Add specification
parent f7dc96d33f
commit 354e9afea6
@@ -1,5 +1,7 @@
# social-interest-predictor

A small tool that predicts the interests of a user based on the friends of that
user in a social network.

Based on an assignment for COMP10001, a subject at The University of Melbourne.
The assignment specification can be found [here](docs/SPECIFICATION.md).
@@ -0,0 +1,195 @@
# Assignment Specification

Below is the assignment specification, in full, slightly edited for context and
appearance.

---

## Introduction

This project is all about "social networks" and the power of social
connections, both in terms of how impressively large a portion of the social
network can be accessed from a small number of seed users and their friends or
friends-of-friends, and how accurately the attributes of an individual can be
predicted from (partial) attributes of their friends/friends-of-friends. A
large part of the context for the project is in illustrating how companies
such as Cambridge Analytica are able to influence the world so impressively
from a small set of users of their products.

Throughout the project, we will refer to individuals as "nodes" in the social
network, and (mutual) friendship connections as "edges" connecting those nodes.
## Part 1 - Friends

Write a function get_friendly_dict() that calculates the degree-one friends of
each individual in a social network. The function takes one argument:

- friend_list, a list of reciprocal friendship links between individuals.

The function should return a dictionary of sets, containing the set of all
"degree-one" (immediate) friends for each individual in the social network.
Note that neither the order of the individuals in the dictionary nor the
order of the friends in each set matters.

The structure of friend_list is as follows: each element is a 2-tuple of
strings, representing a pairing of names of individuals in the social network
who are friends. Note that as friendship links are reciprocal, the 2-tuple
('kim', 'sandy') indicates that 'kim' is a friend of 'sandy', and also that
'sandy' is a friend of 'kim'.

Example function calls are:

    >>> get_friendly_dict([('kim', 'sandy'), ('alex', 'sandy'), ('kim', 'alex'), ('kim', 'glenn')])

    {'kim': {'glenn', 'sandy', 'alex'}, 'sandy': {'kim', 'alex'}, 'alex': {'sandy', 'kim'}, 'glenn': {'kim'}}

    >>> get_friendly_dict([('kim', 'sandy'), ('sandy', 'alex'), ('alex', 'glenn'), ('glenn', 'kim')])

    {'kim': {'glenn', 'sandy'}, 'sandy': {'kim', 'alex'}, 'alex': {'glenn', 'sandy'}, 'glenn': {'kim', 'alex'}}
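One possible way to build this dictionary is sketched below, assuming Python 3 and using collections.defaultdict from the standard library. Because every link is reciprocal, each pair is recorded in both directions. This is an illustrative sketch, not the official solution to the assignment.

```python
from collections import defaultdict


def get_friendly_dict(friend_list):
    """Map each individual to the set of their degree-one friends.

    Each (a, b) pair is reciprocal, so it is recorded in both directions.
    """
    friends = defaultdict(set)
    for a, b in friend_list:
        friends[a].add(b)
        friends[b].add(a)
    # Return a plain dict so callers get the usual dict repr and KeyError behaviour.
    return dict(friends)


result = get_friendly_dict([('kim', 'sandy'), ('alex', 'sandy'), ('kim', 'alex'), ('kim', 'glenn')])
print(result == {'kim': {'glenn', 'sandy', 'alex'}, 'sandy': {'kim', 'alex'},
                 'alex': {'sandy', 'kim'}, 'glenn': {'kim'}})  # True
```

Comparing with == rather than eyeballing the printed result sidesteps differences in set and dictionary ordering, which the specification says do not matter.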
## Part 2 - Social Network Besties

Write a function friend_besties() that calculates the "besties" (i.e.,
degree-one friends) of a given individual in a social network. The function
takes two arguments:

- individual, an individual in the social network, in the form of a string ID;
- bestie_dict, a dictionary of sets of friends of each individual in the
  social network.

The function should return a sorted list, made up of all "degree-one" friends
for the individual. If the individual does not have any friends in the social
network (including when they do not appear in bestie_dict at all), the
function should return an empty list.

Example function calls are:

    >>> friend_besties('kim', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})

    ['alex', 'glenn', 'sandy']

    >>> friend_besties('ali', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})

    []
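A minimal sketch, assuming Python 3: dict.get with an empty-set default handles individuals such as 'ali' who are missing from bestie_dict, and sorted() turns the set into the required sorted list. Again, this is one possible approach rather than the reference solution.

```python
def friend_besties(individual, bestie_dict):
    """Return the sorted degree-one friends of individual.

    Individuals absent from bestie_dict get an empty list instead of a KeyError.
    """
    return sorted(bestie_dict.get(individual, set()))


network = {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'},
           'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}
print(friend_besties('kim', network))  # ['alex', 'glenn', 'sandy']
print(friend_besties('ali', network))  # []
```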
## Part 3 - Social Network Second Besties

Write a function friend_second_besties() that calculates the "second-besties"
(i.e., degree-two friends) of a given individual in a social network: people
who are friends of the individual's friends, but who are neither the
individual nor one of their degree-one friends. The function takes two
arguments:

- individual, an individual in the social network, in the form of a string ID;
- bestie_dict, a dictionary of sets of friends of each individual in the
  social network.

The function should return a sorted list, made up of all "degree-two" friends
for the individual. If the individual does not have any degree-two friends in
the social network, the function should return an empty list.

Example function calls are:

    >>> friend_second_besties('glenn', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})

    ['alex', 'sandy']

    >>> friend_second_besties('kim', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})

    []
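A sketch of one way to compute second-besties, assuming Python 3 and taking "degree-two" to mean friends of friends minus the individual and their degree-one friends (the reading consistent with both examples, where 'kim' has none because all of kim's friends-of-friends are already besties or kim).

```python
def friend_second_besties(individual, bestie_dict):
    """Return the sorted degree-two friends of individual."""
    besties = bestie_dict.get(individual, set())
    second = set()
    for bestie in besties:
        # Collect everyone reachable in exactly two hops (may include repeats).
        second |= bestie_dict.get(bestie, set())
    # Remove anyone who is already a degree-one friend, and the individual.
    second -= besties
    second.discard(individual)
    return sorted(second)


network = {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'},
           'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}
print(friend_second_besties('glenn', network))  # ['alex', 'sandy']
print(friend_second_besties('kim', network))    # []
```

Working with sets until the final sorted() call keeps duplicates out automatically, since several besties may share the same friends.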
## Part 4 - Network Coverage

Write a function besties_coverage() that computes the "coverage" of a given
list of individuals within a social network: the proportion of people in the
network who are either in that list or connected to someone in it via one of
a given set of relationships, relative to the total number of people in the
social network. The function takes three arguments:

- individuals, a list of individuals, each in the form of a string ID;
- bestie_dict, a dictionary of sets of friends of each individual in the
  social network;
- relationship_list, a list of functions defining relationships in the
  social network, selected from friend_besties and friend_second_besties.

The function should return a float, corresponding to the proportion of the
total number of individuals who are either a member of individuals or
connected to a member of individuals via one of the relationships in
relationship_list.

Example calls to the function are:

    >>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [])

    0.25

    >>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_besties])

    0.5

    >>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_second_besties])

    0.75

    >>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_besties, friend_second_besties])

    1.0
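A minimal sketch of besties_coverage(), assuming Python 3 and assuming that the size of the network is the number of keys in bestie_dict (as in the examples, where there are four people). Compact versions of the Part 2 and Part 3 sketches are repeated so the block runs on its own; the relationship functions are simply called like any other Python function.

```python
def friend_besties(individual, bestie_dict):
    return sorted(bestie_dict.get(individual, set()))


def friend_second_besties(individual, bestie_dict):
    besties = bestie_dict.get(individual, set())
    second = set()
    for bestie in besties:
        second |= bestie_dict.get(bestie, set())
    return sorted(second - besties - {individual})


def besties_coverage(individuals, bestie_dict, relationship_list):
    """Fraction of the network covered by individuals plus their relations."""
    covered = set(individuals)
    for person in individuals:
        for relationship in relationship_list:
            # Each relationship (e.g. friend_besties) returns a list of names.
            covered.update(relationship(person, bestie_dict))
    # Assumption: network size is the number of keys in bestie_dict.
    return len(covered) / len(bestie_dict)


network = {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'},
           'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}
print(besties_coverage(['glenn'], network, []))                  # 0.25
print(besties_coverage(['glenn'], network, [friend_besties]))    # 0.5
print(besties_coverage(['glenn'], network, [friend_second_besties]))  # 0.75
```

Collecting names in a set means someone reachable through several relationships (or from several starting individuals) is only counted once. An empty bestie_dict would raise ZeroDivisionError here; the specification does not say what should happen in that case.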
## Part 5 - Social Network Attribute Prediction

The final question is for bonus marks, and is deliberately quite a bit harder
than the four basic questions (and the number of marks on offer is deliberately
not commensurate with the amount of effort required; bonus marks aren't meant
to be easy to get!). Only attempt this if you have completed the earlier
questions, and are up for a challenge!

The context for the bonus question is the prediction of attributes of a user
based on the attributes of their social network, and the observation that a
user's friends often have very similar interests and background to that user
(what is formally called homophily).

Write a function friendly_prediction() which takes four arguments:

- unknown_user, a string indicating the identity of the user you are to predict
  attributes for;
- features, a set of features you are to predict attributes for;
- bestie_dict, a dictionary of sets of the besties for each user in the
  dataset, following the same format as the earlier questions in the project;
- feat_dict, a dictionary containing the known attributes for each user in the
  training data, across a range of features; note that there is no guarantee
  that the attribute for a given feature will be known for every training user.

Your function should return a dictionary with one key for each feature in
features, mapping that feature to a list of predicted values.

Your function should make its predictions as follows:

- First, identify the set of besties for the given user, and for each feature
  of interest, determine the most commonly attested attribute for that feature
  among the besties. The prediction is always a sorted list: it holds the
  single most common attribute or, in the case of a tie, every attribute tied
  for the highest count.
- Second, for any features where no bestie has an attribute for that feature
  (meaning no prediction was possible in the first step), repeat the process
  using the second-besties, once again producing a sorted list of attributes.
- In the case that no bestie or second-bestie has an attribute for that
  feature, the prediction should be an empty list.

Note that all attributes will take the form of strings, with the empty string
representing the fact that the user explicitly has no value for that feature
(e.g., if the user did not go to university, the value for university would be
''), and the lack of an attribute for a given feature indicating that the
attribute is unknown. Note further that even if the attribute for unknown_user
is available in feat_dict, you should predict based only on the attributes of
besties and second-besties.

Example calls to the function are:

    >>> friendly_prediction('glenn', {'favourite author', 'university'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})

    {'university': ['Monash University', 'University of Melbourne'], 'favourite author': ['AA Milne']}

    >>> friendly_prediction('kim', {'university'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})

    {'university': ['', 'Monash University', 'University of Melbourne']}

    >>> friendly_prediction('kim', {'birthplace'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})

    {'birthplace': []}
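The two-step procedure can be sketched as follows, assuming Python 3 and the same second-besties definition used for Part 3. The helper most_common_attributes() is a hypothetical name introduced for this sketch; it uses collections.Counter to tally attributes and keeps every value tied for the top count. A missing key is treated as "unknown" and skipped, while '' is counted as a real value, as the note above requires.

```python
from collections import Counter


def friend_besties(individual, bestie_dict):
    return sorted(bestie_dict.get(individual, set()))


def friend_second_besties(individual, bestie_dict):
    besties = bestie_dict.get(individual, set())
    second = set()
    for bestie in besties:
        second |= bestie_dict.get(bestie, set())
    return sorted(second - besties - {individual})


def most_common_attributes(users, feature, feat_dict):
    """Hypothetical helper: sorted attribute(s) tied for the top count, or []."""
    # Skip users whose attribute is unknown (key absent); '' still counts.
    counts = Counter(
        feat_dict[user][feature]
        for user in users
        if feature in feat_dict.get(user, {})
    )
    if not counts:
        return []
    top = max(counts.values())
    return sorted(attr for attr, n in counts.items() if n == top)


def friendly_prediction(unknown_user, features, bestie_dict, feat_dict):
    # Neither list contains unknown_user, so their own attributes are ignored.
    besties = friend_besties(unknown_user, bestie_dict)
    second_besties = friend_second_besties(unknown_user, bestie_dict)
    predictions = {}
    for feature in features:
        # Step 1: besties. Step 2: fall back to second-besties only when no
        # bestie has any attribute (not even '') for this feature.
        prediction = most_common_attributes(besties, feature, feat_dict)
        if not prediction:
            prediction = most_common_attributes(second_besties, feature, feat_dict)
        predictions[feature] = prediction
    return predictions


network = {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'},
           'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}
feats = {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'},
         'sandy': {'favourite author': 'JRR Tolkien', 'university': 'University of Melbourne'},
         'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}}
print(friendly_prediction('kim', {'university'}, network, feats))
# {'university': ['', 'Monash University', 'University of Melbourne']}
print(friendly_prediction('kim', {'birthplace'}, network, feats))
# {'birthplace': []}
```

Because a prediction of [''] is a non-empty list, an explicit "no value" from a bestie stops the fallback to second-besties, which matches the distinction the specification draws between '' and an unknown attribute.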