
Add specification

Rory Healy 2023-01-30 21:32:22 +11:00
parent f7dc96d33f
commit 354e9afea6
Signed by: roryhealy
GPG Key ID: 610ED1098B8F435B
2 changed files with 200 additions and 3 deletions


@ -1,5 +1,7 @@
# social-interest-predictor
A small tool that predicts the interests of a user based on the friends of that user in a social network.
Based on an assignment for COMP10001, a subject at The University of Melbourne.
A small tool that predicts the interests of a user based on the friends of that
user in a social network.
Based on an assignment for COMP10001, a subject at The University of Melbourne.
The assignment specification can be found [here](docs/SPECIFICATION.md).

docs/SPECIFICATION.md Normal file

@ -0,0 +1,195 @@
# Assignment Specification
Below is the assignment specification, in full, slightly edited for context and
appearance.
---
## Introduction
This project is all about "social networks", and the power of social
connections, both in terms of how impressively large a portion of the social
network can be accessed from a small number of seed users and their friends or
friends-of-friends, and how accurately the attributes of an individual can be
predicted from (partial) attributes of their friends/friends-of-friends. A
large part of the context for the project is in illustrating how it is that
companies such as Cambridge Analytica are able to influence the world so
impressively, from a small set of users of their products.
Throughout the project, we will refer to individuals as "nodes" in the social
network, and (mutual) friendship connections as "edges" connecting those nodes.
## Part 1 - Friends
Write a function get_friendly_dict() that calculates the degree-one friends of
each individual in a social network. The function takes one argument:
- friend_list, a list of reciprocal friendship links between individuals.
The function should return a dictionary of sets, containing the set of all
"degree-one" (immediate) friends for each individual in the social network.
Note that the order of the individuals in the dictionary and the order of the
friends in each set do not matter.
The structure of friend_list is as follows: each element is a 2-tuple of
strings, representing a pairing of names of individuals in the social network
who are friends. Note that as friendship links are reciprocal, the 2-tuple
('kim', 'sandy') indicates that 'kim' is a friend of 'sandy', and also that
'sandy' is a friend of 'kim'.
Example function calls are:
>>> get_friendly_dict([('kim', 'sandy'), ('alex', 'sandy'), ('kim', 'alex'), ('kim', 'glenn')])
{'kim': {'glenn', 'sandy', 'alex'}, 'sandy': {'kim', 'alex'}, 'alex': {'sandy', 'kim'}, 'glenn': {'kim'}}
>>> get_friendly_dict([('kim', 'sandy'), ('sandy', 'alex'), ('alex', 'glenn'), ('glenn', 'kim')])
{'kim': {'glenn', 'sandy'}, 'sandy': {'kim', 'alex'}, 'alex': {'glenn', 'sandy'}, 'glenn': {'kim', 'alex'}}
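One possible approach to Part 1 is sketched below. It is a minimal sketch rather than a reference solution, and the use of `defaultdict` is an implementation choice that the specification does not require.

```python
from collections import defaultdict


def get_friendly_dict(friend_list):
    """Map each individual to the set of their degree-one friends."""
    friends = defaultdict(set)
    for first, second in friend_list:
        # Friendship links are reciprocal, so record both directions.
        friends[first].add(second)
        friends[second].add(first)
    # Convert to a plain dict so the result matches the specified return type.
    return dict(friends)
```

Because sets and dictionaries compare by content, the result can be checked against the expected output with `==` regardless of ordering.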
## Part 2 - Social Network Besties
Write a function friend_besties() that calculates the "besties" (i.e.,
degree-one friends) of a given individual in a social network. The function
takes two arguments:
- individual, an individual in the social network, in the form of a string ID;
- bestie_dict, a dictionary of sets of friends of each individual in the
social network.
The function should return a sorted list, made up of all "degree-one" friends
for the individual. In the instance that the individual does not have any
friends in the social network, the function should return an empty list.
Example function calls are:
>>> friend_besties('kim', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})
['alex', 'glenn', 'sandy']
>>> friend_besties('ali', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})
[]
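A minimal sketch of Part 2 follows. It assumes that an individual who is missing from bestie_dict should be treated as having no friends, which matches the 'ali' example above.

```python
def friend_besties(individual, bestie_dict):
    """Return a sorted list of the individual's degree-one friends."""
    # .get() falls back to an empty set for individuals not in the network.
    return sorted(bestie_dict.get(individual, set()))
```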
## Part 3 - Social Network Second Besties
Write a function friend_second_besties() that calculates the "second-besties"
(i.e., degree-two friends) of a given individual in a social network. The
function takes two arguments:
- individual, an individual in the social network, in the form of a string ID;
- bestie_dict, a dictionary of sets of friends of each individual in the
social network.
The function should return a sorted list, made up of all "degree-two" friends
for the individual. In the instance that the individual does not have any
degree-two friends in the social network, the function should return an
empty list.
Example function calls are:
>>> friend_second_besties('glenn', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})
['alex', 'sandy']
>>> friend_second_besties('kim', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})
[]
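A minimal sketch of Part 3 follows. It assumes, based on the examples above rather than on an explicit statement in the specification, that degree-two friends are friends of friends who are neither the individual nor one of their degree-one friends.

```python
def friend_second_besties(individual, bestie_dict):
    """Return a sorted list of the individual's degree-two friends."""
    besties = bestie_dict.get(individual, set())
    friends_of_friends = set()
    for bestie in besties:
        friends_of_friends |= bestie_dict.get(bestie, set())
    # Assumption: exclude the individual and their degree-one friends.
    return sorted(friends_of_friends - besties - {individual})
```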
## Part 4 - Network Coverage
Write a function besties_coverage() that computes the "coverage" of nodes
within a social network that are connected via predefined relationships to a
given list of individuals (i.e., the proportion of connected individuals, to the
total size of the network, which is the total number of people in the social
network). The function takes three arguments:
- individuals, a list of individuals, each in the form of a string ID;
- bestie_dict, a dictionary of sets of friends of each individual in the
social network;
- relationship_list, a list of functions defining relationships in the
social network, selected from friend_besties and friend_second_besties.
The function should return a float, corresponding to the proportion of the
total number of individuals who are either a member of individuals or connected
via one of the relationships in relationship_list.
Example calls to the function are:
>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [])
0.25
>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_besties])
0.5
>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_second_besties])
0.75
>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_besties, friend_second_besties])
1.0
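A minimal sketch of Part 4 follows, with small versions of friend_besties and friend_second_besties included so the block runs on its own. Several details are assumptions that the specification leaves open: the network is taken to be every name appearing in bestie_dict (as a key or in a friend set), an empty network yields 0.0, and names outside the network are not counted.

```python
def friend_besties(individual, bestie_dict):
    return sorted(bestie_dict.get(individual, set()))


def friend_second_besties(individual, bestie_dict):
    besties = bestie_dict.get(individual, set())
    second = set()
    for bestie in besties:
        second |= bestie_dict.get(bestie, set())
    return sorted(second - besties - {individual})


def besties_coverage(individuals, bestie_dict, relationship_list):
    """Return the proportion of the network reached from individuals."""
    # Assumption: the network is everyone mentioned anywhere in bestie_dict.
    network = set(bestie_dict)
    for friends in bestie_dict.values():
        network |= friends
    if not network:
        return 0.0  # Assumption: an empty network has zero coverage.

    covered = set(individuals)
    for individual in individuals:
        for relationship in relationship_list:
            # Each relationship is a function such as friend_besties.
            covered.update(relationship(individual, bestie_dict))
    # Assumption: ignore any names that are not part of the network.
    return len(covered & network) / len(network)
```

Passing the relationship functions themselves, rather than their names, lets the same loop handle any combination of relationships.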
## Part 5 - Social Network Attribute Prediction
The final question is for bonus marks, and is deliberately quite a bit harder
than the four basic questions (and the number of marks on offer is deliberately
not commensurate with the amount of effort required; bonus marks aren't meant
to be easy to get!). Only attempt this if you have completed the earlier
questions, and are up for a challenge!
The context for the bonus question is the prediction of attributes of a user
based on the attributes of their social network, and the observation that a
user's friends often have very similar interests and background to that user
(what is formally called homophily).
Write a function friendly_prediction() which takes four arguments:
- unknown_user, a string indicating the identity of the user you are to predict
attributes for;
- features, a set of features you are to predict attributes for;
- bestie_dict, a dictionary of sets of the besties for each user in the
dataset, following the same format as the earlier questions in the project;
- feat_dict, a dictionary containing the known attributes for each user in the
training data, across a range of features; note that there is no guarantee
that the attribute for a given feature will be known for every training user.
Your function should return a dictionary of features (based on features), with
a predicted list of values for each.
Your function should make its predictions as follows:
- First, identify the set of besties for the given user, and for each feature
of interest, determine the most-commonly attested attribute for that feature
among the besties. In the case of a tie, the prediction should be a sorted
list of attributes.
- Second, for any features where no bestie has an attribute for that feature
(meaning no prediction was possible in the first step), repeat the process
using the second-besties, once again in the form of a sorted list
of attributes.
- In the case that no bestie or second-bestie has that attribute, return an
empty list.
Note that all attributes will take the form of strings, with the empty string
representing the fact that the user explicitly has no value for that feature
(e.g., if the user did not go to university, the value for university would be
''), and the lack of an attribute for a given feature indicating that the
attribute is unknown. Note further that even if the attribute for unknown_user
is available in feat_dict, you should predict based on the attributes of
besties and second besties.
Example calls to the function are:
>>> friendly_prediction('glenn', {'favourite author', 'university'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})
{'university': ['Monash University', 'University of Melbourne'], 'favourite author': ['AA Milne']}
>>> friendly_prediction('kim', {'university'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})
{'university': ['', 'Monash University', 'University of Melbourne']}
>>> friendly_prediction('kim', {'birthplace'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})
{'birthplace': []}
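The prediction steps above can be sketched as follows. This is one possible reading, not a reference solution: the helper names `_neighbours` and `_most_common` are hypothetical, and second-besties are assumed to exclude the user and their besties, as in the Part 3 examples.

```python
from collections import Counter


def _neighbours(user, bestie_dict):
    """Hypothetical helper: return (besties, second-besties) as sets."""
    besties = bestie_dict.get(user, set())
    second = set()
    for bestie in besties:
        second |= bestie_dict.get(bestie, set())
    return besties, second - besties - {user}


def _most_common(group, feature, feat_dict):
    """Hypothetical helper: sorted list of the most frequent attributes."""
    # Only count people whose attribute for this feature is known; the
    # empty string is a known value and is counted like any other.
    counts = Counter(
        feat_dict[person][feature]
        for person in group
        if feature in feat_dict.get(person, {})
    )
    if not counts:
        return []
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


def friendly_prediction(unknown_user, features, bestie_dict, feat_dict):
    besties, second_besties = _neighbours(unknown_user, bestie_dict)
    predictions = {}
    for feature in features:
        guess = _most_common(besties, feature, feat_dict)
        if not guess:
            # No bestie has this feature, so fall back to second-besties.
            guess = _most_common(second_besties, feature, feat_dict)
        predictions[feature] = guess
    return predictions
```

Note that the attributes of unknown_user are never read, so a known value for that user in feat_dict does not influence the prediction.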