
# Assignment Specification

Below is the assignment specification, in full, slightly edited for context and
appearance.

## Introduction

This project is all about "social networks" and the power of social
connections: both how large a portion of the social network can be reached
from a small number of seed users and their friends or friends-of-friends, and
how accurately the attributes of an individual can be predicted from the
(partial) attributes of their friends and friends-of-friends. A large part of
the context for the project is illustrating how companies such as Cambridge
Analytica are able to influence the world so impressively from a small set of
users of their products.

Throughout the project, we will refer to individuals as "nodes" in the social
network, and (mutual) friendship connections as "edges" connecting those nodes.

## Part 1 - Friends

Write a function `get_friendly_dict()` that calculates the degree-one friends of
each individual in a social network. The function takes one argument:

- `friend_list`, a list of reciprocal friendship links between individuals.

The function should return a dictionary of sets, containing the set of all
"degree-one" (immediate) friends for each individual in the social network.
Note that neither the order of the individuals in the dictionary nor the order
of the friends in each set matters.

The structure of `friend_list` is as follows: each element is a 2-tuple of
strings, representing a pairing of names of individuals in the social network
who are friends. Note that as friendship links are reciprocal, the 2-tuple
`('kim', 'sandy')` indicates that `'kim'` is a friend of `'sandy'`, and also that
`'sandy'` is a friend of `'kim'`.

Example function calls are:

    >>> get_friendly_dict([('kim', 'sandy'), ('alex', 'sandy'), ('kim', 'alex'), ('kim', 'glenn')])

    {'kim': {'glenn', 'sandy', 'alex'}, 'sandy': {'kim', 'alex'}, 'alex': {'sandy', 'kim'}, 'glenn': {'kim'}}

    >>> get_friendly_dict([('kim', 'sandy'), ('sandy', 'alex'), ('alex', 'glenn'), ('glenn', 'kim')])

    {'kim': {'glenn', 'sandy'}, 'sandy': {'kim', 'alex'}, 'alex': {'glenn', 'sandy'}, 'glenn': {'kim', 'alex'}}
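
One possible way to build this dictionary is sketched below. It is a minimal
illustration rather than a reference solution, and it assumes that every
element of `friend_list` is a well-formed 2-tuple of strings. The use of
`collections.defaultdict` is a convenience chosen for this sketch, not a
requirement of the specification.

```python
from collections import defaultdict


def get_friendly_dict(friend_list):
    """Map each individual to the set of their degree-one friends."""
    friends = defaultdict(set)
    for person_a, person_b in friend_list:
        # Friendship links are reciprocal, so record each link in both directions.
        friends[person_a].add(person_b)
        friends[person_b].add(person_a)
    # Convert to a plain dict so the result compares and prints like the examples.
    return dict(friends)
```

Because sets and dictionaries compare by content, the result equals the
expected output regardless of the order in which the links are listed.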

## Part 2 - Social Network Besties

Write a function `friend_besties()` that calculates the "besties" (i.e.,
degree-one friends) of a given individual in a social network. The function
takes two arguments:

- `individual`, an individual in the social network, in the form of a string ID;
- `bestie_dict`, a dictionary of sets of friends of each individual in the
  social network.

The function should return a sorted list, made up of all "degree-one" friends
for the individual. In the instance that the individual does not have any
friends in the social network, the function should return an empty list.

Example function calls are:

```python
>>> friend_besties('kim', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})

['alex', 'glenn', 'sandy']

>>> friend_besties('ali', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})

[]
```
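
A minimal sketch of this lookup follows. It assumes that an individual who is
missing from `bestie_dict` should be treated as having no friends, which is
consistent with the `'ali'` example above.

```python
def friend_besties(individual, bestie_dict):
    """Return the sorted list of degree-one friends of `individual`."""
    # .get() with an empty-set default covers individuals absent from the network.
    return sorted(bestie_dict.get(individual, set()))
```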

## Part 3 - Social Network Second Besties

Write a function `friend_second_besties()` that calculates the "second-besties"
(i.e., degree-two friends) of a given individual in a social network. The
function takes two arguments:

- `individual`, an individual in the social network, in the form of a string ID;
- `bestie_dict`, a dictionary of sets of friends of each individual in the
  social network.

The function should return a sorted list, made up of all "degree-two" friends
for the individual. In the instance that the individual does not have any
degree-two friends in the social network, the function should return an
empty list.

Example function calls are:

```python
>>> friend_second_besties('glenn', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})

['alex', 'sandy']

>>> friend_second_besties('kim', {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}})

[]
```
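
The sketch below shows one way to compute degree-two friends. It assumes, based
on the `'kim'` example above, that a degree-two friend is a friend of a friend
who is neither the individual themselves nor one of their degree-one friends.
This reading is an inference from the examples, not an explicit statement in
the specification.

```python
def friend_second_besties(individual, bestie_dict):
    """Return the sorted list of degree-two friends of `individual`."""
    besties = bestie_dict.get(individual, set())
    second_besties = set()
    for bestie in besties:
        # Collect everyone who is a friend of one of the individual's friends.
        second_besties |= bestie_dict.get(bestie, set())
    # Assumption: exclude degree-one friends and the individual themselves.
    second_besties -= besties
    second_besties.discard(individual)
    return sorted(second_besties)
```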

## Part 4 - Network Coverage

Write a function `besties_coverage()` that computes the "coverage" of nodes
within a social network that are connected via predefined relationships to a
given list of individuals (i.e., the ratio of the number of connected
individuals to the total size of the network, which is the total number of
people in the social network). The function takes three arguments:

- `individuals`, a list of individuals, each in the form of a string ID;
- `bestie_dict`, a dictionary of sets of friends of each individual in the
  social network;
- `relationship_list`, a list of functions defining relationships in the
  social network, selected from `friend_besties` and `friend_second_besties`.

The function should return a float, corresponding to the proportion of the
total number of individuals who are either a member of `individuals` or
connected to a member via one of the relationships in `relationship_list`.

Example calls to the function are:

```python
>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [])

0.25

>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_besties])

0.5

>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_second_besties])

0.75

>>> besties_coverage(['glenn'], {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, [friend_besties, friend_second_besties])

1.0
```
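
The coverage calculation can be sketched as follows. The two relationship
functions are repeated in compact form so that the block runs on its own. The
sketch makes three assumptions that the specification does not state: every
person in the network appears as a key of `bestie_dict`, people outside the
network are not counted, and an empty network has a coverage of `0.0`.

```python
def friend_besties(individual, bestie_dict):
    """Sorted degree-one friends (compact version for this sketch)."""
    return sorted(bestie_dict.get(individual, set()))


def friend_second_besties(individual, bestie_dict):
    """Sorted degree-two friends (compact version for this sketch)."""
    besties = bestie_dict.get(individual, set())
    second_besties = set()
    for bestie in besties:
        second_besties |= bestie_dict.get(bestie, set())
    return sorted(second_besties - besties - {individual})


def besties_coverage(individuals, bestie_dict, relationship_list):
    """Return the proportion of the network covered by `individuals`."""
    # Assumption: the keys of bestie_dict are exactly the people in the network.
    network = set(bestie_dict)
    if not network:
        # Assumption: the specification does not define this case.
        return 0.0
    # The seed individuals themselves always count as covered.
    covered = set(individuals)
    for person in individuals:
        for relationship in relationship_list:
            covered.update(relationship(person, bestie_dict))
    # Assumption: ignore anyone who is not part of the network.
    return len(covered & network) / len(network)
```

Collecting the covered people in a set means that someone reached through
several seed individuals or relationships is only counted once.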

## Part 5 - Social Network Attribute Prediction

The final question is for bonus marks, and is deliberately quite a bit harder
than the four basic questions (and the number of marks on offer is deliberately
not commensurate with the amount of effort required; bonus marks aren't meant
to be easy to get!). Only attempt this if you have completed the earlier
questions, and are up for a challenge!

The context for the bonus question is the prediction of attributes of a user
based on the attributes of their social network, and the observation that a
user's friends often have very similar interests and background to that user
(what is formally called homophily).

Write a function `friendly_prediction()` which takes four arguments:

- `unknown_user`, a string indicating the identity of the user you are to
  predict attributes for;
- `features`, a set of features you are to predict attributes for;
- `bestie_dict`, a dictionary of sets of the besties for each user in the
  dataset, following the same format as the earlier questions in the project;
- `feat_dict`, a dictionary containing the known attributes for each user in the
  training data, across a range of features; note that there is no guarantee
  that the attribute for a given feature will be known for every training user.

Your function should return a dictionary of features (based on `features`),
with a predicted list of values for each.

Your function should make its predictions as follows:

- First, identify the set of besties for the given user, and for each feature
  of interest, determine the most-commonly attested attribute for that feature
  among the besties. In the case of a tie, the prediction should be a sorted
  list of attributes.
- Second, for any features where no bestie has an attribute for that feature
  (meaning no prediction was possible in the first step), repeat the process
  using the second-besties, once again in the form of a sorted list
  of attributes.
- In the case that no bestie or second-bestie has an attribute for that
  feature, return an empty list.

Note that all attributes will take the form of strings, with the empty string
representing the fact that the user explicitly has no value for that feature
(e.g., if the user did not go to university, the value for university would be
`''`), and the lack of an attribute for a given feature indicating that the
attribute is unknown. Note further that even if the attribute for `unknown_user`
is available in `feat_dict`, you should predict based on the attributes of
besties and second besties.

Example calls to the function are:

```python
>>> friendly_prediction('glenn', {'favourite author', 'university'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})

{'university': ['Monash University', 'University of Melbourne'], 'favourite author': ['AA Milne']}

>>> friendly_prediction('kim', {'university'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})

{'university': ['', 'Monash University', 'University of Melbourne']}

>>> friendly_prediction('kim', {'birthplace'}, {'kim': {'sandy', 'alex', 'glenn'}, 'sandy': {'kim', 'alex'}, 'alex': {'kim', 'sandy'}, 'glenn': {'kim'}}, {'glenn': {'university': ''}, 'kim': {'favourite author': 'AA Milne'}, 'sandy': {'favourite author': 'JRR Tolkien', "university": "University of Melbourne"}, 'alex': {'favourite author': 'AA Milne', 'university': 'Monash University'}})

{'birthplace': []}
```
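
The prediction steps above can be sketched as follows. This is a hedged
illustration rather than a reference solution. The helper `_most_common()` is a
hypothetical name introduced for this sketch, and second-besties are assumed to
exclude the user and their besties, as in Part 3. As in the second example
above, the empty string is treated as a known attribute value like any other.

```python
from collections import Counter


def _most_common(users, feature, feat_dict):
    """Sorted list of the most frequent known values of `feature` among `users`.

    Hypothetical helper for this sketch; returns [] if no user has a value.
    """
    counts = Counter(
        feat_dict[user][feature]
        for user in users
        # A missing user or missing feature means the attribute is unknown.
        if feature in feat_dict.get(user, {})
    )
    if not counts:
        return []
    top = max(counts.values())
    # Ties are returned together, in sorted order.
    return sorted(value for value, count in counts.items() if count == top)


def friendly_prediction(unknown_user, features, bestie_dict, feat_dict):
    """Predict attribute values for `unknown_user` from their social network."""
    besties = bestie_dict.get(unknown_user, set())
    second_besties = set()
    for bestie in besties:
        second_besties |= bestie_dict.get(bestie, set())
    # Assumption: second-besties exclude besties and the user themselves.
    second_besties -= besties
    second_besties.discard(unknown_user)

    predictions = {}
    for feature in features:
        # Step 1: predict from besties.
        prediction = _most_common(besties, feature, feat_dict)
        if not prediction:
            # Step 2: fall back to second-besties ([] if they know nothing either).
            prediction = _most_common(second_besties, feature, feat_dict)
        predictions[feature] = prediction
    return predictions
```

Note that the attributes of `unknown_user` in `feat_dict` are never consulted,
since the user is in neither their own bestie set nor their second-bestie set.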