import pprint
import requests
= "https://resultsdata.elections.qld.gov.au/SGE2024-electorates.json"
electorates_url = requests.get(electorates_url)
resp
= resp.json()["electorates"] electorates
I have recently written a dashboard to explore the preference flows of votes during the 2024 QLD state elections. The dashboard is available here.
The dashboard has three tabs: 1. Overall preference flows, 2. Preference flows in each electorate, 3. A what-if scenario to test the impact of different levels of preference flows from the Greens to Labor.
In this noitebook, I will discuss how I generated the overall preference flow Sankey diagram and a corresponding table.
Getting the data
The election data is available from the QLD Electoral commission.
The first step is to get data about all the electorates. This is available in the form of a JSON file from the QEC website.
0], depth=2) pprint.pp(electorates[
{'electorateId': 893,
'parentElectorateId': None,
'electorateName': 'Algester',
'stub': 'algester',
'contestType': 'State',
'electorateType': 'State',
'votingMethod': 'Attendance Ballot',
'votingSystem': 'Compulsory Preferential Voting',
'votingSystemMayor': None,
'votingSystemCouncillor': None,
'indicativeCount': True,
'candidatesCount': 6,
'candidates': [{...}, {...}, {...}, {...}, {...}, {...}],
'enrolment': 40236}
There’s lots of information about each electorate inlcuding the candidates and total enrolment. But we’re mostly interested in the electorate’s name and the code (stub) used by the QEC API.
= [x["stub"] for x in electorates] electorate_names
Preference count for each electorate
The first preference votes, preference flows and the final result are all available with a single API call.
https://resultsdata.elections.qld.gov.au/SGE2024-preference-count-district-{electorate_stub}.json
Let’s see what it looks like for the Aspley electorate.
from string import Template
= Template(
prefs_url "https://resultsdata.elections.qld.gov.au/SGE2024-preference-count-district-$electorate.json"
)= "aspley"
electorate = requests.get(prefs_url.substitute(electorate=electorate))
res = res.json()
result
=2) pprint.pp(result, depth
{'contest': 'state',
'electorateName': 'Aspley',
'countround': 4,
'lastUpdated': '2024-11-13T14:43:05.097649+10:00',
'totalFormalVotes': 35747,
'totalVotes': 35747,
'candidates': [{...}, {...}],
'totalPrimary': 35747,
'preferenceDistributionDetails': {'distributions': [...], 'primary': [...]},
'totalExhausted': 0,
'totalExhaustedPercentage': '0.00%',
'votesRemainingInCount': 35747,
'votesRemainingInCountPercentage': '100.00%'}
This json file has a lot of information. So let’s break this down into different chunks.
Primary votes
The primary votes obtained by each candidate are provided as preferenceDistributionDetails > primary
. This is a list containing information for each candidate.
"preferenceDistributionDetails"]["primary"] result[
[{'ballotName': 'HALL, Allan',
'ballotOrderNumber': 1,
'party': "Pauline Hanson's One Nation Queensland Division",
'partyCode': 'One Nation',
'primary': 1539},
{'ballotName': 'CAPELL, Wayne',
'ballotOrderNumber': 2,
'party': 'Family First Queensland',
'partyCode': 'Family First',
'primary': 707},
{'ballotName': 'MELLISH, Bart',
'ballotOrderNumber': 3,
'party': 'Australian Labor Party (State of Queensland)',
'partyCode': 'Australian Labor Party',
'primary': 13988},
{'ballotName': 'COOPER, Amanda',
'ballotOrderNumber': 4,
'party': 'Liberal National Party of Queensland',
'partyCode': 'LNP',
'primary': 15696},
{'ballotName': 'HAWKINS, Fiona',
'ballotOrderNumber': 5,
'party': 'Queensland Greens',
'partyCode': 'The Greens',
'primary': 3817}]
Final tally
The final tally is available as the candidates
key in the JSON result.
"candidates"] result[
[{'ballotName': 'MELLISH, Bart',
'ballotOrderNumber': 3,
'party': 'Australian Labor Party (State of Queensland)',
'partyCode': 'Australian Labor Party',
'count': 17889,
'percentage': '50.04%'},
{'ballotName': 'COOPER, Amanda',
'ballotOrderNumber': 4,
'party': 'Liberal National Party of Queensland',
'partyCode': 'LNP',
'count': 17858,
'percentage': '49.96%'}]
Preference distributions
This contains information about each preference distribution: the excluded candidate, preferences distributed to each remaining candidate, and the new tally of the remaining candidates.
"preferenceDistributionDetails"]["distributions"][0], depth=3) pprint.pp(result[
{'exclusion': 1,
'excludedCandidate': 'CAPELL, Wayne',
'excludedCandidateBallotOrder': 2,
'excludedCandidateParty': 'Family First Queensland',
'excludedCandidatePartyCode': 'Family First',
'excludedCandidateVotes': 707,
'exhausted': 0,
'exhaustedPercentage': '0.00%',
'totalExhausted': 0,
'votesDistributed': 707,
'votesRemainingInCount': 35747,
'preferences': [{'ballotName': 'HALL, Allan',
'ballotOrderNumber': 1,
'party': "Pauline Hanson's One Nation Queensland Division",
'partyCode': 'One Nation',
'preferences': 302,
'preferencesPercentage': '42.72%',
'runningTotal': 1841},
{'ballotName': 'MELLISH, Bart',
'ballotOrderNumber': 3,
'party': 'Australian Labor Party (State of Queensland)',
'partyCode': 'Australian Labor Party',
'preferences': 113,
'preferencesPercentage': '15.98%',
'runningTotal': 14101},
{'ballotName': 'COOPER, Amanda',
'ballotOrderNumber': 4,
'party': 'Liberal National Party of Queensland',
'partyCode': 'LNP',
'preferences': 179,
'preferencesPercentage': '25.32%',
'runningTotal': 15875},
{'ballotName': 'HAWKINS, Fiona',
'ballotOrderNumber': 5,
'party': 'Queensland Greens',
'partyCode': 'The Greens',
'preferences': 113,
'preferencesPercentage': '15.98%',
'runningTotal': 3930}]}
Downloading all data
Next, we want to dowload all the data and save that somewhere to avoid repeating the API calls.
Before doing that, lets also assign colours to the parties. These are based on the colours commonly associated with these parties. https://peo.gov.au/understand-our-parliament/your-questions-on-notice/questions/what-are-the-colours-of-the-australian-political-parties
I’ve assigned unique colours to other parties whose colours I’m not aware of. And we also need colours for independents.
= {
colours "ALP": "#FF0000",
"Family First": "#1CE6EF",
"KAP": "#8B0000",
"LCQP": "#788D66",
"LNP": "#0000FF",
"One Nation": "#FFA500",
"The Greens": "#00FF00",
"Animal Justice Party": "#885578",
"Libertarians": "#FF34FF",
}
# For independent candidates
= [
other_colours "#000000",
"#FFFF00",
"#FAD09F",
"#FF8A9A",
"#D157A0",
"#FF4A46",
]
Based on the structure of the results we saw above and also the requirements for this project, we will create three dataframes focusing on the first preferences, final result, and the distributions.
We’ll use pandas
, particularly the json_normalize
and the DataFrame.from_records
- based on the JSON structure provided by the API.
import pandas as pd
Let’s specify what keys we are interested in.
= [
exclusion_cols "exclusion",
"excludedCandidate",
"excludedCandidatePartyCode",
"excludedCandidateBallotOrder",
"votesDistributed",
]= [
pref_cols "ballotName",
"preferences",
"preferencePercentage",
"runningTotal",
"partyCode",
"ballotOrderNumber",
]
We’ll define a couple of methods to parse the data from the JSON.
The first method we’ll define is to parse the votes received by candidates in either the primary count or the final count. We have already seen the strucutre of this data above. One key difference between the primary and final counts is the keys used to denote the votes: primary and count respectively. We’ll get around this by rename the primary column to count (if it exists) in the dataframe.
def parse_candidate_votes(data):
"""Parse the votes received by each candidate in the `data`."""
= pd.DataFrame.from_records(data, exclude=["party"]).rename(
counts ={
columns"ballotName": "candidate",
"ballotOrderNumber": "ballotOrder",
"partyCode": "party",
"primary": "count", # rename 'primary' to 'count' if it exists
}
)# Now we have a 'count' column irrespective of the data
# assign colours to the candidates based on their party
"colour"] = counts["party"].map(colours)
counts[
# independents without a party colour
= counts["colour"].isna()
no_colour "colour"]] = other_colours[: no_colour.sum()]
counts.loc[no_colour, [return counts
The next method is to parse the preference distributions in each distribution round. This will take a list of distributions and return a dataframe.
def parse_distribution(data):
= pd.json_normalize(data, "preferences", exclusion_cols)
distribution
# remove redundant columns
= distribution.loc[
distribution + pref_cols)
:, distribution.columns.isin(exclusion_cols
]
# rename columns
= distribution.rename(
distribution ={
columns"ballotName": "toCandidate",
"ballotOrderNumber": "toBallotOrder",
"partyCode": "toParty",
"runningTotal": "toRunningTotal",
"excludedCandidate": "fromCandidate",
"excludedCandidatePartyCode": "fromParty",
"excludedCandidateBallotOrder": "fromBallotOrder",
}
)return distribution
Now, we’re ready to download the data.
= [], [], []
first_prefs, distributions, final_tallies for electorate in [x["stub"] for x in electorates]:
= requests.get(prefs_url.substitute(electorate=electorate))
res = res.json()
result
# first preferences
= parse_candidate_votes(
first_pref "preferenceDistributionDetails"]["primary"]
result[
)"electorate"] = electorate
first_pref[= first_pref.sort_values("count")
first_pref
first_prefs.append(first_pref)
# final tally
= parse_candidate_votes(result["candidates"])
final "electorate"] = electorate
final[= final.sort_values("count")
final
final_tallies.append(final)
# preference distribution
= result["preferenceDistributionDetails"]["distributions"]
data = parse_distribution(data)
distribution "electorate"] = electorate # add electorate information
distribution[ distributions.append(distribution)
Tidy
Let’s tidy the data.
- Concatenate data for each electorate into a single dataframe
- Set appropriate data types
- Rename ‘Australian Labor Party’ to ‘ALP’
- Set ‘IND’ as the party value for independents - it is currently an empty string
# we'll concatenate all electorate data into one dataframe for each data type
= pd.concat(distributions).set_index("electorate")
distribution = pd.concat(first_prefs).set_index("electorate")
first_pref = pd.concat(final_tallies)
final_tally
# set appropriate data types for each dataframe
= distribution.astype(
distribution
{"toBallotOrder": "uint8",
"preferences": "uint16",
"toRunningTotal": "uint16",
"exclusion": "uint8",
"fromBallotOrder": "uint8",
"votesDistributed": "uint16",
"toParty": "category",
"fromParty": "category",
}
)
= first_pref.astype(
first_pref
{"ballotOrder": "uint8",
"count": "uint16",
"party": "category",
"colour": "category",
}
)
= final_tally.astype(
final_tally
{"ballotOrder": "uint8",
"party": "category",
"count": "uint16",
"colour": "category",
}
)
# rename Australian Labor Party to ALP
"toParty"] = distribution["toParty"].cat.rename_categories(
distribution["Australian Labor Party": "ALP"}
{
)"fromParty"] = distribution["fromParty"].cat.rename_categories(
distribution["Australian Labor Party": "ALP"}
{
)"party"] = final_tally["party"].cat.rename_categories(
final_tally["Australian Labor Party": "ALP"}
{
)"party"] = first_pref["party"].cat.rename_categories(
first_pref["Australian Labor Party": "ALP"}
{
)
# Change independent candidates' party from <empty-string> to "Ind."
= "Ind."
IND
"toParty"] = distribution["toParty"].cat.rename_categories({"": IND})
distribution["fromParty"] = distribution["fromParty"].cat.rename_categories({"": IND})
distribution[
"party"] = final_tally["party"].cat.rename_categories({"": IND})
final_tally["party"] = first_pref["party"].cat.rename_categories({"": IND}) first_pref[
Tabular Visualisation
First, let’s generate a pretty table using pandas. Note: we need matplotlib
to use the background_gradient syle of pandas.
Some quick observations: - The most votes were transferred from the Greens and One Nation, and the LNP had one of the fewest transfers because they were almost always either 1st or 2nd.
Let’s look at the overall preference flows for the entire state. But first, I’ll ignore the independents (those with an empty party name).
= distribution[
distribution_party "fromParty"] != IND) & (distribution["toParty"] != IND)
(distribution[
]
distribution_party.pivot_table(=["fromParty"],
index=["toParty"],
columns="preferences",
values="sum",
aggfunc=True,
margins="Total",
margins_name=True,
observed=0,
fill_value=1, cmap="BuPu") ).style.background_gradient(axis
toParty | Animal Justice Party | ALP | Family First | KAP | LCQP | LNP | Libertarians | One Nation | The Greens | Total |
---|---|---|---|---|---|---|---|---|---|---|
fromParty | ||||||||||
Animal Justice Party | 0 | 1654 | 592 | 0 | 999 | 1328 | 35 | 1383 | 3954 | 9945 |
ALP | 0 | 0 | 0 | 15872 | 0 | 16167 | 0 | 3108 | 6912 | 42059 |
Family First | 393 | 9923 | 0 | 669 | 3197 | 14244 | 53 | 21496 | 9489 | 59464 |
KAP | 0 | 12871 | 0 | 0 | 0 | 25268 | 0 | 1475 | 229 | 39843 |
LCQP | 0 | 19909 | 607 | 1317 | 0 | 12438 | 114 | 15557 | 10781 | 60723 |
LNP | 0 | 8239 | 0 | 0 | 0 | 0 | 0 | 0 | 3030 | 11269 |
Libertarians | 0 | 954 | 92 | 0 | 64 | 2176 | 0 | 606 | 1044 | 4936 |
One Nation | 0 | 71986 | 0 | 13086 | 664 | 190359 | 593 | 0 | 23888 | 300576 |
The Greens | 0 | 242729 | 680 | 2505 | 5846 | 67648 | 0 | 14919 | 0 | 334327 |
Total | 393 | 368265 | 1971 | 33449 | 10770 | 329628 | 795 | 58544 | 59327 | 863142 |
Sankey diagram of preference flows
import holoviews as hv
from holoviews import dim
"bokeh") hv.extension(
The first step to generate a saneky diagram of preference flows is to get total flows from each party to every other party. For this, we’ll group the distribution
dataframe by fromParty
and toParty
columns and sum all the preference flows for every such pair.
= distribution_party.groupby(["fromParty", "toParty"], observed=True)[
total_dist_flows "preferences"]
[sum()
].10) total_dist_flows.head(
preferences | ||
---|---|---|
fromParty | toParty | |
Animal Justice Party | ALP | 1654 |
Family First | 592 | |
LCQP | 999 | |
LNP | 1328 | |
Libertarians | 35 | |
One Nation | 1383 | |
The Greens | 3954 | |
ALP | KAP | 15872 |
LNP | 16167 | |
One Nation | 3108 |
We can now calculate what percent of pref flows from a party to every other party out of the total preference flows from that party. We’ll do this for both incoming and outgoing flows. For example, what percentage of One Nation’s total preference distributions went to LNP, and what percent of LNP’s incoming preferences came from One Nation.
"from_pct"] = (
total_dist_flows["preferences"] # each party-to-party distribution
total_dist_flows[# divided by the total distributions from the 'fromParty'
/ total_dist_flows.groupby(["fromParty"], observed=True)["preferences"].transform(
"sum"
)* 100 # as a percentage
)
"to_pct"] = (
total_dist_flows["preferences"] # each party-to-party distribution
total_dist_flows[# divided by the total distributions to the 'toParty'
/ total_dist_flows.groupby(["toParty"], observed=True)["preferences"].transform(
"sum"
)* 100 # as a percentage )
We’ll use the Sankey diagrams from Holoviews. holoviws.Sankey
requires as input a list of tuples of the form (from, to, value). Here’s a slightly modified example from the official documentation:
= ["PhD", "Career Outside Science", "Early Career Researcher", "Research Staff",
nodes "Permanent Research Staff", "Professor", "Non-Academic Research"]
= hv.Dataset(enumerate(nodes), 'index', 'label')
nodes = [
edges 0, 1, 53), (0, 2, 47), (2, 6, 17), (2, 3, 30), (3, 1, 22.5), (3, 4, 3.5), (3, 6, 4.), (4, 5, 0.45)
(
]
= hv.Dimension('Percentage', unit='%')
value_dim = hv.Sankey((edges, nodes), ['From', 'To'], vdims=value_dim) careers
In our case, we first need to assign unique codes to the ‘from’ and ‘to’ parties. We want to display each party in the ‘from’ and ‘to’ sides so that we can see both outgoing and incoming preference flows for each party. But Holoviews doesn’t support acyclic connections.
As a workaround, we’ll asign two unique codes for each party, for the ‘to’ and ‘from’ sections. Let’s start with the outgoing (fromParty
) flows. For this, we’ll use the codes
attribute of pandas.Categorical
. The fromParty
and toParty
are already categoricals in our dataframe.
# codes for 'from' parties
"from"] = total_dist_flows.index.get_level_values("fromParty").codes
total_dist_flows[
# unique codes for 'to' parties, starting after the max 'from' code
= total_dist_flows["from"].max()
step "to"] = total_dist_flows.index.get_level_values("toParty").codes + step total_dist_flows[
Now, we’ll generate the nodes dataset as in the above example.
= total_dist_flows.reset_index()
total_dist_flows
= sorted(
from_codes dict(zip(total_dist_flows["from"], total_dist_flows["fromParty"])).items()
)= sorted(
to_codes dict(zip(total_dist_flows["to"], total_dist_flows["toParty"])).items()
)
# merge the from/to_codes to get the dataset
= hv.Dataset(from_codes + to_codes, "index", "party") nodes
Now lets draw the first version of our sankey diagram. Note here that kdims refers to the key variables used by holoviews to index our data. And vdims are the variables that are shown in the result visualisation.
hv.Sankey(
(total_dist_flows, nodes),=["from", "to"],
kdims=["preferences", "from_pct", "to_pct", "fromParty", "toParty"],
vdims=600, toolbar="above") ).opts(width
There are many ways in which we can improve the above plot. 1. The same parties in the from/to sides don’t have the same colours. We need to fix this. 2. We want to show party names instead of codes in the labels. 3. We want to make the hover tooltips better formatted.
We’ll do this below.
Let’s start with the colormap. We have already defined colours for each party in the beginning. Let’s used those colours and create our own colourmap.
= {str(i[0]): colours[i[1]] for i in from_codes + to_codes} cmap
When using the bokeh backend, Holoviews provides HoverTool to fine tune the tooltips. Let’s make use of that.
from bokeh.models import HoverTool
# we'll include from/from_pct and to/to_pct in the same line.
= HoverTool(
hover =[
tooltips"From", "@fromParty (@from_pct %)"),
("To", "@toParty (@to_pct %)"),
("Transferred", "@preferences{0,0}"),
(
], )
Putting everything together, we have a much more visually appealing and informative Sankey diagram.
hv.Sankey(
(total_dist_flows, nodes),=["from", "to"],
kdims=["preferences", "from_pct", "to_pct", "fromParty", "toParty"],
vdims
).opts(="party",
labels="outer",
label_position=dim("to").str(),
edge_color=dim("index").str(),
node_color=cmap,
cmap=[hover],
tools="above",
toolbar=600,
width )
That’s the Saneky diagram from the dashboard. The source code for this and the rest of the dashboard is available in github here.