import requests
import json
I have recently written a dashboard to explore the preference flows of votes during the 2024 QLD state elections. The dashboard is available here.
The dashboard has three tabs: 1. Overall preference flows, 2. Preference flows in each electorate, 3. A what-if scenario to test the impact of different levels of preference flows from the Greens to Labor.
In this first part of a four part series of posts, I will discuss how I generated the overall preference flow Sankey diagram and a corresponding table.
Getting the data
The election data is available from the QLD Electoral commission.
The first step is to get data about all the electorates. This is available in the form of a JSON file from the QEC website.
= (
electorates_url "https://resultsdata.elections.qld.gov.au/SGE2024-electorates.json"
)= requests.get(electorates_url)
resp
= resp.json()["electorates"] electorates
0] electorates[
{'electorateId': 893,
'parentElectorateId': None,
'electorateName': 'Algester',
'stub': 'algester',
'contestType': 'State',
'electorateType': 'State',
'votingMethod': 'Attendance Ballot',
'votingSystem': 'Compulsory Preferential Voting',
'votingSystemMayor': None,
'votingSystemCouncillor': None,
'indicativeCount': True,
'candidatesCount': 6,
'candidates': [{'ballotName': 'BOSLEY, Rhys',
'ballotOrderNumber': 1,
'candidateSurname': 'BOSLEY',
'candidateGivenNames': 'RHYS ANTHONY',
'party': 'Independent',
'partyCode': 'IND'},
{'ballotName': 'TURNER, Jane',
'ballotOrderNumber': 2,
'candidateSurname': 'TURNER',
'candidateGivenNames': 'JANE LOUISE',
'party': 'Family First Queensland',
'partyCode': 'Family First'},
{'ballotName': 'PRASAD, Jitendra',
'ballotOrderNumber': 3,
'candidateSurname': 'PRASAD',
'candidateGivenNames': 'JITENDRA PRABHAKAR',
'party': 'Liberal National Party of Queensland',
'partyCode': 'LNP'},
{'ballotName': 'ENOCH, Leeanne',
'ballotOrderNumber': 4,
'candidateSurname': 'ENOCH',
'candidateGivenNames': 'LEEANNE MARGARET',
'party': 'Australian Labor Party (State of Queensland)',
'partyCode': 'Australian Labor Party'},
{'ballotName': 'MARIS, George',
'ballotOrderNumber': 5,
'candidateSurname': 'MARIS',
'candidateGivenNames': 'GEORGE',
'party': "Pauline Hanson's One Nation Queensland Division",
'partyCode': 'One Nation'},
{'ballotName': 'WILDIN, Andrea',
'ballotOrderNumber': 6,
'candidateSurname': 'WILDIN',
'candidateGivenNames': 'ANDREA SUSAN',
'party': 'Queensland Greens',
'partyCode': 'The Greens'}],
'enrolment': 40236}
There’s lots of information about each electorate inlcuding the candidates and total enrolment. But we’re mostly interested in the electorate’s name and the code (stub) used by the QEC API.
= [x["stub"] for x in electorates] electorate_names
Preference count for each electorate
The first preference votes, preference flows and the final result are all available with a single API call. > https://resultsdata.elections.qld.gov.au/SGE2024-preference-count-district-{electorate_stub}.json
Let’s see what it looks like for the Aspley electorate.
from string import Template
= Template(
prefs_url "https://resultsdata.elections.qld.gov.au/SGE2024-preference-count-district-$electorate.json"
)= "aspley"
electorate = requests.get(prefs_url.substitute(electorate=electorate))
res = res.json() result
This json file has a lot of information. So let’s break this down into different chunks.
Primary votes
# primary votes
"preferenceDistributionDetails"]["primary"] result[
[{'ballotName': 'HALL, Allan',
'ballotOrderNumber': 1,
'party': "Pauline Hanson's One Nation Queensland Division",
'partyCode': 'One Nation',
'primary': 1539},
{'ballotName': 'CAPELL, Wayne',
'ballotOrderNumber': 2,
'party': 'Family First Queensland',
'partyCode': 'Family First',
'primary': 707},
{'ballotName': 'MELLISH, Bart',
'ballotOrderNumber': 3,
'party': 'Australian Labor Party (State of Queensland)',
'partyCode': 'Australian Labor Party',
'primary': 13988},
{'ballotName': 'COOPER, Amanda',
'ballotOrderNumber': 4,
'party': 'Liberal National Party of Queensland',
'partyCode': 'LNP',
'primary': 15696},
{'ballotName': 'HAWKINS, Fiona',
'ballotOrderNumber': 5,
'party': 'Queensland Greens',
'partyCode': 'The Greens',
'primary': 3817}]
Final tally
This gives us the final two party/candidate tally.
"candidates"] result[
[{'ballotName': 'MELLISH, Bart',
'ballotOrderNumber': 3,
'party': 'Australian Labor Party (State of Queensland)',
'partyCode': 'Australian Labor Party',
'count': 17889,
'percentage': '50.04%'},
{'ballotName': 'COOPER, Amanda',
'ballotOrderNumber': 4,
'party': 'Liberal National Party of Queensland',
'partyCode': 'LNP',
'count': 17858,
'percentage': '49.96%'}]
Preference distributions
This contains information about each preference distribution: the excluded candidate, preferences distributed to each remaining candidate, and the new tally of the remaining candidates.
"preferenceDistributionDetails"]["distributions"][0] result[
{'exclusion': 1,
'excludedCandidate': 'CAPELL, Wayne',
'excludedCandidateBallotOrder': 2,
'excludedCandidateParty': 'Family First Queensland',
'excludedCandidatePartyCode': 'Family First',
'excludedCandidateVotes': 707,
'exhausted': 0,
'exhaustedPercentage': '0.00%',
'totalExhausted': 0,
'votesDistributed': 707,
'votesRemainingInCount': 35747,
'preferences': [{'ballotName': 'HALL, Allan',
'ballotOrderNumber': 1,
'party': "Pauline Hanson's One Nation Queensland Division",
'partyCode': 'One Nation',
'preferences': 302,
'preferencesPercentage': '42.72%',
'runningTotal': 1841},
{'ballotName': 'MELLISH, Bart',
'ballotOrderNumber': 3,
'party': 'Australian Labor Party (State of Queensland)',
'partyCode': 'Australian Labor Party',
'preferences': 113,
'preferencesPercentage': '15.98%',
'runningTotal': 14101},
{'ballotName': 'COOPER, Amanda',
'ballotOrderNumber': 4,
'party': 'Liberal National Party of Queensland',
'partyCode': 'LNP',
'preferences': 179,
'preferencesPercentage': '25.32%',
'runningTotal': 15875},
{'ballotName': 'HAWKINS, Fiona',
'ballotOrderNumber': 5,
'party': 'Queensland Greens',
'partyCode': 'The Greens',
'preferences': 113,
'preferencesPercentage': '15.98%',
'runningTotal': 3930}]}
Downloading all data
Next, we want to dowload all the data and save that somewhere to avoid repeating the API calls.
Before doing that, lets also assign colours to the parties. These are based on the colours commonly associated with these parties. https://peo.gov.au/understand-our-parliament/your-questions-on-notice/questions/what-are-the-colours-of-the-australian-political-parties
I’ve assigned unique colours to other parties whose colours I’m not aware of. And we also need colours for independents.
= {
colours "ALP": "#FF0000",
"Family First": "#1CE6EF",
"KAP": "#8B0000",
"LCQP": "#788D66",
"LNP": "#0000FF",
"One Nation": "#FFA500",
"The Greens": "#00FF00",
"Animal Justice Party": "#885578",
"Libertarians": "#FF34FF",
}
# For independent candidates
= [
other_colours "#000000",
"#FFFF00",
"#FAD09F",
"#FF8A9A",
"#D157A0",
"#FF4A46",
]
Based on the structure of the results we saw above and also the requirements for this project, we will create three dataframes focusing on the first preferences, final result, and the distributions.
We’ll use pandas
, particularly the json_normalize
and the DataFrame.from_records
- based on the JSON structure provided by the API.
import pandas as pd
Let’s specify what keys we are interested in.
= [
exclusion_cols "exclusion",
"excludedCandidate",
"excludedCandidatePartyCode",
"excludedCandidateBallotOrder",
"votesDistributed",
]= [
pref_cols "ballotName",
"preferences",
"preferencePercentage",
"runningTotal",
"partyCode",
"ballotOrderNumber",
]
We’ll define a couple of methods to parse the data from the JSON.
The first method we’ll define is to parse the votes received by candidates in either the primary count or the final count. We have already seen the strucutre of this data above. One key difference between the primary and final counts is the keys used to denote the votes: primary and count respectively. We’ll get around this by rename the primary column to count (if it exists) in the dataframe.
def parse_candidate_votes(data):
"""Parse the votes received by each candidate in the `data`."""
= pd.DataFrame.from_records(data, exclude=["party"]).rename(
counts ={
columns"ballotName": "candidate",
"ballotOrderNumber": "ballotOrder",
"partyCode": "party",
"primary": "count", # rename 'primary' to 'count' if it exists
}
)# Now we have a 'count' column irrespective of the data
# assign colours to the candidates based on their party
"colour"] = counts["party"].map(colours)
counts[
# independents without a party colour
= counts["colour"].isna()
no_colour "colour"]] = other_colours[: no_colour.sum()]
counts.loc[no_colour, [return counts
The next method is to parse the preference distributions in each distribution round. This will take a list of distributions and return a dataframe.
def parse_distribution(data):
= pd.json_normalize(data, "preferences", exclusion_cols)
distribution
# remove redundant columns
= distribution.loc[
distribution + pref_cols)
:, distribution.columns.isin(exclusion_cols
]
# rename columns
= distribution.rename(
distribution ={
columns"ballotName": "toCandidate",
"ballotOrderNumber": "toBallotOrder",
"partyCode": "toParty",
"runningTotal": "toRunningTotal",
"excludedCandidate": "fromCandidate",
"excludedCandidatePartyCode": "fromParty",
"excludedCandidateBallotOrder": "fromBallotOrder",
}
)return distribution
Now, we’re ready to download the data.
= [], [], []
first_prefs, distributions, final_tallies for electorate in [x["stub"] for x in electorates]:
= requests.get(prefs_url.substitute(electorate=electorate))
res = res.json()
result
# first preferences
= parse_candidate_votes(
first_pref "preferenceDistributionDetails"]["primary"]
result[
)"electorate"] = electorate
first_pref[= first_pref.sort_values("count")
first_pref
first_prefs.append(first_pref)
# final tally
= parse_candidate_votes(result["candidates"])
final "electorate"] = electorate
final[= final.sort_values("count")
final
final_tallies.append(final)
# preference distribution
= result["preferenceDistributionDetails"]["distributions"]
data = parse_distribution(data)
distribution "electorate"] = electorate # add electorate information
distribution[ distributions.append(distribution)
Tidy and save
Now that we’ve got the data, we can tidy it up and save for later use.
# we'll concatenate all electorate data into one dataframe for each data type
= pd.concat(distributions).set_index("electorate")
distribution = pd.concat(first_prefs).set_index("electorate")
first_pref = pd.concat(final_tallies)
final_tally
# set appropriate data types for each dataframe
= distribution.astype(
distribution
{"toBallotOrder": "uint8",
"preferences": "uint16",
"toRunningTotal": "uint16",
"exclusion": "uint8",
"fromBallotOrder": "uint8",
"votesDistributed": "uint16",
"toParty": "category",
"fromParty": "category",
}
)
= first_pref.astype(
first_pref
{"ballotOrder": "uint8",
"count": "uint16",
"party": "category",
"colour": "category",
}
)
= final_tally.astype(
final_tally
{"ballotOrder": "uint8",
"party": "category",
"count": "uint16",
"colour": "category",
} )
For making our life a bit easier later, let’s rename ‘Australian Labor Party’ to ‘ALP’.
"toParty"] = distribution["toParty"].cat.rename_categories(
distribution["Australian Labor Party": "ALP"}
{
)"fromParty"] = distribution["fromParty"].cat.rename_categories(
distribution["Australian Labor Party": "ALP"}
{
)"party"] = final_tally["party"].cat.rename_categories(
final_tally["Australian Labor Party": "ALP"}
{
)"party"] = first_pref["party"].cat.rename_categories(
first_pref["Australian Labor Party": "ALP"}
{ )
One last thing: we’ll set ‘IND’ as the party value for independents - it is currently ’’.
= "Ind." IND
"toParty"] = distribution["toParty"].cat.rename_categories(
distribution["": IND}
{
)"fromParty"] = distribution["fromParty"].cat.rename_categories(
distribution["": IND}
{
)
"party"] = final_tally["party"].cat.rename_categories({"": IND})
final_tally["party"] = first_pref["party"].cat.rename_categories({"": IND}) first_pref[
Now we can save the data to parquet files.
distribution.to_parquet("data/qld_2024_distributions.parq", compression="brotli"
)"data/qld_2024_first_prefs.parq", compression="brotli")
first_pref.to_parquet("data/qld_2024_final_tally.parq", compression="brotli") final_tally.to_parquet(
= pd.read_parquet("data/qld_2024_distributions.parq")
distribution = pd.read_parquet("data/qld_2024_first_prefs.parq")
first_pref = pd.read_parquet("data/qld_2024_final_tally.parq") final_tally
Tabular Visualisation
First, let’s generate a pretty table using pandas. Note: we need matplotlib
to use the background_gradient syle of pandas.
Some quick observations: - The most votes were transferred from the Greens and One Nation, and the LNP had one of the fewest transfers because they were almost always either 1st or 2nd.
Let’s look at the overall preference flows for the entire state. But first, I’ll ignore the independents (those with an empty party name).
= distribution[
distribution_party "fromParty"] != IND) & (distribution["toParty"] != IND)
(distribution[ ]
distribution_party.pivot_table(=["fromParty"],
index=["toParty"],
columns="preferences",
values="sum",
aggfunc=True,
margins="Total",
margins_name=True,
observed=0,
fill_value=1, cmap="BuPu") ).style.background_gradient(axis
toParty | Animal Justice Party | ALP | Family First | KAP | LCQP | LNP | Libertarians | One Nation | The Greens | Total |
---|---|---|---|---|---|---|---|---|---|---|
fromParty | ||||||||||
Animal Justice Party | 0 | 1654 | 592 | 0 | 999 | 1328 | 35 | 1383 | 3954 | 9945 |
ALP | 0 | 0 | 0 | 15872 | 0 | 16167 | 0 | 3108 | 6912 | 42059 |
Family First | 393 | 9923 | 0 | 669 | 3197 | 14244 | 53 | 21496 | 9489 | 59464 |
KAP | 0 | 12871 | 0 | 0 | 0 | 25268 | 0 | 1475 | 229 | 39843 |
LCQP | 0 | 19909 | 607 | 1317 | 0 | 12438 | 114 | 15557 | 10781 | 60723 |
LNP | 0 | 8239 | 0 | 0 | 0 | 0 | 0 | 0 | 3030 | 11269 |
Libertarians | 0 | 954 | 92 | 0 | 64 | 2176 | 0 | 606 | 1044 | 4936 |
One Nation | 0 | 71986 | 0 | 13086 | 664 | 190359 | 593 | 0 | 23888 | 300576 |
The Greens | 0 | 242729 | 680 | 2505 | 5846 | 67648 | 0 | 14919 | 0 | 334327 |
Total | 393 | 368265 | 1971 | 33449 | 10770 | 329628 | 795 | 58544 | 59327 | 863142 |
Sankey diagram of preference flows
import holoviews as hv
from holoviews import dim, opts
"bokeh") hv.extension(
The first step to generate a saneky diagram of preference flows is to get total flows from each party to every other party. For this, we’ll group the distribution
dataframe by fromParty
and toParty
columns and sum all the preference flows for every such pair.
= distribution_party.groupby(
total_dist_flows "fromParty", "toParty"], observed=True
["preferences"]].sum()
)[[10) total_dist_flows.head(
preferences | ||
---|---|---|
fromParty | toParty | |
Animal Justice Party | ALP | 1654 |
Family First | 592 | |
LCQP | 999 | |
LNP | 1328 | |
Libertarians | 35 | |
One Nation | 1383 | |
The Greens | 3954 | |
ALP | KAP | 15872 |
LNP | 16167 | |
One Nation | 3108 |
We can now calculate what percent of pref flows from a party to every other party out of the total preference flows from that party. We’ll do this for both incoming and outgoing flows. For example, what percentage of One Nation’s total preference distributions went to LNP, and what percent of LNP’s incoming preferences came from One Nation.
"from_pct"] = (
total_dist_flows["preferences"] # each party-to-party distribution
total_dist_flows[# divided by the total distributions from the 'fromParty'
/ total_dist_flows.groupby(["fromParty"], observed=True)[
"preferences"
"sum")
].transform(* 100 # as a percentage )
"to_pct"] = (
total_dist_flows["preferences"] # each party-to-party distribution
total_dist_flows[# divided by the total distributions to the 'toParty'
/ total_dist_flows.groupby(["toParty"], observed=True)[
"preferences"
"sum")
].transform(* 100 # as a percentage )
We’ll use the Sankey diagrams from Holoviews. holoviws.Sankey
requires as input a list of tuples of the form (from, to, value). Here’s a slightly modified example from the official documentation:
= ["PhD", "Career Outside Science", "Early Career Researcher", "Research Staff",
nodes "Permanent Research Staff", "Professor", "Non-Academic Research"]
= hv.Dataset(enumerate(nodes), 'index', 'label')
nodes = [
edges 0, 1, 53), (0, 2, 47), (2, 6, 17), (2, 3, 30), (3, 1, 22.5), (3, 4, 3.5), (3, 6, 4.), (4, 5, 0.45)
(
]
= hv.Dimension('Percentage', unit='%')
value_dim = hv.Sankey((edges, nodes), ['From', 'To'], vdims=value_dim) careers
In our case, we first need to assign unique codes to the ‘from’ and ‘to’ parties. We want to display each party in the ‘from’ and ‘to’ sides so that we can see both outgoing and incoming preference flows for each party. But Holoviews doesn’t support acyclic connections.
As a workaround, we’ll asign two unique codes for each party, for the ‘to’ and ‘from’ sections. Let’s start with the outgoing (fromParty
) flows. For this, we’ll use the codes
attribute of pandas.Categorical
. The fromParty
and toParty
are already categoricals in our dataframe.
# codes for 'from' parties
"from"] = total_dist_flows.index.get_level_values(
total_dist_flows["fromParty"
).codes
# unique codes for 'to' parties, starting after the max 'from' code
= total_dist_flows["from"].max()
step "to"] = (
total_dist_flows["toParty").codes + step
total_dist_flows.index.get_level_values( )
Now, we’ll generate the nodes dataset as in the above example.
= total_dist_flows.reset_index()
total_dist_flows
= sorted(
from_codes dict(zip(total_dist_flows["from"], total_dist_flows["fromParty"])).items()
)= sorted(
to_codes dict(zip(total_dist_flows["to"], total_dist_flows["toParty"])).items()
)
# merge the from/to_codes to get the dataset
= hv.Dataset(from_codes + to_codes, "index", "party") nodes
Now lets draw the first version of our sankey diagram. Note here that kdims refers to the key variables used by holoviews to index our data. And vdims are the variables that are shown in the result visualisation.
hv.Sankey(
(total_dist_flows, nodes),=["from", "to"],
kdims=["preferences", "from_pct", "to_pct", "fromParty", "toParty"],
vdims
).opts(=800,
width="above"
toolbar )
There are many ways in which we can improve the above plot. 1. The same parties in the from/to sides don’t have the same colours. We need to fix this. 2. We want to show party names instead of codes in the labels. 3. We want to make the hover tooltips better formatted.
We’ll do this below.
Let’s start with the colormap. We have already defined colours for each party in the beginning. Let’s used those colours and create our own colourmap.
= {str(i[0]): colours[i[1]] for i in from_codes + to_codes} cmap
When using the bokeh backend, Holoviews provides HoverTool to fine tune the tooltips. Let’s make use of that.
from bokeh.models import HoverTool
# we'll include from/from_pct and to/to_pct in the same line.
= HoverTool(
hover =[
tooltips"From", "@fromParty (@from_pct %)"),
("To", "@toParty (@to_pct %)"),
("Transferred", "@preferences{0,0}"),
(
], )
Putting everything together, we have a much more visually appealing and informative Sankey diagram.
hv.Sankey(
(total_dist_flows, nodes),=["from", "to"],
kdims=["preferences", "from_pct", "to_pct", "fromParty", "toParty"],
vdims
).opts(="party",
labels="outer",
label_position=dim("to").str(),
edge_color=dim("index").str(),
node_color=cmap,
cmap=[hover],
tools="above",
toolbar=800
width )
That’s the Saneky diagram from the dashboard. The source code for this and the rest of the dashboard is available in github here.
In the next posts in this series, I will explain how I created the other tabs in the dashboard.