The Token Wars: Why not everything should be open
Panorama Rooms | Tue 21 Jan 10:45 a.m.–11:30 a.m.
Presented by
Kathy Reid
@KathyReid@aus.social
https://kathyreid.com.au
Kathy Reid works at the intersection of open source, emerging technologies and technical communities.
Over the last 20 years, she has held several technical leadership positions, including Digital Platforms and Operations Manager at Deakin University, managing platforms such as WordPress, Drupal, Squiz Matrix and Atlassian Confluence, and technical lead on projects involving digital signage and videoconferencing. She has also worked as a web and application developer.
More recently, she has run her own technical consulting micro-business, and has been engaged on a variety of projects involving data visualisation, certification applications and emerging technologies workshops.
She was previously Director of Developer Relations at Mycroft.AI, an open source voice assistant startup, and President of Linux Australia, Inc, a not-for-profit organisation which advocates for the use of open source technologies and runs technical events such as Linux Conference Australia. She brought GovHack – the open data hackathon – to Geelong in 2015 and 2016, and in 2011 ran Geelong’s first unconference – BarCampGeelong. Most recently, she worked as a voice open source specialist for Mozilla.
Kathy holds Arts and Science undergraduate degrees from Deakin University, an MBA (Computing) from Charles Sturt University and a Master in Applied Cybernetics (MAppCyber) from the Australian National University, as well as several ITIL qualifications.
In 2019, she was one of 16 people from across the world chosen to undertake a Masters Program in a brand new branch of engineering at the Australian National University's 3A Institute. She is now a PhD candidate there, researching voice data and ways to prevent and respond to bias in machine learning systems that use voice and speech, such as speech recognition.
Kathy recently completed a Research Partnership with Mozilla's Common Voice team, where she used Mozilla Common Voice data to assess the performance of the Whisper speech recognition model on accented English, showing it was much less accurate for many spoken accents.
Abstract
In recent years, there has been an explosion in generative AI. Most of us are now familiar with tools like ChatGPT, Midjourney, Sora, and others. At the heart of generative AI is a machine learning architecture called the "transformer", which is fed by huge datasets - text, images and videos. Those datasets are "tokenised" - cut up into chunks which the transformer can ingest. Those actors who can obtain the most tokens can generally train the best models (for various values of "best").
We are now witnessing a battle between the creators of generative AI models, who seek to obtain as much data as possible for tokenisation, and their targets, who try to stop them. The social ramifications of this resource conflict are widespread, resulting in "alateral damage" - a term I am coining to point to the unforeseen, unintended, distal consequences of a seemingly innocuous technology.
These are the Token Wars.
And they're the reason not everything should be open.
TOKENS, TECHNICALLY: A technical grounding on transformers, tokens and how they are used to build generative AI
In this part of the presentation, Kathy will provide a technical grounding on generative AI, how the transformer architecture works, and in particular the attention mechanism. She will briefly cover the concept of tokenisation for data input into transformer models, and explain how transformer models are "next token predictors".
TOKENOMICS: Why are tokens so valuable?
Moving from a technology to its social impact, Kathy turns attention (hah! pun!) to the economics of tokens, drawing on Elinor Ostrom's work on Governing the Commons. She shows how tokens are a resource that is rivalrous, pointing to the Token Crisis and Model Collapse to lay out her argument. She explores the actors in the Token Wars, explaining their intentions and actions, and the unintended consequences ("alateral damage") those actions are having.
TOKEN TACTICS: Guarding your token treasure - and why not everything should be open.
In the final part of the presentation, Kathy will examine tokens, and data, as a form of capital, outlining ways we may be able to protect our tokens as treasure. Further, she will show how some forms of data - rare, precious and therefore highly valuable for tokenisation - should be strongly protected for cultural, societal and historical reasons, and will question who gets to hoard token treasure.
To conclude the presentation, Kathy will lay out some possible Token Futures - what might we expect to see in this space over the next 12-18 months?