Token
In the field of Artificial Intelligence and Natural Language Processing (NLP), a token is the fundamental unit of text that a language model uses to process and understand information. Before an AI can "read" or "write" text, the raw text first goes through a process called tokenization, in which the text sequence is segmented into smaller, more manageable pieces. These pieces are the tokens. Contrary to common belief, a token is not necessarily an entire word; it can be a word, a character, a punctuation mark, or a fragment of a word (a subword).
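As a concrete illustration, the minimal Python sketch below uses the open-source tiktoken library to split a sentence into tokens. The encoding name `cl100k_base` is one published encoding; the exact splits and ids vary from tokenizer to tokenizer, so the output here is illustrative only.

```python
# Minimal sketch: tokenizing a sentence with the open-source tiktoken library.
# Requires: pip install tiktoken
import tiktoken

# Load a published encoding (cl100k_base is used by several OpenAI models).
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization splits raw text into smaller pieces."
token_ids = enc.encode(text)                   # list of integer token ids
pieces = [enc.decode([t]) for t in token_ids]  # each id decoded back to its text piece

print(token_ids)  # ids vary by encoding
print(pieces)     # pieces may be whole words, subwords, or punctuation
```

Note that some pieces come back with a leading space attached: the tokenizer treats " start" and "start" as different tokens, which is part of how it keeps the text reconstructable.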
A defining characteristic of modern AI systems, such as Gemini, Claude, and GPT, is the use of subword tokenization. Instead of treating each word as a single item, this approach breaks words down into frequently occurring fragments. For example, the word "restart" could be broken down into the tokens "re" and "start". This technique is highly efficient because it allows the model to handle a virtually unlimited vocabulary, including rare words, neologisms, and typos, while maintaining a manageable, fixed-size token dictionary. It also allows the AI to recognize morphological relationships between words (such as the relationship between "run", "running", and "runner").
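One widely used subword algorithm is byte-pair encoding (BPE). The toy sketch below (the four-word corpus and the six merge steps are assumptions chosen for illustration, not a production tokenizer) shows the core of BPE training: repeatedly find the most frequent adjacent pair of symbols and merge it into a new vocabulary unit, so that fragments like "start" and "re" emerge from raw characters.

```python
# Toy illustration of byte-pair-encoding (BPE) training steps.
# Simplified sketch for illustration, not a production tokenizer.
from collections import Counter

# A tiny "corpus": each word starts as a list of single characters.
corpus = [list("restart"), list("start"), list("restarting"), list("starting")]

def most_frequent_pair(words):
    """Count every adjacent symbol pair and return the most common one."""
    pairs = Counter()
    for word in words:
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of the pair with a single merged symbol."""
    merged = []
    for word in words:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged.append(out)
    return merged

# Repeated merges grow subword units: "st" -> "sta" -> "star" -> "start" ...
for _ in range(6):
    pair = most_frequent_pair(corpus)
    corpus = merge_pair(corpus, pair)
    print("merged", pair, "->", corpus[0])
```

After a few iterations, "restart" is represented as the two learned units "re" + "start", which is exactly the kind of split described above.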
For large language models (LLMs), tokens are the currency of information processing. After tokenization, each token is converted into a numerical vector representation (an embedding) that the machine can use to perform mathematical calculations. It is through the analysis of the relationships between these vectors that the model learns patterns, context, nuances, and the very semantics of the language, allowing it to perform tasks such as answering questions, translating languages, summarizing long texts, and generating content. The way a text is divided into tokens is defined by each model's specific tokenizer.
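As a rough sketch of that token-to-vector step (the vocabulary size, embedding width, and random weights below are made-up assumptions, not values from any real model), each token id simply indexes a row in a learned embedding matrix:

```python
# Sketch: turning token ids into embedding vectors via a lookup table.
# Dimensions and weights here are illustrative, not from any real model.
import numpy as np

vocab_size = 50_000   # assumed vocabulary size
embedding_dim = 768   # assumed embedding width

rng = np.random.default_rng(0)
# In a real model this matrix is learned during training; here it is random.
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

token_ids = [1012, 7, 40391]           # hypothetical output of a tokenizer
vectors = embedding_matrix[token_ids]  # one 768-dim vector per token

print(vectors.shape)  # (3, 768): the numeric input the model actually computes on
```

It is these vectors, not the raw text, that flow through the model's layers.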