Abstract: In this article, I present the methodology for creating definitions that is widely used in Library Science and apply it to the semantic optimization of websites (Semantic SEO), as an essential aid in solving one of the biggest problems search engines face on the Web: ambiguity.
Why create better definitions for our entities?
The main problem that automated information retrieval systems face (web search engines are in this category) is knowing, with a high degree of certainty, what information the user wants.
When I ask, “Where can I find Puma?” Google, for example, returns a result like this:
It is clear that, given my search history (I do not own a car and do not search for cars), the algorithm is more confident that the entity “Puma” refers to the sports brand, less confident that it refers to the car, and not at all confident that it refers to the animal.
What happened here is what we in Information Science call disambiguation.
Wikipedia defines disambiguation like this:
In linguistics, disambiguation refers to the process of explaining a message that has more than one meaning. An ambiguous term is, then, one that carries a confusing message or instruction and can be interpreted in more than one way.
Brazilian Wikipedia
I already wrote about it in the blog post: The importance of disambiguation in Semantic SEO.
Another example of an optimized result, this time using voice search:
From my perspective of what Semantic SEO is, the definition I created generated this result.
Marcelo Schiessl and Marisa Brascher, in Ontology: ambiguity and Precision, state that ambiguity is a significant obstacle to information retrieval.
Therefore, any help we can give our algorithmic reader (mainly search engines) reduces that uncertainty and makes it clearer to the retrieval tools which subject we are talking about.
And one of the simplest ways to do that is to deal with precise definitions of the concepts, terms, and entities in our texts.
The definition of entities
The process of defining entities known as terminological definition fits our work perfectly, because it helps us create a statement that “accounts for the meanings of terms or expressions of a technique, technology, or science.”1 In other words, it helps us deal with the specific terms of the area of knowledge we are working in (its semantic field) and with their meanings.
An entity is anything, concrete or abstract, including associations between entities, abstracted from the real world and modeled as a table that will store information in the database.
Brazilian Wikipedia
Entities are uniquely identifiable objects and people with individual properties (such as color = red, date of birth = August 28, 1749, altitude = 2962 meters, temperature = -4.5 degrees).
Entity (IT)
To define is to give the correct expression of specialized knowledge or, as FINATTO (2002) states:
A portion of specialized knowledge.
This statement (another name we can use here for a definition) is, also according to FINATTO (2002), a particular conceptual representation linked to technical, scientific, or technological knowledge.
This use of definitions caught my attention and made me see their importance in creating optimized content, especially for Semantic SEO, which, for obvious reasons, needs to be specific and extremely clear about the entities it is dealing with.
How does definition creation theory help SEO?
To apply term-creation techniques in SEO, I chose to draw on Terminology studies in my professional practice after reading this definition:
Terminology is a discipline that makes it possible to systematically identify the vocabulary of a given specialty, analyze this vocabulary and, if necessary, create and standardize it in a specific operational situation to respond to the expression needs of users.
DUBUC, 1999, p. 21–22
One of the most complex activities when starting a Semantic SEO project is understanding the concepts and, consequently, the terms and entities of the knowledge domain to which the site belongs.
I’ve extensively studied the worlds of customer service, electric cars, solar energy, and bespoke frames, among others. Learning about a subject you know nothing about is complex and time-consuming, and it comes with no guarantee of success.
But when I got to know Terminology studies, which can be defined as a discipline of Linguistics that studies the form and meaning of words that are part of the existing set of words in a given language, I understood that I could use its methodologies for my work.
I understood the possibilities by studying more about the Communicative Theory of Terminology (TCT), which emerged in the 80s and 90s, proposed by Maria Teresa Cabré and collaborators. This new view on the study of terms is perfect for SEO because of this particular work structure:
TERM > Creation of the set of words by speakers specialized in a particular subject.
Adopting the vocabulary used by specialists in a given subject solves my research problem in SEO projects: it gives me a highly qualified starting set of terms, concepts, and entities with which I can build my Semantic Workflow.
The Communicative Theory of Terminology uses a method of creating definitions that fits my work perfectly: the Semasiological Method, which starts from words and seeks their meanings. This allows me to use a set of texts written by experts to build what we call a Corpus, a collection of works of all kinds written by experts on a given subject.
For example, a corpus about the world of Harry Potter can be a Wiki written by fans, researchers, and amateur experts on books and films.
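As a rough illustration of starting from the words of a corpus, the sketch below counts two-word expressions in a tiny set of texts to surface candidate terms. The corpus sentences, the stopword list, and the function name `candidate_terms` are all assumptions made up for this example; a real project would use texts written by experts and a proper linguistic toolkit.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: in practice these would be texts written by domain experts.
CORPUS = [
    "An electric car is a motor vehicle powered by electric energy.",
    "The battery of an electric car stores electric energy.",
    "A motor vehicle can be powered by fuel or by a battery.",
]

# Small illustrative stopword list (an assumption, far from complete).
STOPWORDS = {"a", "an", "the", "is", "of", "by", "or", "can", "be"}


def candidate_terms(texts, n=2):
    """Count word n-grams that neither start nor end with a stopword."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts


terms = candidate_terms(CORPUS)
# The most frequent expressions are only candidates; a person still decides
# which of them are real terms of the domain and what they mean.
print(terms.most_common(3))
```

Frequency alone does not define anything. It only tells me which expressions deserve a definition first.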
In addition, the TCT is concerned with Terminological Variation, something common in any field: variation occurs constantly, either because experts update the definition of terms over time or because they use different words for the same entity.
In Medicine, what we know today as Hansen's disease (hanseníase) has already been called leprosy (lepra) and Mal de Lázaro. In studies related to agricultural production, cassava (mandioca) is also called aipim, macaxeira, maniva, castelinha, uaipi, maniveira, pão-de-pobre ("poor man's bread"), etc.
I would like to quote here Prof. Dr. Rita do Carmo Ferreira Laipelt and Prof. Dr. Regina Helena Van der Laan, who talk about the importance of Terminology in Indexing:
Thus, understanding what a term is and the existence of terminological variants will enable the indexer to represent the information better. This will allow easier access for different users of an information retrieval system (IRS).
Prof. Dr. Rita do Carmo Ferreira Laipelt and Prof. Dr. Regina Helena Van der Laan
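One way to picture how terminological variants support retrieval is a lookup table that maps every variant to a single canonical entity. This is only a minimal sketch: the table `VARIANTS`, the helper names, and the choice of canonical labels are assumptions for illustration, using variants mentioned in this article.

```python
# Hypothetical variant table: canonical entity -> variants found in the corpus.
VARIANTS = {
    "cassava": ["maniva", "castelinha", "uaipi", "maniveira"],
    "leprosy": ["Mal de Lázaro"],
}


def build_index(variants):
    """Map every variant (and the canonical name itself) to its canonical entity."""
    index = {}
    for canonical, names in variants.items():
        for name in [canonical, *names]:
            index[name.casefold()] = canonical
    return index


def canonical_entity(term, index):
    """Return the canonical entity for a term, or None if the term is unknown."""
    return index.get(term.strip().casefold())


INDEX = build_index(VARIANTS)
print(canonical_entity("Maniva", INDEX))
print(canonical_entity("mal de lázaro", INDEX))
```

With a table like this, content that uses any of the variants can still be indexed under one entity, which is the benefit the quote above describes.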
Creating definitions for Semantic SEO
Therefore, the proposal that I bring to our optimization practice is the use of the methodology created by the TCT, but first, let’s recall the type of statement we will use. For this, I will use the terminological definition that Barros (2004) synthesizes like this:
The utterance that describes the semantic-conceptual content of a lexical or terminological unit in the headword position of an entry […] It consists of a synonymic paraphrase that expresses the concept designated by the lexical or terminological unit through other linguistic units [sic]; is a set of information that is given about the entry. (pp. 158–9).
Barros (2004)
I want to mention some points of this definition, which are pretty complex but of great interest to us.
The first point is the idea of describing the semantic and conceptual content of the entities we use in our projects. Whether these entities are used as categories to structure a project, as themes for content, or as words in our texts, what is fundamental for me is the care we take with the meaning we give them and with the concepts involved in defining them clearly.
If I’m optimizing a project about electric cars, I first need to know how the industry conceptualizes an electric vehicle. Then I can customize this definition with the particular view of the organization that owns the site. This gives me the necessary personality to differentiate my content from others already created.
The second point is the synonymic paraphrase. Synonymy (synonyms) occurs between pairs of words or expressions, but it is not just a relationship of meanings; for two expressions to be synonymous, it is not enough that they have the same reference in the world. Bringing it to Semantic SEO terms, it’s not enough that I’m talking about the same entity.
For this relationship to happen, the expressions you use in your content must have the same meaning, in addition to talking about the same entity. So be semantically objective.
Going back to my example about electric cars: if I’m talking about the new BMW i4, I can use variations in my content to talk about this entity (BMW i4), but the sentences I create need to refer to the same set of facts and must both be true.
If in one sentence I say that the i4 is an electric car created by BMW, in another I can say that this year’s launch from the German manufacturer BMW is the first electric Gran Coupé. Both are facts about the same entity.
Entity definitions
But we are not going to use every type of definition in SEO; the kind of definition we are interested in is the one used in terminological dictionaries:
Terminological definitions bring predominantly formal knowledge about “things” or phenomena.
(FINATTO, 1998, p. 2)
The terminological definition has the following characteristics:
- Adequacy to the domain;
- Formal structure and conceptual organization of the definitional statement;
- Nearest genus + specific difference
I’ve written many times on the Semantic Blog about Domain (knowledge domain, semantic domain). Linguistic Anthropology has one approach and lexicography another, but the view from the Social Sciences is the one that helps us here.
The quest was originally to see how the words that groups of humans use to describe certain things relate to the underlying perceptions and meanings those groups share.
Ethnosemantics became the field centered around studying these semantic domains and, more specifically, how the categorization and context of words and groups of words reflect the way different cultures categorize words in speech and assign meaning to their language.
OTTENHEIMER, 2006, p. 18
Nearest genus + specific difference
But before continuing, I need to emphasize the importance of item 3: nearest genus + specific difference (the Portuguese "gênero próximo", the classical genus proximum). It works as a guide for starting to define our concepts, terms, and entities. Let’s see an example:
“An electric car is a motor vehicle powered by electric energy.”
In this example, “motor vehicle” represents the nearest genus: the category closest to an electric car and immediately above it in the hierarchy. If we think of a taxonomy of vehicles, we might have something like this:
- Vehicles
  - Motor vehicles
    - Electric vehicles (or electric cars)
“An electric car is a motor vehicle powered by electric energy.”
The expression “powered by electric energy” is the specific difference of an electric car: the feature that distinguishes this particular type of car from the other members of its genus.
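The pattern behind the example sentence can be sketched as a small data structure: a term, the category directly above it, and the feature that sets it apart. The class `Definition`, the `PARENT` table, and the `ancestors` helper are hypothetical names created for this illustration, not part of any standard tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    term: str        # the entity being defined
    genus: str       # the category directly above it in the taxonomy
    difference: str  # the feature that separates it from its siblings

    def sentence(self) -> str:
        """Render the definition as '<term> is <genus> <difference>.'"""
        article = "An" if self.term[0].lower() in "aeiou" else "A"
        genus_article = "an" if self.genus[0].lower() in "aeiou" else "a"
        return f"{article} {self.term} is {genus_article} {self.genus} {self.difference}."


# Hypothetical taxonomy fragment: child -> parent.
PARENT = {"electric car": "motor vehicle", "motor vehicle": "vehicle"}


def ancestors(term):
    """Walk up the taxonomy and return every category above the term."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


electric_car = Definition("electric car", PARENT["electric car"], "powered by electric energy")
print(electric_car.sentence())
print(ancestors("electric car"))
```

Taking the category from the taxonomy, instead of typing it by hand, keeps the definition and the site structure pointing at the same hierarchy.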
And how does it help us?
What is the advantage of using this pattern in creating optimized texts?
Currently, as I learned from Teodora Petkova, we have two audiences for our content:
Algorithms and humans.
I’ve been creating definitions in my work to deal with a problem that algorithms have to solve: the ambiguity of our language. For that, I follow this pattern:
At the beginning of the text, I define, without ambiguity, the entities we are working with. Of course, I can use various strategies to write these definitions so that the text is not hard for humans to read. Still, the content starts by clearly defining what I’m talking about at the entity level.
I can’t help reminding you that we are building a layer of semantic content for the Web, so let’s talk briefly about the Semantic Web.
Semantic Web, Structured Data and SEO
A web, to be semantic, needs structured data. In addition to creating good definitions, using structured data on your site will help your algorithmic audience even more. We know that research in semantics, adapted to web technologies, has advanced to resolve ambiguity, but this is still a work in progress.
Search engines like Google and Bing apply their own technology to this problem. Can you imagine what happens if your content helps search engines resolve search ambiguity for your knowledge domain?
I propose using the methodology of the Communicative Theory of Terminology to create formal definitions for your concepts, terms, and entities. These definitions can, and should, be stored in your taxonomy (or you can create a Thesaurus for your content), which makes them valuable documentation for everyone who creates content in your organization.
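To connect formal definitions with structured data, here is a minimal sketch that turns one stored definition into JSON-LD using the schema.org types `DefinedTerm` and `DefinedTermSet`. The function name and the term set name "Electric mobility glossary" are assumptions for illustration, and how any particular search engine uses this markup is not something this article claims.

```python
import json


def defined_term_jsonld(name, description, term_set_name):
    """Build a schema.org DefinedTerm object for one entity definition."""
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": name,
        "description": description,
        "inDefinedTermSet": {
            "@type": "DefinedTermSet",
            "name": term_set_name,  # hypothetical name for the site's taxonomy or thesaurus
        },
    }


markup = defined_term_jsonld(
    name="Electric car",
    description="An electric car is a motor vehicle powered by electric energy.",
    term_set_name="Electric mobility glossary",
)

# JSON-LD is usually embedded in a page inside a script tag of this type.
payload = json.dumps(markup, indent=2, ensure_ascii=False)
script_tag = f'<script type="application/ld+json">\n{payload}\n</script>'
print(script_tag)
```

Generating the markup from the same record that holds the formal definition means the text humans read and the data algorithms read cannot drift apart.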
In summary, the proposal is:
Formal Definitions
Start the descriptions with formal definitions of the entities because that helps the algorithm understand what we are discussing.
Standardization
A text creation standard for entities and categories does not mean texts without creativity, but it helps us stay within our semantic field.
References
- FINATTO, M. J. B. The role of defining technical-scientific terms (O papel da definição de termos técnico-científicos). lume.ufrgs.br, 2002. Available at: http://hdl.handle.net/10183/184253. Accessed on: 4 Feb. 2023.
- RODRIGUES, Daniel de Sá. Terminological definition: principles and rules (Definição terminológica: princípios e regras). Revista Moara, n. 55, jan-jul 2020. periodicos.ufpa.br, 2020. Available at: http://dx.doi.org/10.18542/moara.v0i55.9040. Accessed on: 4 Feb. 2023.