Abstract: In this article, I present a methodology for creating definitions that is widely used in Library Science and show how it applies to the semantic optimization of websites (Semantic SEO), as an essential aid in solving one of the biggest problems search engines face on the Web: ambiguity.
Why create better definitions for our entities?
The main problem that automated information retrieval systems face (web search engines are in this category) is knowing, with a high degree of certainty, what information the user wants.
When I ask, “Where can I find a puma?” Google, for example, generates a search like this:
Because of my history (I do not own a car and do not search for cars), the algorithm clearly has more confidence that the entity “puma” refers to the sports brand, less that it refers to the car, and none that it refers to the animal, also known as a cougar.
What happened here is what we, in Information Science, call disambiguation.
Wikipedia defines disambiguation like this:
“In linguistics, disambiguation refers to the process of explaining a message that has more than one meaning. An ambiguous term is, then, one that carries a confusing message or instruction and can be interpreted in more than one way.” (Brazilian Wikipedia)
I already wrote about it in the blog post: The importance of disambiguation in Semantic SEO.
Marcelo Schiessl and Marisa Brascher, in Ontology: ambiguity and Precision, state that ambiguity is a significant obstacle to information retrieval.
Therefore, any help we can give our algorithmic readers (mainly search engines) reduces this uncertainty and makes it easier for retrieval tools to identify the subject we want to know about.
The definition of entities
The process of defining entities, known as terminological definition, fits our work perfectly because it helps us create a text that “accounts for the meanings of terms or expressions of a technique, technology, or science” [1]. It helps us deal with the specific terms of the area of knowledge we are working in (those related to its semantic field) and with their meanings.
To define is to give correct expression to specialized knowledge; as FINATTO (2002) notes, a definition carries much specialized knowledge. The definitional statement (another name we can use here for a definition) is, also according to FINATTO (2002), a particular conceptual representation linked to technical, scientific, or technological knowledge.
This use of definitions caught my attention and made me see their importance in creating optimized content, especially for Semantic SEO, which, for obvious reasons, needs to be specific and extremely clear about the entities it is dealing with.
How does definition creation theory help SEO?
To apply term-creation techniques to SEO, I chose, in my professional practice, to draw on Terminology studies after reading this definition:
“Terminology is a discipline that makes it possible to systematically identify the vocabulary of a given specialty, analyze this vocabulary and, if necessary, create and standardize it in a specific operational situation to respond to the expression needs of users.” (DUBUC, 1999, p. 21-22)
One of the most complex activities when starting a Semantic SEO project is understanding the concepts, terms, and entities of the knowledge domain to which the site belongs.
I’ve extensively studied the worlds of customer service, electric cars, solar energy, and bespoke frames, among others. Learning about a subject you know very little about is complex and time-consuming, and it comes with no guarantee of success.
But when I got to know Terminology studies, which can be defined as a discipline of Linguistics that studies the form and meaning of words that are part of the existing set of words in a given language, I understood that I could use its methodologies for my work.
I understood the possibilities by studying the Communicative Theory of Terminology (TCT), which emerged in the 80s and 90s, proposed by Maria Teresa Cabré and collaborators. This new view on the study of terms is perfect for SEO because of this particular work structure:
TERM > Creation of a set of words by speakers specialized in a given subject.
Drawing on the vocabulary that specialists in a given subject actually use helps me solve my research problem in SEO projects by giving me a highly qualified starting point of terms, concepts, and entities to build my Semantic SEO Workflow.
The Communicative Theory of Terminology uses a method of creating definitions that fits my work perfectly, called the Semasiological Method, which starts with words and seeks their meanings. This allows me to build a Corpus: a collection of works of all kinds written by experts on a given subject.
For example, a corpus about the world of Harry Potter can be a Wiki written by fans, researchers, and amateur experts on books and films.
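As a rough illustration of starting from words, the sketch below counts candidate terms in a tiny corpus. The corpus sentences, the stopword list, and the function name `candidate_terms` are all assumptions made for this example, and plain frequency counting stands in for real terminology extraction.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "by", "and", "in", "to", "with", "at"}

def candidate_terms(corpus, max_words=2):
    """Count one- and two-word sequences that contain no stopwords."""
    counts = Counter()
    for text in corpus:
        words = re.findall(r"[a-z]+", text.lower())
        for size in range(1, max_words + 1):
            for start in range(len(words) - size + 1):
                gram = words[start:start + size]
                if not any(word in STOPWORDS for word in gram):
                    counts[" ".join(gram)] += 1
    return counts

# Illustrative corpus, invented for this example.
corpus = [
    "An electric car is powered by an electric motor.",
    "The electric car stores energy in a battery pack.",
    "A battery pack is charged at a charging station.",
]

terms = candidate_terms(corpus)
print(terms.most_common(5))
```

The most frequent sequences are only candidates; deciding which ones are real terms of the domain, and what they mean, is still the work of the person building the definitions.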
In addition, the TCT is concerned with Terminological Variation, something familiar in any area, because variation occurs all the time, either because experts update the definition of terms over time or because they use different words to define the same entity.
In Medicine, what we know today as Hansen’s disease was formerly called leprosy and Mal de Lázaro (Lazarus’s disease). Or, in studies related to agricultural production in Brazil, cassava (mandioca) is also called aipim, macaxeira, maniva, castelinha, uaipi, maniveira, pão-de-pobre (poor man’s bread), etc.
“Thus, understanding what a term is and the existence of terminological variants will enable the indexer to represent the information better. This will allow easier access for different users of an information retrieval system (IRS).” (Prof. Dr. Rita do Carmo Ferreira Laipelt and Prof. Dr. Regina Helena Van der Laan)
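One way to put terminological variation to work is a lookup table that maps every known variant to a single preferred term. This is a minimal sketch: the `VARIANTS` table and the `preferred_term` helper are assumptions for illustration, and the entries come from the cassava example above.

```python
# Hypothetical variant table: preferred term -> variants seen in the corpus.
VARIANTS = {
    "cassava": ["maniva", "castelinha", "uaipi", "maniveira", "pão-de-pobre"],
}

# Invert the table so every variant (and the preferred term) points to one entry.
LOOKUP = {}
for preferred, variants in VARIANTS.items():
    LOOKUP[preferred.casefold()] = preferred
    for variant in variants:
        LOOKUP[variant.casefold()] = preferred

def preferred_term(word):
    """Return the preferred term for a variant, or None if it is unknown."""
    return LOOKUP.get(word.strip().casefold())

print(preferred_term("Maniva"))   # cassava
print(preferred_term("uaipi"))    # cassava
print(preferred_term("potato"))   # None
```

A table like this lets content use whichever variant suits the reader while the taxonomy still treats all of them as one entity.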
Creating Definitions for Semantic SEO
Therefore, the proposal that I bring to our optimization practice is the use of the methodology created by the TCT, but first, let’s recall the type of statement we will use. For this, I will use the terminological definition that Barros (2004) synthesizes like this:
“The utterance that describes the semantic-conceptual content of a lexical or terminological unit in the entry position of an entry […] It consists of a synonymic paraphrase that expresses the concept designated by the lexical or terminological unit through other linguistic units [sic]; it is a set of information given about the entry.” (Barros, 2004, pp. 158-9)
I want to mention some points of this definition, which are complex but very interesting.
The first point is the idea of describing the semantic and conceptual content of the entities we use in our projects. Whether these entities are used as categories to structure a project, as themes for content, or as words in texts, what is fundamental for me is the concern with the meaning we give them and with the concepts involved in defining them clearly.
If I optimize a project about electric cars, I first need to know how the industry conceptualizes an electric vehicle. Then I can customize this definition with the particular view of the organization that owns the site. This gives me the necessary personality to differentiate my content from others already created.
The second point is the synonymic paraphrase. Synonymy occurs between pairs of words or expressions, but it is not just a relationship of reference; for two expressions to be synonymous, it is not enough that they have the same reference in the world. In Semantic SEO terms, it is not enough that I am talking about the same entity.
For this relationship to happen, the expressions you use in your content must have the same meaning and talk about the same entity. So be semantically objective.
Going back to my example about electric cars: if I’m talking about the new BMW i4, I can use variations in my content to talk about this entity (BMW i4), but the sentences I create need to refer to the same set of facts and both be true.
If I say in one sentence that the i4 is an electric car created by BMW, in another I can say that this year’s launch from the German manufacturer BMW is its first electric Gran Coupé. Both are true facts about the same entity.
But we are not going to use every type of definition in SEO; the kind of definition we are interested in is the one used in terminological dictionaries:
“Terminological definitions bring predominantly formal knowledge about ‘things’ or phenomena.” (FINATTO, 1998, p. 2)
The terminological definition has the following characteristics:
- Adequacy to the domain;
- Formal structure and conceptual organization of the definitional statement;
- Nearest genus + specific differences.
I’ve written many times on the Semantic Blog about Domain (knowledge domain, semantic domain). Linguistic Anthropology has one approach and lexicography another, but the vision of the Social Sciences is the one that helps us.
Nearest genus + specific differences
But before continuing, I need to emphasize the importance of the third item: nearest genus + specific differences. It works as a guide to start defining our concepts, terms, and entities. Let’s see an example:
“An electric car is a motor vehicle powered by electric energy.”
In this example, “motor vehicle” represents the nearest genus: the category closest to an electric car and immediately above it in the hierarchy. If we think of a taxonomy of vehicles, we might have something like this:
- Motor vehicle
  - Electric vehicle (or electric car)
“An electric car is a motor vehicle powered by electric energy.”
The expression “powered by electric energy” is the specific difference of an electric car, a unique feature of this particular type of car.
And how does it help us?
What is the advantage of using this pattern in creating optimized texts?
Currently, as I learned from Teodora Petkova, we have two audiences for our content:
Algorithms and humans.
At the beginning of the text, I unambiguously define the entities with which we are working. Of course, I can use various strategies to write these definitions so that the text is easy for humans to read. Either way, the content starts by clearly defining what I’m talking about at the entity level.
I want to remind you that we are building a semantic content layer for the Web, so let’s talk briefly about the Semantic Web.
Semantic Web, Structured Data, and SEO
A web, to be semantic, needs structured data. In addition to creating good definitions, using structured data on your site will help your algorithmic audience even more. We know that research in semantics, adapted to web technologies, has advanced to resolve ambiguity, but this is still a work in progress.
Search engines like Google and Bing use technology to solve this problem. Can you imagine if your content helps search engines resolve the search ambiguity for your knowledge domain?
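To make this concrete, a formal definition can also be published as structured data. The sketch below builds a JSON-LD snippet with schema.org’s `DefinedTerm` type; the function name, the term set name, and the URL are placeholders invented for this example, and the sketch makes no claim about how any particular search engine uses this markup.

```python
import json

def defined_term_jsonld(name, description, term_set_name, term_set_url):
    """Build a schema.org DefinedTerm description as a Python dict."""
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": name,
        "description": description,
        "inDefinedTermSet": {
            "@type": "DefinedTermSet",
            "name": term_set_name,
            "url": term_set_url,
        },
    }

markup = defined_term_jsonld(
    name="Electric car",
    description="An electric car is a motor vehicle powered by electric energy.",
    term_set_name="Electric mobility glossary",    # placeholder name
    term_set_url="https://example.com/glossary",   # placeholder URL
)

# The resulting JSON would go inside a <script type="application/ld+json"> tag.
print(json.dumps(markup, indent=2, ensure_ascii=False))
```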
I propose using the methodology of the Communicative Theory of Terminology to create formal definitions for your concepts, terms, and entities. These definitions can, and should, be stored in your taxonomy (or you can create a Thesaurus for your content), which makes them valuable documentation for everyone who creates content in your organization.
In summary, the proposal is as follows:
Start the descriptions with formal definitions of the entities because this helps the algorithm understand what we are discussing.
A standard for creating texts for entities and categories does not mean texts without creativity, but it helps us not to escape our semantic field.
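As a rough way to check the first point of this summary, the sketch below tests whether a text opens with a sentence of the form “<term> is a/an ...”. The function name `opens_with_definition` and the pattern it looks for are assumptions for illustration; a real editorial check would need to accept more phrasings.

```python
import re

def opens_with_definition(text, term):
    """Check whether the first sentence has the form '<term> is a/an ...'."""
    first_sentence = re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]
    pattern = rf"^(?:an?\s+|the\s+)?{re.escape(term)}\s+is\s+an?\s+\w+"
    return re.search(pattern, first_sentence, flags=re.IGNORECASE) is not None

good = "An electric car is a motor vehicle powered by electric energy. It needs no fuel."
bad = "Thinking about your next car? An electric car is a motor vehicle."

print(opens_with_definition(good, "electric car"))  # True
print(opens_with_definition(bad, "electric car"))   # False
```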
Serve our readers
Although we think of search engines, our most important readers are people who also need good definitions.
- FINATTO, M. J. B. The role of defining technical-scientific terms. lume.ufrgs.br, 2002. Available at http://hdl.handle.net/10183/184253. Accessed on: 4 Feb. 2023.
- RODRIGUES, Daniel de Sá. Terminological definition: principles and rules. periodicos.ufpa.br, 2020. Revista Moara, n. 55, Jan.-Jul. 2020. Available at http://dx.doi.org/10.18542/moara.v0i55.9040. Accessed on: 4 Feb. 2023.