The Invisible Cost of Disorder: How Information Architecture Prevents SEO Losses

In this article, I will write about a subject that, at first glance, seems distant from SEO. We usually discuss technical matters, indexing, algorithm updates, and, more recently, AI. But here I want to address the other kind of “IA”: Information Architecture. And we will approach it from a new perspective: come with me!

In complex corporate ecosystems and increasingly digitalized business environments, information architecture should not be viewed as a simple organizational layer in interface development. It is a strategic defense against losses that can run into the millions, caused by a continuous drain on productivity.

This drain occurs when employees cannot retrieve information vital to their work, or when customers cannot find answers to their queries in the search tools of an organization’s website.

Specialists in User Experience, Information Architecture, and Semantic SEO operate at the exact intersection between human cognition and the increasingly intricate data infrastructure of the Web. It is in this scenario that categorization must act as a primary reducer of cognitive load and, simultaneously, as the great engine that drives search systems toward semantics, whether the internal search of a portal or the indexing performed by current search engines.

When data is not organized in a logical structure, Natural Language Processing suffers, algorithms fail to understand the meaning of content, and the business’s organic visibility goes from bad to worse.

Fundamentals of Categorization: The Science of Organizing Objects and Entities

Categorization and information architecture are the strategies you need to increase digital findability and avoid the invisible cost of disorder.

Introduction to Classification Logic in the Digital Context

Categorization is the pillar that underpins human cognition, the foundation that allows our brain to process massive volumes of information by grouping entities by similarity and meticulously distinguishing their dissimilarities. Inside your skull, you have the best categorization machine ever invented.

In digital environments and in Information Science itself, this organizational logic is what separates an intuitive and enriching user journey from absolute informational chaos.

For the information architect and the SEO professional, organizing information means mapping the user’s mental model in order to reduce the effort of choice, transforming raw data into structured and quickly retrievable assets, with effectiveness and efficiency.

When dealing with modern search systems built on language models such as BERT, the machine needs to understand which “entity” a piece of content belongs to in order to deliver it as the best answer to a query. Without efficient classification logic, the content produced becomes invisible and loses its value.

Analysis of Concept Types and Attributes

Have you heard of the NISO Z39.19 guidelines?

The ANSI/NISO Z39.19-2005 (R2010) guidelines establish essential standards for the construction, formatting, and management of monolingual controlled vocabularies, including thesauri, lists, synonym rings, and taxonomies. The focus of the guidelines is the consistent representation of content objects to facilitate information retrieval in knowledge systems.

A curiosity: did you know that NISO Z39.19-2005 can serve as a basis for building ontologies, the new darlings of AI tools? The article at cip.brapci.inf.br/download/135118 explains how to do it.

Returning to our conversation about information organization: we know that structuring a competent and optimized database requires the correct identification of attributes and classes, which enables the implementation of multidimensional faceted searches, a vital functionality for extensive catalogs such as those we see in e-commerce.

Based on the standardized guidelines from NISO (National Information Standards Organization), I have summarized the seven essential concept types that we, who work with information representation, need to know to structure any taxonomy:

  • Things: refer to physical objects, tangible entities, and their constituent parts. In e-commerce, it can be a “notebook” or a “processor.”
  • Materials: substances from which things are formed. For example, specifications such as “aluminum,” “glass,” or “silicon.”
  • Activities: processes, actions, or operations performed. In the web environment, they represent interactions such as “buy,” “review,” “compare,” and “share.”
  • Events: occurrences or phenomena situated in time, such as “Black Friday,” “SEO Course,” or “Campaign Launch.”
  • Properties: characteristics, states, or qualities inherent to an object. It can be size, primary color, exact weight, or storage capacity.
  • Disciplines: areas of study or broad branches of knowledge. Here, comprehensive thematic categories enter, such as “Library Science,” “Software Engineering,” and “Digital Marketing.”
  • Measures: units of dimension, scale, or quantity, such as “centimeters,” “gigabytes,” “kilometers,” or financial currencies.

Notice that these seven categories already cover most of the information in a product catalog. Working together with your systems or software development team, you can use them as the basis for a cutting-edge search or product suggestion system.
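To make the idea of faceted search over these concept types more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the `Product` class, the `faceted_search` function, and the sample catalog are hypothetical names and data, not part of the NISO guidelines or of any specific e-commerce platform.

```python
from dataclasses import dataclass, field

# The seven concept types summarized above, used here as facet names.
CONCEPT_TYPES = ("things", "materials", "activities", "events",
                 "properties", "disciplines", "measures")


@dataclass
class Product:
    """Hypothetical catalog item: a name plus a set of values per facet."""
    name: str
    facets: dict = field(default_factory=dict)


def faceted_search(catalog, **filters):
    """Return the products that carry every requested facet value."""
    for facet in filters:
        if facet not in CONCEPT_TYPES:
            raise ValueError(f"unknown facet: {facet}")
    return [
        product for product in catalog
        if all(value in product.facets.get(facet, set())
               for facet, value in filters.items())
    ]


# Illustrative data only.
catalog = [
    Product("Notebook A", {"things": {"notebook"}, "materials": {"aluminum"}}),
    Product("Notebook B", {"things": {"notebook"}, "materials": {"plastic"}}),
    Product("Processor C", {"things": {"processor"}, "materials": {"silicon"}}),
]

matches = faceted_search(catalog, things="notebook", materials="aluminum")
print([product.name for product in matches])  # ['Notebook A']
```

Each filter narrows the result set along one dimension, which is what lets a visitor combine “what it is” with “what it is made of” without navigating a single rigid hierarchy.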

Similarity and Dissimilarity Criteria: The Strategic Impact on Data Retrieval

Before moving forward, I need to address these two concepts from the perspective of information science. Similarity and dissimilarity are part of the foundations of information organization, retrieval, and representation. They are not merely subjective perceptions, but practical measures that allow systems (human or artificial) to identify relationships between documents, terms, or entities.

So I present a technical and reflective definition of these two concepts:

Similarity

Similarity is the degree of correspondence, proximity, or affinity between two informational objects. In Information Science, it is frequently treated from two perspectives:

  • Structural similarity: focuses on the form or physical occurrence of elements (e.g., two articles that share the same keywords).
  • Semantic similarity: focuses on meaning. It occurs when two terms or documents address the same concept, even if they use different languages or terms (synonymy).

Mathematically, similarity is frequently calculated in a vector space (does it remind you of how AI models work?), in which documents are represented by vectors. The most common metric is cosine similarity, which measures the cosine of the angle between two vectors:

$$\text{sim}(A, B) = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

The closer to 1, the greater the similarity between objects.
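As a small illustration of the formula, the sketch below computes cosine similarity in plain Python. The vectors are hypothetical term counts that I made up for the example; real systems would normally use weighted or learned vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: (A . B) / (|A| |B|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical term counts over the vocabulary
# ["bottle", "thermal", "camping", "office"].
doc_a = [2, 1, 1, 0]
doc_b = [1, 1, 0, 1]
doc_c = [4, 2, 2, 0]  # same proportions as doc_a, twice as long

print(cosine_similarity(doc_a, doc_b))  # shares some terms: between 0 and 1
print(cosine_similarity(doc_a, doc_c))  # same direction: approximately 1
```

Note that `doc_c` is simply a longer document with the same proportions as `doc_a`. Cosine similarity ignores vector length and compares only direction, which is one reason it is popular for comparing documents of different sizes.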

Dissimilarity

Dissimilarity is the measure of distance, difference, or divergence between objects. In practice, it is the inverse of similarity, but it has a fundamental strategic value in categorization and classification.

While similarity groups, dissimilarity separates, and is essential for:

  • Avoiding redundancy: in search systems, showing very similar results can be inefficient; dissimilarity helps ensure diversity of results.
  • Outlier identification: detecting information that does not fit any established pattern.

In metric terms, dissimilarity is frequently expressed as a “distance.” Euclidean distance is one of the ways to calculate this divergence:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
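The distance formula, and the outlier identification mentioned earlier, can be sketched in a few lines of Python. The item names and coordinates are invented for the example, and treating “farthest from the centroid” as the outlier is a deliberately naive assumption rather than a recommended detection method.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences between x and y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def centroid(points):
    """Component-wise mean of a list of equally sized vectors."""
    return [sum(values) / len(points) for values in zip(*points)]


def farthest_from_centroid(items):
    """Return the key whose vector lies farthest from the centroid."""
    center = centroid(list(items.values()))
    return max(items, key=lambda key: euclidean_distance(items[key], center))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0

# Hypothetical 2-D representations of four catalog items.
items = {"a": [1, 1], "b": [1, 2], "c": [2, 1], "d": [9, 9]}
print(farthest_from_centroid(items))  # d
```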

The dialectic between similarity and dissimilarity enables the creation of taxonomies and ontologies.

  1. Clustering: objects with high internal similarity and high external dissimilarity form a solid class or category.
  2. Cognitive load: efficient information organization uses these concepts to reduce the user’s mental effort. When the similarity between menu options is very high (ambiguity), cognitive load increases, as the user cannot distinguish the correct route.
  3. Information retrieval: modern search engines use natural language processing (NLP) and large language models to refine this perception, going beyond simple word counting to understand “conceptual proximity.”
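The clustering criterion in the first item can be checked numerically. The sketch below assumes that each object is already represented as a point (the coordinates are invented) and compares the average distance inside each candidate category with the average distance between the two. It is a simplified check, not a full clustering algorithm.

```python
import math
from itertools import combinations, product
from statistics import mean


def mean_internal_distance(group):
    """Average distance between every pair of points inside one group."""
    return mean(math.dist(a, b) for a, b in combinations(group, 2))


def mean_external_distance(group_a, group_b):
    """Average distance between points taken from two different groups."""
    return mean(math.dist(a, b) for a, b in product(group_a, group_b))


# Hypothetical 2-D representations of two candidate categories.
bottles = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
tents = [(5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]

internal = max(mean_internal_distance(bottles), mean_internal_distance(tents))
external = mean_external_distance(bottles, tents)

# A solid category: members are close to each other and far from the rest.
print(external > internal)  # True
```

When the two averages come close to each other, the categories overlap, which is the same ambiguity that raises cognitive load in a menu.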

Point of reflection: in information science, nothing is “equal,” only “highly similar.” Absolute identity is rare; we always work with degrees of approximation that define the relevance of an answer to a query.

Forming Groups by Similarity or Dissimilarity

The formation of logical, semantic, and functional groups depends strictly on the distinction between intrinsic characteristics (what the object or entity actually is in its ontology) and extrinsic characteristics (how it is used, perceived, or applied by the end user). The failure to define and isolate these criteria generates immense algorithmic “noise,” harming crawlers and severely degrading search precision.

But let us clarify these complicated concepts.

I often say that an e-commerce without a clear distinction between what a product is and what a product represents is merely a digital warehouse, not a sales strategy.

Let us use an example of a highly sought-after thermal bottle (such as a Stanley or similar). Imagine we are organizing the taxonomy and semantics of this store:

The Intrinsic View (The Ontology of the Object)

Here, the group is formed by the similarity of what the object actually is. It does not matter who buys it or for what purpose.

  • Characteristics: stainless steel, vacuum insulation, 500 ml capacity, screw-on lid.
  • Logical grouping: kitchen > containers > thermal bottles.

The Extrinsic View (User Perception and Application)

Here, physical dissimilarity is ignored in favor of functional similarity. The group is formed by context.

  • Scenario A (the camping enthusiast): the bottle is grouped with tents, sleeping bags, and lanterns. It does not “look like” a tent, but it serves the same extrinsic purpose: survival and outdoor comfort.
  • Scenario B (the office professional): the bottle is grouped with planners, desk organizers, and ergonomic mice. Here, it is a productivity and status accessory.

In my professional experience, I have noticed that the common mistake is trying to force the user to think only in ontology (intrinsic). If your customer wants “gifts for adventurous fathers,” they do not want to navigate through “stainless steel > 500 ml.”

If the information architecture does not reflect the mental model of the user, who searches by use and not by raw material, cognitive load rises, the internal search system fails, NLP cannot connect the dots, and conversion plummets. Extrinsic similarity is what generates desire; intrinsic similarity is what validates the technical purchase.

Therefore, strategically, if the categorized attributes are ambiguous, the query to the database or a search engine will return out-of-context results, forcing the visitor into manual and extremely exhaustive filtering, which tends to increase the bounce rate.

The rigorous identification of those NISO attributes is one of the factors that allow a faceted navigation system to cleanly differentiate “Activities” from “Disciplines” in side navigation filters.
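The two views of the thermal bottle can be sketched as two groupings over the same records. Only the bottle’s category path comes from the example above; the other category paths, the `contexts` field, and the function names are assumptions I introduced for illustration.

```python
from collections import defaultdict

# Each record has one intrinsic category and any number of extrinsic contexts.
products = [
    {"name": "Thermal bottle",
     "category": "kitchen > containers > thermal bottles",
     "contexts": ["camping", "office"]},
    {"name": "Tent",
     "category": "outdoor > shelters > tents",  # hypothetical path
     "contexts": ["camping"]},
    {"name": "Sleeping bag",
     "category": "outdoor > bedding > sleeping bags",  # hypothetical path
     "contexts": ["camping"]},
    {"name": "Planner",
     "category": "stationery > planners",  # hypothetical path
     "contexts": ["office"]},
]


def group_by_intrinsic(items):
    """Group products by what they are (a single taxonomy path)."""
    groups = defaultdict(list)
    for item in items:
        groups[item["category"]].append(item["name"])
    return dict(groups)


def group_by_extrinsic(items):
    """Group products by how they are used (a product may repeat)."""
    groups = defaultdict(list)
    for item in items:
        for context in item["contexts"]:
            groups[context].append(item["name"])
    return dict(groups)


print(group_by_intrinsic(products))
print(group_by_extrinsic(products))
```

In the intrinsic grouping the bottle sits alone in its kitchen category. In the extrinsic grouping it appears in both “camping” and “office”, next to products it does not physically resemble.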

The Role of Entity Extraction in Content

Another indispensable point in these fundamentals is “Entity Extraction.” In Semantic SEO, we must constantly identify entities (people, places, organizations, concepts) present in the full text of a document and ensure they align with the concepts identified in the knowledge domain analysis and the taxonomy created for the site.

By applying Natural Language Processing routines, we make precise inferences about these entities, consolidating the semantic domain of the page and attesting to the search engine that our content is a solid authority in this field of knowledge. Semantic SEO was already using AI in SEO long before this subject became fashionable.
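To show what aligning extracted entities with the site taxonomy can look like, here is a deliberately simple sketch. It uses dictionary matching with regular expressions instead of a trained NLP model, and the `TAXONOMY` mapping, the `extract_entities` function, and the sample text are all hypothetical.

```python
import re

# Hypothetical taxonomy: preferred term -> accepted surface forms.
TAXONOMY = {
    "Information Architecture": ["information architecture"],
    "Semantic SEO": ["semantic seo"],
    "Controlled Vocabulary": ["controlled vocabulary",
                              "controlled vocabularies"],
}


def extract_entities(text, taxonomy):
    """Count whole-phrase, case-insensitive mentions of each taxonomy term."""
    found = {}
    for preferred, variants in taxonomy.items():
        count = 0
        for variant in variants:
            pattern = r"\b" + re.escape(variant) + r"\b"
            count += len(re.findall(pattern, text, flags=re.IGNORECASE))
        if count:
            found[preferred] = count
    return found


text = ("Semantic SEO borrows controlled vocabularies from library science. "
        "A controlled vocabulary keeps information architecture consistent.")

print(extract_entities(text, TAXONOMY))
# {'Information Architecture': 1, 'Semantic SEO': 1, 'Controlled Vocabulary': 2}
```

A production pipeline would usually replace the dictionary lookup with a named entity recognition model, but the final step is the same: every entity found in the text is mapped back to a preferred term in the taxonomy.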


This is the first part of the article addressing how information architecture is important for SEO projects. The second part will be published soon.


Hello, I am Alexander Rodrigues Silva, SEO specialist and author of the book "SEO Semântico: Fluxo de trabalho semântico". I have worked in the digital world for more than two decades, focusing on website optimization since 2009. My choices led me to go deeper into the intersection between user experience and content marketing strategies, always with a focus on increasing organic traffic over the long term.

My research and specialization concentrate on Semantic SEO, where I investigate and apply semantics and linked data to website optimization. It is a fascinating field that lets me combine my background in advertising with library science.

In this second degree of mine, in Library and Information Science, I seek to expand my knowledge of Information Indexing, Classification, and Categorization, because I see an intrinsic and highly applicable connection between these concepts and SEO work. I have been researching and connecting tools from Library Science (such as Domain Analysis, Controlled Vocabulary, Taxonomies, and Ontologies) with the new tools of Artificial Intelligence (AI) and Large Language Models (LLMs), exploring everything from Knowledge Graphs to the role of autonomous Agents.

In my role as an SEO consultant, I seek to bring a new perspective to optimization, integrating a long-term view, content engineering, and the possibilities that artificial intelligence offers. For me, SEO work is a strategy that needs to be aligned with your business goals, but that requires deep knowledge of how search engines work and an ability to understand search results.
