From "word element" to "symbol element": the AI cognitive debate behind the Chinese name for "Token"

Recently, the National Committee for the Review of Scientific and Technical Terms issued an announcement recommending translating “Token” in the field of artificial intelligence as “词元” (“word element”) and trialing it publicly. Subsequently, People’s Daily published an article titled “Expert Interpretation: Why Is the Chinese Name for ‘Token’ Set as ‘词元’?” which systematically explained this naming from a professional perspective.

The article mentions that the term “token” originates from Old English tācen, meaning “symbol” or “mark.” In language models, a token is the smallest discrete unit obtained after text segmentation or byte-level encoding, which can take various forms such as words, subwords, affixes, or characters. Models demonstrate certain intelligent capabilities by modeling sequences of tokens.
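As a minimal sketch of this point (the subword split below is illustrative, not the output of any particular tokenizer), the same text can be segmented into tokens at several granularities:

```python
# Toy illustration: one string, three token granularities.
text = "unhappiness"

# Word-level: the whole word is a single token.
word_tokens = [text]

# Subword-level: a plausible split into morphological pieces
# (hand-written here; real tokenizers learn such splits from data).
subword_tokens = ["un", "happi", "ness"]

# Byte-level: the UTF-8 bytes, each byte one token.
byte_tokens = list(text.encode("utf-8"))

print(word_tokens)      # ['unhappiness']
print(subword_tokens)   # ['un', 'happi', 'ness']
print(byte_tokens[:3])  # [117, 110, 104]
```

Whatever the granularity, the model receives only a sequence of discrete units; the choice of segmentation is a design decision, not a property of the text itself.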

This translation is considered by experts to conform to principles of univocality, scientificity, conciseness, and harmony, and currently has some basis for use within the Chinese language context. However, after reading related interpretations, I have formed a different understanding of this naming approach.

From a normalization perspective, this naming scheme is understandable and easy to disseminate in the short term. But when viewed from the dimensions of computational ontology, information structure, multimodal evolution, and back-translation consistency, its long-term adaptability remains to be further tested. Against this background, an alternative path—“符元” (“symbol element”)—gradually shows stronger structural consistency and cross-context stability.

  1. Dislocation of definitions: "origin" is no substitute for "essence"

Expert opinion (Chen Xilin, researcher at the Institute of Computing Technology, Chinese Academy of Sciences): The initial role of “token” in AI is as a “basic semantic unit of language,” so “词元” (“word element”) can better reflect its essence.

This judgment is reasonable in its historical context, but under the current pace of paradigm shifts, this way of thinking amounts to "carving the boat to find the sword": anchoring a definition to a reference point the technology has already moved past.

In terms of terminology logic, it is necessary to strictly distinguish between “initial application scenarios” and “structural essential attributes.”

While "token" indeed originates in natural language processing (NLP), on the evolutionary path toward AGI it has long transcended the boundaries of language models, becoming a fundamental unit for the unified processing of text, images, speech, and even physical signals. In modern computing systems, the true ontological structure of a token is a "discrete symbol unit," not a language unit confined to a single modality.

If we named by "initial role," the computer should still be called an "electronic calculating clerk" (after its original function of replacing human computers), and the Internet a "Cold War military network." The fatal flaw of this naming logic is that it sees only the "temporary job" a technology holds at a specific historical moment, while ignoring the ontology that spans eras.

Historical paths are not equivalent to essential attributes. Similarly, we cannot permanently confine “token,” originally used for text processing, to the narrow context of “word.”

Defining basic concepts by "initial application scenario" substitutes historical path dependence for the ontological truth of structure. Such a definition may ease understanding in a technology's early stages, but in a phase of multimodal explosion and paradigm expansion it quickly loses validity and becomes a barrier to cognition. By contrast, "符元" ("symbol element") aligns directly with the symbol ontology across modalities, defining not Token's "past" but its "truth."

  2. Boundaries of analogy: explanations that become definitions tend to deviate

Expert opinion (Dong Yuxiao, associate professor at Tsinghua University Department of Computer Science): Discrete units in multimodal contexts can be understood as “broadly defined words” through analogies like “word cloud” and “bag of words.”

Dong Yuxiao’s analogy aids understanding but should not replace definitions. This approach is somewhat enlightening at the interpretive level, but if elevated to a basis for naming, it may cause conceptual category misalignments.

Methodologically, analogy reduces comprehension barriers, while the role of definitions is to delineate semantic boundaries. When “word” is extended to cover image patches, speech segments, embedding vectors, or broader perceptual signals, its original language attribute is continually diluted, and semantic boundaries become fuzzy. This “analogy-driven” extension path can maintain interpretive consistency in the short term but risks semantic drift over long-term evolution.

In cross-modal extension capabilities, caution is needed against “analogy” slipping into “definition.” In the context of terminology approval, it is essential to distinguish between “interpretive metaphor” and “ontological definition” to prevent the former from replacing the latter.

A more intuitive comparison: in popular science we may call a light bulb a "man-made sun" to aid intuition, but in a scientific naming system no one would rename the ampere, the unit of electric current, a "light element." The former is descriptive; the latter belongs to a strict system of measurement and standard definitions. The two must not be mixed.

Similarly, terms like “word cloud” and “bag of words” are essentially descriptive or statistical metaphors, helping to understand data structures or distribution patterns; whereas Token, as a fundamental measurement unit in large models, is deeply embedded in billing, model training, and academic metrics. When its scale reaches hundreds of billions to trillions of calls per day, its naming carries not just explanatory function but also engineering and standardization significance. At this level, the terminology needs to align with its ontological attributes, not rely on analogy extension.

Extending this analogy further into naming involves a hidden dangerous premise: since people are accustomed to understanding Token as “word,” they might continue to use this analogy. But this is actually a path dependence—using the convenience of existing cognition to substitute for correction of the concept’s ontological nature. In this sense, such naming is closer to a “linguistic romanticism” rather than a strict alignment with the computational ontology.

We do not rename the electric motor an "electronic horse" just because "horsepower" contains "horse." Analogy can inspire understanding, but it cannot set standards.

In contrast, “符” (“symbol”) as a more neutral concept inherently possesses cross-modal adaptability, capable of covering text, images, speech, and other information forms without additional explanation. Naming based on “symbol unit” aligns more closely with the structural essence of Token. Under this logic, “符元” (“symbol element”) as a translation has higher conceptual consistency and long-term adaptability.

  3. Cognitive costs: when semantic anchors create systematic misunderstandings

Expert opinion (comprehensive expert views): “词元” (“word element”) is concise, conforms to Chinese habits, and is easy to disseminate.

This judgment has some validity in terms of dissemination, but its implicit premise is that the public can accept the cross-modal analogy of “word.” However, analogy is fundamentally a tool for expert thinking, not a natural way of cognition for the general public. For ordinary users, “word” has a very strong semantic anchoring effect—once they hear “word,” their intuitive association is inevitably with language systems, not images, sounds, or actions. This cognitive pathway is not a technical issue but a stable structure in cognitive psychology.

Based on this, when “word” is extended to the so-called “broadly defined word,” it actually introduces bias into user cognition. Users first form the intuitive understanding that “word = language unit,” rather than the abstract concept of “cross-modal symbol unit.” Once this misunderstanding is established, all subsequent explanations become corrections of existing cognition rather than natural extensions.

For example, when media reports say “the model was trained on 10 trillion tokens,” the public is likely to interpret this as “reading a large amount of text,” overlooking the large amount of images, speech, and other modal data involved. This misunderstanding is not an isolated case but a systemic effect caused by the semantic anchoring of the term itself.

In practical engineering contexts, such naming may also cause friction in interdisciplinary communication. Calling discrete units in visual or speech models “words” not only risks semantic misunderstanding but also creates unnecessary linguistic conflicts across fields. Multimodal systems need a “symbol layer” for unification, not an extension of language categories.

The character "符" ("symbol") is more abstract, and its initial comprehension barrier is slightly higher, but its reference is more neutral and does not pre-lock cognition into language. In long-term use it is more conducive to a stable, unified cognitive framework, lowering overall interpretive costs and providing a firmer cognitive basis for multimodal unification.

The cost of naming does not occur at the definition stage but at the correction stage; once early naming forms a semantic anchor, subsequent cognitive correction costs increase exponentially.

Experts can extend the boundary of “word” through analogy, but the public will not understand the concept via analogy. Naming is not for experts but for the entire cognitive system of the era.

  4. The illusion of univocality: when one word attempts to carry two systems

Expert opinion (principles of noun approval): “词元” (“word element”) conforms to the principle of univocality and helps solve the confusion caused by multiple translations.

Regarding terminological univocality, special attention must be paid to the systemic risks of “one word, two meanings.” In scientific nomenclature approval, “univocality” is one of the fundamental principles. If a term requires contextual or additional explanation to distinguish meanings, its value as a standard component is compromised.

However, viewed against the existing academic terminology system, this judgment leaves room for discussion. The term "词元" ("word element") already has an established referent in linguistics and NLP: its long-standing English equivalent is "lemma," the canonical form of a word (e.g., the lemma of is/am/are is "be"). This usage is a stable consensus in foundational textbooks and academic papers in both linguistics and NLP.

In this context, translating "Token" as "词元" as well invites direct semantic collision and serious misunderstanding in practice.

For example, in describing “lemmatization of a token” in NLP, the Chinese expression would be “对‘词元’进行‘词元化’” (“lemmatize the ‘word element’”). This expression not only increases comprehension difficulty but also introduces ambiguity in academic writing and information retrieval, making it hard for readers to distinguish whether “词元” refers to the segmented discrete unit or the lemma (canonical form).

Conceptually, there is a clear distinction: Lemma emphasizes “restoration” at the language level, corresponding to the normalized form after morphological changes; while Token emphasizes “segmentation” in the computational process, representing the minimal discrete unit processed by models. This “restoration” versus “segmentation” difference corresponds to different dimensions of semantics and symbols.
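This "restoration" versus "segmentation" distinction can be sketched in a few lines of Python (the function names and the tiny lemma table are hypothetical, for illustration only):

```python
# Illustrative lemma table: inflected form -> canonical form.
LEMMAS = {"is": "be", "am": "be", "are": "be", "running": "run"}

def tokenize(text):
    # Segmentation: split text into discrete units
    # (naive whitespace split stands in for a real tokenizer).
    return text.lower().split()

def lemmatize(token):
    # Restoration: map an inflected token back to its lemma.
    return LEMMAS.get(token, token)

tokens = tokenize("He is running")
lemmas = [lemmatize(t) for t in tokens]
print(tokens)  # ['he', 'is', 'running']
print(lemmas)  # ['he', 'be', 'run']
```

The two operations run in opposite directions: one cuts text into units for computation, the other collapses surface forms into a canonical entry. Using one Chinese term for both outputs erases exactly this difference.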

Therefore, when a term must be stretched "broadly" to cover multiple existing concepts, its univocality is achieved by flattening semantic levels, not by genuine semantic stability.

In contrast, “符元” (“symbol element”) does not have semantic conflicts in the current terminology system. On one hand, it retains the ontological attribute of Token as a discrete symbol; on the other hand, it avoids overlapping with the existing translation of Lemma, thus demonstrating higher clarity and systemic consistency.

  5. Ontological return: Token is fundamentally a "symbol," not a "word"

Expert opinion (general explanation): Token is the smallest unit used in language models for text processing.

This statement is valid at the functional level but remains at the “how to use” layer, without touching its ontological attribute in computational theory. From the perspectives of information theory and computational ontology, the basic object processed by computational systems is not “word” but “symbol.”

This can be further understood on two levels:

First, from an information theory perspective, the essence of information is to eliminate uncertainty, measured in bits, and carried by discrete symbols. Symbols do not concern semantic content but are related to probability distributions and coding structures.

Second, at the computational implementation level, large models do not “know characters”; their processing objects are discrete index representations (IDs). Whether this ID corresponds to a Chinese character, an image patch, or an audio sample point, it participates in computation uniformly as a symbol.

Within this framework, the ontological core is at the “symbol layer,” not the “semantic layer.” Symbols themselves do not carry semantics but serve as the basic carriers of encoding and computation.
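Both levels can be illustrated with a minimal sketch (the probabilities and the vocabulary below are made up): Shannon entropy measures information in bits over a distribution of symbols without regard to their meaning, and the model layer sees only integer IDs, regardless of modality.

```python
import math

# Level 1: information is measured in bits over a symbol distribution.
# Shannon entropy of a toy three-symbol distribution:
probs = [0.5, 0.25, 0.25]
entropy_bits = -sum(p * math.log2(p) for p in probs)
print(entropy_bits)  # 1.5

# Level 2: the model operates on integer IDs. Whether an ID came from
# text, an image patch, or an audio frame is invisible at this layer.
# (The vocabulary entries below are made up for illustration.)
vocab = {"hello": 0, "<image_patch_7>": 1, "<audio_frame_3>": 2}
ids = [vocab["hello"], vocab["<image_patch_7>"], vocab["<audio_frame_3>"]]
print(ids)  # [0, 1, 2]
```

At this layer nothing distinguishes a "word" from any other symbol: the computation is defined over the IDs and their distribution, not over meanings.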

Naming Token as “词元” (“word element”) introduces an implicit linguistic semantic reference, pulling this originally symbolic concept back into a language-centered understanding path. While this naming may provide interpretive intuition, theoretically it blurs the boundary between “symbolic computation” and “semantic understanding.”

In contrast, “符元” (“symbol element”) conceptually remains within the symbol layer. On one hand, it accurately reflects Token’s ontological attribute as a discrete symbol; on the other hand, it avoids introducing semantic features into the ontological definition, aligning better with the basic frameworks of information theory and computational theory.

From a broader perspective, as AI systems evolve toward multimodality and general intelligence, aligning fundamental concepts with their mathematical and computational ontology will facilitate building stable, scalable cognitive systems. In this sense, a naming path centered on “symbol units” is not only a language choice but also a consistent expression of the computational essence, and “符元” (“symbol element”) is a natural corresponding term within this framework.

Starting from the symbol layer for concept definition aligns with the computational ontology; naming from the semantic layer is more about explanation than definition.

  6. Language rupture: mapping failure in back-translation mechanisms

Expert opinion (comprehensive interpretation): “词元” (“word element”) has gradually formed a usage base in Chinese academia and has certain dissemination advantages.

In cross-language contexts, one must be alert to the systemic impact of “back-translation rupture.” Whether a scientific term has long-term vitality depends not only on its semantic clarity within Chinese but also on its ability to achieve stable mapping in the international academic system. An ideal term should be “reversible,” enabling consistent semantic correspondence across languages.

The above judgment reflects “词元”’s acceptability in the local context but leaves room for further discussion from a cross-lingual perspective. If a term only works within a single language system and cannot form a stable correspondence in the international context, it may introduce additional understanding costs in academic exchanges.

Specifically, "词元" lacks a clear, unique mapping path in back-translation. Rendered back into English, it scatters across several similar concepts: "word unit" has no strict academic definition; "morpheme" corresponds to the linguistic morpheme; "lexeme" denotes an abstract lexical unit in linguistics. None of these precisely covers what Token means in computational contexts, and each risks a category shift.

In contrast, “符元” can more naturally correspond to “symbolic unit.” This concept has a clear theoretical basis and stable usage in information theory, discrete mathematics, and multimodal representation fields, maintaining consistent semantic reference across contexts. Therefore, it is easier to establish a one-to-one mapping between Chinese and English.

Practically, once a term enters academic papers, technical documents, and international communication, its back-translation capability directly affects expression efficiency and understanding accuracy. If a term requires additional explanation for cross-language conversion, its long-term usage costs will accumulate.

Thus, in a cross-lingual system, “词元” faces instability in mapping paths, while “符元” demonstrates higher certainty in semantic correspondence and concept consistency. In the increasingly globalized AI landscape, choosing terms with good back-translation properties will better support building open, interoperable academic and technical systems.

The reversibility of terminology across languages is essentially a key indicator of its long-term academic vitality.

  7. The fallacy of uniformity: visual form does not equal structural consistency

Expert opinion (comprehensive expert): “词元” aligns stylistically with terms like “embedding” and “attention,” being concise and abstract, fitting the Chinese technical terminology context.

Conclusion first: the unification of terminology systems should be based on “concept isomorphism,” not “linguistic similarity.”

In the arguments supporting "词元," a common reason is that its expressive style matches "embedding" and "attention": concise, abstract, consistent with Chinese technical language. This reason captures a genuine need for system unification, but there is a problem: if the unification stays at the linguistic level rather than the structural level, it slides from order into illusion.

“Embedding” and “attention” are stable terms because they correspond to clear computational structures: the former is a vector mapping, the latter a weighting mechanism, with naming directly pointing to the computational essence. “词元,” on the other hand, is interpretive, relying on the analogy of “broadly defined words.” Without this interpretive framework, the name itself lacks a self-consistent structural direction.

This difference leads to a key issue: form consistency does not guarantee semantic consistency.

Linguistic uniformity reduces expression costs; conceptual isomorphism ensures cognitive stability. Prioritizing linguistic form alone does not eliminate complexity; it merely defers it into a long-term cognitive burden. Only naming grounded in conceptual isomorphism remains stable across contexts and through multimodal evolution.

When “embedding,” “attention,” and “词元” appear side by side, it can create an illusion of “conceptual equivalence.” But in fact, the first two are mechanisms, the last is an object; the first two have precise definitions, the latter depends on context. This structural misalignment can embed latent fractures in the cognitive system.

More importantly, when a fundamental concept’s naming relies on analogy rather than structural definition, its influence extends beyond the single term, affecting the entire terminology system. Subsequent concepts trying to develop around this name will have to rely on explanations to maintain consistency, leading to implicit structural misalignments.

In this sense, "符元" offers an expression closer to the underlying structure. It points directly at the basic object in the computational system, the symbol, without leaning on analogical explanation, and maintains a consistent reading across contexts.

Terminology is not just a label but an entry point to cognition. Good terminology gradually eliminates explanations, while poor terminology accumulates annotations. When a foundational concept’s name deviates from its structure, the terminology system can only be maintained through explanations, not through self-consistent definitions.

Conclusion

Essentially, the choice of terminology is not merely a language issue but an early shaping of the domain’s cognitive structure. If naming deviates from its ontological structure at the initial stage, the subsequent system can only be sustained through continuous explanations, making it difficult to form a coherent conceptual network.

As AI advances toward generalization and multimodal integration, terms that align with their computational ontology and maintain cross-context stability are more likely to become long-term cognitive foundations. In this sense, a naming path centered on “symbol units” balances technical essence and cognitive clarity, showing better adaptability.
