As AI systems advance, a trend has emerged: the way they represent data appears to be converging across architectures, training objectives, and modalities. This convergence has practical implications for how AI systems are trained and deployed.

Key Findings

  • Modern large language models (LLMs) are highly versatile, handling many language-processing tasks with a single set of weights.
  • Deep neural networks appear to be converging toward a shared representation of reality, a trend observed across model architectures, training objectives, and data modalities.
  • Factors contributing to this convergence include task generality, model capacity, and simplicity bias in deep neural networks.
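
The summary does not say how convergence is measured. As one illustration, a minimal sketch of a mutual nearest-neighbor overlap score is shown here: two models are treated as aligned when the same inputs end up with similar neighbor sets in both feature spaces. The function names, the choice of cosine similarity, the value of k, and the synthetic "models" (random linear projections of a shared latent space) are all assumptions made for this example, not details taken from the source.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def knn_sets(features, k):
    """For each row, the set of indices of its k most cosine-similar other rows."""
    neighbors = []
    for i, u in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        ranked = sorted(others, key=lambda j: cosine(u, features[j]), reverse=True)
        neighbors.append(set(ranked[:k]))
    return neighbors


def mutual_knn_alignment(feats_a, feats_b, k=5):
    """Mean fraction of k-nearest neighbors shared by the two feature spaces.

    Row i of feats_a and row i of feats_b must describe the same input.
    """
    nn_a, nn_b = knn_sets(feats_a, k), knn_sets(feats_b, k)
    return sum(len(a & b) / k for a, b in zip(nn_a, nn_b)) / len(nn_a)


def random_projection(rows, out_dim, rng):
    """Hypothetical stand-in for a model: a random linear map of the inputs."""
    in_dim = len(rows[0])
    weights = [[rng.gauss(0, 1) for _ in range(out_dim)] for _ in range(in_dim)]
    return [
        [sum(row[i] * weights[i][j] for i in range(in_dim)) for j in range(out_dim)]
        for row in rows
    ]


# Synthetic data: two "models" that view the same latent world state,
# plus unrelated random features as a baseline.
rng = random.Random(0)
latent = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
model_a = random_projection(latent, 16, rng)
model_b = random_projection(latent, 24, rng)
unrelated = [[rng.gauss(0, 1) for _ in range(24)] for _ in range(60)]

aligned_score = mutual_knn_alignment(model_a, model_b)
baseline_score = mutual_knn_alignment(model_a, unrelated)
print(f"shared latent: {aligned_score:.2f}  unrelated: {baseline_score:.2f}")
```

The score lies between 0 and 1 and depends only on neighbor structure, so it can compare feature spaces of different dimensionality, such as a vision encoder and a language encoder evaluated on paired inputs.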

Implications

  • Scaling up model parameters and training data could yield more accurate representations of reality, potentially reducing hallucination and bias.
  • Training data from different modalities could be shared to improve representations across domains.

Limitations

  • Different modalities may contain unique information that cannot be fully captured by a shared representation.
  • Convergence observed so far is primarily limited to vision and language, with other domains exhibiting less standardization in representing world states.

