CoSyn: An AI Framework that Leverages the Coding Capabilities of Text-only Large Language Models (LLMs) to Automatically Create Synthetic Text-Rich Multimodal Data


Challenges in Vision-Language Models

Vision-language models (VLMs) excel at general image understanding but struggle with text-rich visual content such as charts and documents. Understanding these images requires reasoning that combines text comprehension with spatial awareness. That capability is essential for tasks such as analyzing scientific literature and improving accessibility features. The main bottleneck is a shortage of high-quality training data that captures the variety of text-embedded visuals found in real-world applications.

Current Limitations

Existing VLMs often show an imbalance between their language and visual processing capabilities, which leads to inaccuracies when high-quality training data is limited. Existing datasets for text-rich image understanding lack the size and diversity needed for comprehensive training. Previous synthetic-data efforts have targeted narrow domains, yielding limited diversity in both topics and rendering methods.

Introducing CoSyn

A team from the University of Pennsylvania and the Allen Institute for Artificial Intelligence has developed the Code-Guided Synthetic Data Generation System (CoSyn). The framework addresses the challenges of processing text-rich images by creating diverse synthetic multimodal training data. CoSyn prompts text-only large language models (LLMs) to generate both the underlying data and the code that renders it into images across many visual formats. Because each image is rendered from code, the code serves as a textual representation of the image, which the LLM can use to write training instructions about it.

How CoSyn Works

CoSyn operates through a four-stage workflow:

Natural Language Query: The process begins with a query describing the target data, such as “generate a dataset of book covers.”
Pipeline Selection: The system selects one of 20 generation pipelines, which are built on 11 rendering tools.
Data Generation: A text-only LLM generates the detailed content for the requested topic.
Code and Instructions: Finally, the LLM writes executable code that renders the image, along with corresponding textual instructions for training.
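The four stages can be sketched in Python as follows. This is a minimal, self-contained illustration under stated assumptions, not CoSyn's implementation. Every LLM call is replaced by a hypothetical stub function (`select_pipeline`, `generate_data`, `generate_code`, `generate_instructions`), and the three-entry `PIPELINES` registry is invented for the example. The sketch shows how question-answer pairs are derived from the same structured content that the rendering code is written from, so they match the image by construction.

```python
from dataclasses import dataclass, field
from html import escape

# Hypothetical pipeline registry (pipeline -> rendering tool). CoSyn uses 20
# pipelines over 11 tools; these entries are illustrative, not the real list.
PIPELINES = {
    "book cover": "html",
    "bar chart": "matplotlib",
    "table": "latex",
}

@dataclass
class Sample:
    pipeline: str
    tool: str
    data: dict
    code: str
    qa_pairs: list = field(default_factory=list)

def select_pipeline(query: str) -> str:
    # Stage 2 stand-in: keyword match instead of an LLM choosing a pipeline.
    for name in PIPELINES:
        if name in query.lower():
            return name
    raise ValueError(f"no pipeline matches {query!r}")

def generate_data(pipeline: str) -> dict:
    # Stage 3 stand-in: a real system would prompt a text-only LLM here.
    # This stub ignores `pipeline` and returns canned content.
    return {"title": "The Quiet Harbor", "author": "A. Rivera", "year": "2021"}

def generate_code(data: dict) -> str:
    # Stage 4a stand-in: the LLM would write this rendering code itself.
    parts = [f'<div class="{k}">{escape(v)}</div>' for k, v in data.items()]
    return "<html><body>" + "".join(parts) + "</body></html>"

def generate_instructions(data: dict) -> list:
    # Stage 4b stand-in: questions are written from the same data the code
    # renders, so the answers are correct without inspecting the image.
    return [(f"What is the {k} shown on the cover?", v) for k, v in data.items()]

def run(query: str) -> Sample:
    pipeline = select_pipeline(query)  # Stage 1 supplies `query`
    data = generate_data(pipeline)
    code = generate_code(data)
    return Sample(pipeline, PIPELINES[pipeline], data, code,
                  generate_instructions(data))

sample = run("generate a dataset of book covers")
print(sample.pipeline, sample.tool)
for q, a in sample.qa_pairs:
    print(q, "->", a)
```

Running it prints `book cover html` followed by three question-answer pairs. In a real system, each stub would be a prompt to a text-only LLM, and the generated code would be executed by the corresponding rendering tool to produce the image.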

To increase content diversity and avoid repetitive outputs, CoSyn conditions generation on 200,000 unique personas.
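Below is a minimal sketch of persona conditioning, assuming each persona is injected into the generation prompt as a simple role description. The `PERSONAS` list and `TEMPLATE` string are hypothetical placeholders. CoSyn's pool has 200,000 personas, and its actual prompts are not reproduced here. Sampling without replacement from a seeded `random.Random` keeps each batch both varied and reproducible.

```python
import random

# Hypothetical persona pool; these few strings stand in for CoSyn's
# 200,000 personas and are for illustration only.
PERSONAS = [
    "a marine biologist studying coral reefs",
    "a high-school history teacher",
    "a small-business owner running a bakery",
    "a retired railway engineer",
]

# Hypothetical prompt template, not CoSyn's actual prompt.
TEMPLATE = ("You are {persona}. Propose one topic for a {target} "
            "that this person might create, then describe its content.")

def persona_prompts(target: str, n: int, seed: int = 0) -> list[str]:
    """Build up to n prompts, each conditioned on a different sampled persona."""
    rng = random.Random(seed)  # seeded so the same batch can be regenerated
    chosen = rng.sample(PERSONAS, k=min(n, len(PERSONAS)))  # no repeats
    return [TEMPLATE.format(persona=p, target=target) for p in chosen]

prompts = persona_prompts("book cover", n=3)
for p in prompts:
    print(p)
```

Each prompt asks for the same kind of artifact from a different point of view. Persona conditioning works this way to push the generator away from repeating the same few topics.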

Performance Outcomes

A model trained on CoSyn’s synthetic data performs strongly across text-rich image benchmarks. It outperforms competing models even in the zero-shot setting, where it receives no training on the benchmarks’ own datasets. This suggests that skills learned from CoSyn’s synthetic data transfer to practical applications.

Conclusion

The CoSyn framework marks a significant advance in VLM development, using synthetic data to improve performance on text-rich image understanding tasks. By leveraging the coding capabilities of text-only LLMs, CoSyn generates high-quality training data that helps models generalize across domains. This approach is an important step toward VLMs that can handle complex visual content in real-world applications.

Explore Further

Check out the Paper and Dataset.
