Data Mapping as a Search Problem

Data mapping is a critical process in data management: it enables data from different sources to be integrated and transformed into a unified format. Treating mapping discovery as a search problem offers a way to automate the discovery of mappings between structured data sources.

Foundational Concepts

Data Mapping: Matching fields of a source schema to fields of a target schema, and specifying how source data is transformed to fit the target.

Search Problem: Finding an optimal path from the source schema to the target schema through a space of possible transformations.

Viewing Data Mapping as a Search Problem

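The TUPELO system treats data mapping as a search problem. Given critical instances (small example instances of the source and target schemas that illustrate the intended mapping), it explores the space of transformations that can be applied to the source instance until it reaches the target instance, and it uses heuristics to reduce the number of states visited during the search.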
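The search formulation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than TUPELO's actual design: a state is a single relation stored as (attribute names, rows), the only operators are rename and drop, and the search is a plain breadth-first search that stops when the state equals the target example instance.

```python
from collections import deque

# Toy model (hypothetical): a state is one relation, (attribute names, rows),
# stored as tuples so that states are hashable.

def rename(state, old, new):
    attrs, rows = state
    return (tuple(new if a == old else a for a in attrs), rows)

def drop(state, attr):
    attrs, rows = state
    i = attrs.index(attr)
    return (attrs[:i] + attrs[i + 1:],
            tuple(r[:i] + r[i + 1:] for r in rows))

def successors(state, target_attrs):
    """Yield (label, next_state) for every operator applicable to state."""
    attrs, _ = state
    for a in attrs:
        if a not in target_attrs:
            yield (f"drop {a}", drop(state, a))
            for t in target_attrs:
                if t not in attrs:
                    yield (f"rename {a} -> {t}", rename(state, a, t))

def normalize(state):
    """Canonical form that ignores column order and row order."""
    attrs, rows = state
    order = sorted(range(len(attrs)), key=lambda i: attrs[i])
    return (tuple(attrs[i] for i in order),
            tuple(sorted(tuple(r[i] for i in order) for r in rows)))

def find_mapping(source, target):
    """Breadth-first search for a shortest operator sequence, or None."""
    goal = normalize(target)
    frontier = deque([(source, [])])
    seen = {normalize(source)}
    while frontier:
        state, path = frontier.popleft()
        if normalize(state) == goal:
            return path
        for label, nxt in successors(state, target[0]):
            key = normalize(nxt)
            if key not in seen:
                seen.add(key)
                frontier.append((nxt, path + [label]))
    return None

# Example instances: one source row and the row it should become.
source = (("name", "tel", "fax"), (("Ann", "555", "556"),))
target = (("customer", "phone"), (("Ann", "555"),))
print(find_mapping(source, target))
```

Because the goal test compares rows as well as attribute names, the example data rules out wrong candidates such as renaming fax to phone. Breadth-first search visits every state up to the solution depth, which is why the number of states visited becomes the central cost as schemas grow.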

Challenges in Data Mapping

Complex Semantic Mappings: Many data mappings involve complex transformations beyond schema matching, including handling semantic differences and structural transformations.

Search Heuristics: Developing effective search heuristics to guide the exploration of the transformation space is challenging.

Scalability: Ensuring the mapping system can handle large-scale data with multiple relations and attributes is a significant challenge.


To address these challenges, the TUPELO system combines example-driven mapping generation, memory-efficient heuristic search algorithms such as IDA* and RBFS, and similarity measures such as cosine similarity that serve as search heuristics.
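How iterative-deepening A* (usually written IDA*) and a cosine-similarity heuristic fit together can be sketched as follows. This is a simplified illustration under stated assumptions, not TUPELO's implementation: states are schemas only (sorted tuples of attribute names), the operators are the hypothetical rename and drop, each step costs 1, and the heuristic is 1 minus the cosine similarity between the bags of attribute names of the current state and the target.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two bags of tokens (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) *
                     sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def make_heuristic(target):
    """Assumed heuristic: 1 - cosine similarity to the target schema."""
    target_bag = Counter(target)
    def h(state):
        return 1.0 - cosine_similarity(Counter(state), target_bag)
    return h

def successors(state, target):
    """Hypothetical operators: drop or rename any non-target attribute."""
    for a in state:
        if a not in target:
            rest = tuple(x for x in state if x != a)
            yield f"drop {a}", rest
            for t in target:
                if t not in state:
                    yield f"rename {a} -> {t}", tuple(sorted(rest + (t,)))

def ida_star(start, target, h):
    """Return a list of operator labels reaching target, or None."""
    goal = tuple(sorted(target))
    path = [tuple(sorted(start))]   # states on the current branch
    labels = []                     # operators applied on the current branch

    def search(g, bound):
        state = path[-1]
        f = g + h(state)
        if f > bound:
            return f                # report the smallest f that exceeded bound
        if state == goal:
            return True
        minimum = math.inf
        for label, nxt in successors(state, goal):
            if nxt in path:         # avoid cycles along the current branch
                continue
            path.append(nxt)
            labels.append(label)
            result = search(g + 1, bound)
            if result is True:
                return True
            minimum = min(minimum, result)
            path.pop()
            labels.pop()
        return minimum

    bound = h(path[0])
    while True:                     # deepen the f-bound until goal or exhaustion
        result = search(0, bound)
        if result is True:
            return labels
        if result == math.inf:
            return None
        bound = result

target = ("customer", "phone")
print(ida_star(("name", "tel", "fax"), target, make_heuristic(target)))
```

IDA* stores only the current branch, so its memory use grows with the depth of the search rather than with the number of states visited; RBFS has a similar linear-space property. The heuristic in this sketch never exceeds 1, so with unit step costs it does not overestimate the remaining cost, but it is also a weak guide; a sharper heuristic would prune more states.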

Future Developments

Treating data mapping as a search problem opens avenues for future research and development, including better search heuristics, broader applicability, and integration with machine learning.


Treating data mapping as a search problem provides an effective approach to automating the discovery of mappings between structured data sources. By combining search algorithms, example-driven generation, and search heuristics, systems like TUPELO can improve the accuracy and efficiency of data integration processes.

