The paper “From Freebase to Wikidata: The Great Migration” describes how Freebase data was transferred to Wikidata and the complications that arose along the way. This summary walks through its main points.
Introduction and Background
Freebase, an open and collaborative knowledge base, was launched in 2007 by Metaweb and later acquired by Google in 2010. It became part of Google’s Knowledge Graph, serving as a structured source of knowledge for search and other data-driven applications.

However, with the growing success of Wikidata, a similar project launched by the Wikimedia Foundation in 2012, Google announced in 2014 that it would shut down Freebase and help migrate its content to Wikidata.
Wikidata differs from Freebase in its approach and structure. Like other open, collaborative knowledge bases, it relies on community-driven curation, but it does not aim to record a single accepted truth. Instead, it records statements together with the sources that support them.

This means it can store conflicting information from different sources, which helps capture a range of perspectives. However, these differing data models and community practices created various challenges during the migration process.
Challenges of the Migration
The migration involved multiple technical and non-technical challenges, as Freebase and Wikidata operate under different paradigms:
- Licensing: Freebase’s data was published under a Creative Commons Attribution (CC BY 2.5) license, while Wikidata uses the more permissive CC0 license. This discrepancy meant that Google had to filter out certain content from Freebase that couldn’t legally be transferred to Wikidata.
- Data Quality: Wikidata holds data to a high standard of accuracy, whereas Freebase, while vast, had known quality issues. The migration therefore relied on human curation rather than a fully automatic transfer.
- References: Wikidata expects claims to be backed by sources, a practice less rigorously followed in Freebase. Reliable references therefore had to be found for Freebase data, which was often difficult because Freebase did not uniformly track sources.
- Community and Maintenance: As the migration added significant data to Wikidata, the need for long-term maintenance increased. Wikidata’s community was now responsible for curating and updating the imported data.

Data Mapping Process
Data mapping between Freebase and Wikidata was one of the most intricate aspects due to structural and semantic differences between the two databases:
- Data Topic Mappings: Mapping topics between Freebase and Wikidata involved identifying equivalent items. Initially, Google and Samsung provided mappings based on Wikipedia links, given both knowledge bases had Wikipedia-derived content.
- Data Property Mapping: The mapping of properties required manual intervention from the Wikidata community. Many Freebase properties didn’t have direct equivalents in Wikidata, necessitating custom mappings for attributes such as parentage, geographic coordinates, and dates.
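The topic-mapping idea above can be sketched in a few lines of Python: two entries are treated as the same entity when both link to the same Wikipedia article. This is a minimal illustration, not the procedure Google or Samsung actually used. The function name `map_topics`, the dictionary layout, and the Freebase identifiers are assumptions made for the example (Q42 and Q64 are the real Wikidata items for Douglas Adams and Berlin).

```python
from typing import Dict

# Hypothetical sample data: each knowledge base links its entries to a
# Wikipedia article title. The Freebase IDs are made-up placeholders.
freebase_links: Dict[str, str] = {
    "/m/example_1": "Douglas Adams",
    "/m/example_2": "Berlin",
    "/m/example_3": "Some Obscure Topic",
}
wikidata_links: Dict[str, str] = {
    "Q42": "Douglas Adams",
    "Q64": "Berlin",
}


def map_topics(fb_links: Dict[str, str], wd_links: Dict[str, str]) -> Dict[str, str]:
    """Pair Freebase topics with Wikidata items that share a Wikipedia article."""
    # Invert the Wikidata side so articles can be looked up by title.
    title_to_item = {title: item_id for item_id, title in wd_links.items()}
    return {
        topic_id: title_to_item[title]
        for topic_id, title in fb_links.items()
        if title in title_to_item
    }


mapping = map_topics(freebase_links, wikidata_links)
print(mapping)
```

Topics without a shared article, such as the third placeholder here, stay unmapped, which mirrors the large share of Freebase topics that had no Wikidata counterpart.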
Primary Sources Tool
To assist with the migration, Google developed the Primary Sources Tool. This tool facilitated crowdsourced human curation of Freebase data by allowing Wikidata contributors to approve or reject statements easily. Features included:
- Backend: The tool’s backend supported a REST API for handling requests and managing data, enabling efficient storage and retrieval of statements.
- Frontend: Integrated with Wikidata’s interface, the frontend allowed users to seamlessly approve or reject statements without disrupting their editing flow.
- Data Rollout and Flexibility: Data was rolled out progressively to avoid overwhelming the community. The tool was also designed to support other datasets, not just Freebase, providing flexibility for future migrations.
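The approve-or-reject workflow described above can be modeled as a small state machine over suggested statements. The following is only a sketch under stated assumptions: the class names, fields, and methods are hypothetical and do not reflect the Primary Sources Tool's real backend or REST API. P569 (date of birth) and P19 (place of birth) are real Wikidata properties; the source URL is a placeholder.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class State(Enum):
    UNREVIEWED = "unreviewed"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class SuggestedStatement:
    item: str          # Wikidata item ID, e.g. "Q42"
    prop: str          # Wikidata property ID, e.g. "P569"
    value: str
    source_url: str    # reference offered to the reviewer
    dataset: str = "freebase"  # a dataset label lets one queue serve several imports
    state: State = State.UNREVIEWED


@dataclass
class CurationQueue:
    statements: List[SuggestedStatement] = field(default_factory=list)

    def pending_for(self, item: str) -> List[SuggestedStatement]:
        """Return the statements a contributor would still see for an item."""
        return [
            s for s in self.statements
            if s.item == item and s.state is State.UNREVIEWED
        ]

    def review(self, statement: SuggestedStatement, approve: bool) -> SuggestedStatement:
        """Record a human decision; each statement is reviewed only once."""
        if statement.state is not State.UNREVIEWED:
            raise ValueError("statement was already reviewed")
        statement.state = State.APPROVED if approve else State.REJECTED
        return statement


queue = CurationQueue([
    SuggestedStatement("Q42", "P569", "1952-03-11", "https://example.org/bio"),
    SuggestedStatement("Q42", "P19", "Cambridge", "https://example.org/bio"),
])
first, second = queue.pending_for("Q42")
queue.review(first, approve=True)
queue.review(second, approve=False)
print(len(queue.pending_for("Q42")))
```

Keeping a `dataset` field on each statement is one simple way to reflect the design goal that the tool should serve imports other than Freebase.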

Statistics on the Migration
Several statistics highlighted the scale and impact of the migration:
- Quantitative Comparison: The last Freebase dump contained around 48 million topics and nearly 3 billion facts. However, only 4.56 million items could be successfully mapped to Wikidata due to differences in relevance and structure, resulting in around 14 million new Wikidata statements.
- Spatio-Temporal Comparison: A comparison of coverage between Freebase and Wikidata revealed similar geographic and historical distributions, though Freebase had stronger coverage of recent data, particularly for entities like music records and population statistics.
- Raw Statistics: While Freebase had a higher volume of data, much of it didn’t meet Wikidata’s notability criteria, which prioritize significant and verifiable information.
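To put the quantitative comparison in proportion, the short calculation below works out what share of Freebase was carried over. It uses only the rounded figures quoted above (with "nearly 3 billion" taken as 3 billion), so the results are approximate; the variable names are chosen for this example.

```python
# Rounded figures as quoted in the list above.
freebase_topics = 48_000_000
freebase_facts = 3_000_000_000   # "nearly 3 billion", rounded up
mapped_items = 4_560_000
new_statements = 14_000_000

# Share of Freebase topics that found a Wikidata counterpart.
mapped_share = mapped_items / freebase_topics

# Share of Freebase facts that became new Wikidata statements.
statement_share = new_statements / freebase_facts

print(f"Mapped topics: {mapped_share:.1%}")
print(f"Facts turned into new statements: {statement_share:.2%}")
```

In other words, roughly one in ten Freebase topics was mapped, and well under one percent of Freebase facts became new Wikidata statements.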
Future Implications of the Migration
The migration of Freebase data to Wikidata showcases the power of collaborative knowledge bases to organize and share structured information. Thanks to this successful migration, Wikidata has grown significantly, both in the amount of data it contains and in the variety of information accessible to users worldwide.
This development opens the door to more innovation and highlights important considerations for future knowledge management projects.
Setting Standards for Open Knowledge
The transition illustrates a model for open data migrations, setting a standard for how to manage intellectual property, data accuracy, and community involvement in similar projects.
The process, especially in developing tools like the Primary Sources Tool, demonstrates how data providers can empower communities to contribute meaningfully while respecting the complexities of the data.
Interoperability Across Platforms
This migration highlights the importance of interoperability among knowledge bases. By creating connections between Freebase and Wikidata, the project has established a pathway for data exchange across various platforms.
This promotes compatibility and collaboration between different systems. In the future, other knowledge bases may adopt this model to enable smoother integration of information, thereby supporting linked open data ecosystems.
Expanding Crowdsourced Knowledge Curation
Through crowdsourced curation tools like the Primary Sources Tool, Wikidata has shown the value of community-driven data maintenance. Its open, community-curated model encourages a decentralized yet organized approach to updating and verifying information.
This approach could be applied to other open data projects, particularly as the amount of data increases and more human oversight becomes necessary to ensure quality.
Enhanced Machine Learning and AI Applications
As an enriched knowledge base, Wikidata is positioned to support more advanced AI applications. With structured data that includes both human-curated information and crowd-sourced knowledge verification, AI models can draw from a larger, more reliable dataset.
Leveraging Knowledge for Education and Research
The migration enhances Wikidata’s utility as a free educational resource, making structured data available for a range of academic and research applications. As a vast and accessible repository, it is useful for students, researchers, and educators seeking verifiable information. Additionally, the increase in references and verified data improves its value as a credible resource, likely encouraging more institutions to adopt Wikidata for educational purposes.
Challenges to Address in Future Migrations
While the Freebase to Wikidata migration has been largely successful, it has also illuminated ongoing challenges that future migrations may need to address:
Data Quality and Consistency
Ensuring data quality remains a significant challenge. Although crowdsourced verification is effective, it relies on the active engagement of contributors and can vary in thoroughness.
Tools that support data validation, automated error detection, and enhanced reporting functionalities could improve consistency and reduce the burden on individual contributors.
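As one illustration of automated error detection, a simple consistency rule can flag records for human review before they reach contributors. The sketch below checks a single rule, that a date of death must not precede a date of birth. The record layout, the item IDs, and the function name `find_date_conflicts` are assumptions for this example, not part of any existing Wikidata tool.

```python
from datetime import date
from typing import Dict, List, Optional

# Hypothetical records keyed by placeholder item IDs.
records: Dict[str, Dict[str, Optional[date]]] = {
    "item_a": {"birth": date(1952, 3, 11), "death": date(2001, 5, 11)},
    "item_b": {"birth": date(1990, 1, 1), "death": date(1980, 1, 1)},  # impossible
    "item_c": {"birth": date(1975, 6, 30), "death": None},             # still living
}


def find_date_conflicts(data: Dict[str, Dict[str, Optional[date]]]) -> List[str]:
    """Return IDs of records whose death date precedes the birth date."""
    flagged = []
    for item_id, dates in data.items():
        born, died = dates.get("birth"), dates.get("death")
        # Only compare when both dates are known.
        if born is not None and died is not None and died < born:
            flagged.append(item_id)
    return flagged


print(find_date_conflicts(records))
```

A rule like this does not decide which value is wrong; it only narrows down where a contributor should look.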
Balancing Data Volume and Community Capacity
The migration increased the amount of data within Wikidata significantly, raising questions about the community’s capacity to maintain and curate such a vast dataset.
Future migrations will need to consider the sustainable growth of data in relation to the size and activity of the user community, potentially balancing this with automated tools that can help streamline routine curation tasks.
Licensing and Intellectual Property
Licensing challenges are prevalent in data migrations, especially when transferring content with varied copyright and licensing agreements.
Clearer licensing structures and modular data filtering tools can aid future projects in efficiently handling licensing constraints and ensuring compliance with legal requirements.
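A modular filtering step of the kind suggested here could be as simple as an allow-list applied to per-fact license metadata. The sketch below assumes that every fact carries a license label, which real datasets often lack; the `Fact` type, the labels, and the allow-list are hypothetical, and deciding which licenses are actually compatible with a target such as CC0 is a legal question the code does not answer.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Fact(NamedTuple):
    subject: str
    predicate: str
    obj: str
    license: str  # hypothetical per-fact license label


# Assumption for this example: only facts already under CC0 may be released.
ALLOWED_LICENSES: FrozenSet[str] = frozenset({"CC0"})


def releasable(facts: Iterable[Fact],
               allowed: FrozenSet[str] = ALLOWED_LICENSES) -> List[Fact]:
    """Keep only the facts whose license label is on the allow-list."""
    return [fact for fact in facts if fact.license in allowed]


facts = [
    Fact("topic_1", "born_in", "city_1", "CC0"),
    Fact("topic_2", "description", "some text", "CC BY 2.5"),
    Fact("topic_3", "population", "12345", "CC0"),
]
kept = releasable(facts)
print(len(kept), "of", len(facts), "facts can be released")
```

Passing the allow-list as a parameter keeps the filter reusable when a different target license applies.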
Cross-Cultural Data Interpretation
Given that Wikidata supports data from around the world, it must accommodate differing cultural perspectives, especially in areas like historical interpretations and geopolitical data.
For future migrations, incorporating mechanisms that allow for multiple viewpoints, while flagging controversial or conflicting data, will be important for maintaining Wikidata’s integrity as an unbiased repository.
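One way to keep several viewpoints while still flagging disagreement is to store every sourced claim and then report the item and property pairs that carry more than one distinct value. The sketch below assumes a flat tuple layout and invented values and source names; Q64 (Berlin) and P1082 (population) are real Wikidata identifiers, but the figures are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (item, property, value, source); values and sources are hypothetical.
Claim = Tuple[str, str, str, str]

claims: List[Claim] = [
    ("Q64", "P1082", "3500000", "source_a"),
    ("Q64", "P1082", "3600000", "source_b"),
    ("Q64", "P17", "Germany", "source_a"),
    ("Q64", "P17", "Germany", "source_b"),
]


def find_conflicts(all_claims: List[Claim]) -> Dict[Tuple[str, str], Set[str]]:
    """Return the (item, property) pairs that have differing values."""
    values: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for item, prop, value, _source in all_claims:
        values[(item, prop)].add(value)
    # Nothing is discarded: conflicting values are reported, not resolved.
    return {key: vals for key, vals in values.items() if len(vals) > 1}


conflicts = find_conflicts(claims)
print(conflicts)
```

Claims on which the sources agree are not flagged, so reviewers see only the pairs where perspectives actually diverge.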
Technical Integration and Compatibility
Freebase and Wikidata used different data models, making the technical integration challenging. Future migrations will benefit from standardized data structures or protocols that ease compatibility, reducing the need for extensive mapping and restructuring during transfers. This could entail developing common ontologies or interoperability standards for collaborative knowledge bases.
Conclusion
The migration of data from Freebase to Wikidata is a landmark event that exemplifies the power and potential of open, collaborative knowledge bases. By moving a large volume of structured data into Wikidata, Google has strengthened Wikidata’s role as a central knowledge repository, while also supporting the ongoing development of a community-driven data model. This migration provides a useful case study for future open-data projects, illustrating the need for thoughtful planning, community involvement, and technical innovation.
The Freebase to Wikidata migration has implications that extend beyond the immediate scope of the two platforms involved. It contributes to the larger vision of a linked and interoperable web of data that is openly accessible and collectively maintained.
As this vision evolves, projects like this one will serve as blueprints, offering valuable insights into the challenges and rewards of collaborative knowledge curation.
Through continued innovation and community collaboration, Wikidata and similar platforms can continue to grow as pillars of open knowledge, serving the public good in an increasingly data-driven world.
From Freebase to Wikidata: The Great Migration
Authors: Thomas Pellissier Tanon, Denny Vrandečić, Sebastian Schaffert, Thomas Steiner, Lydia Pintscher
Published in: Proceedings of the 25th International Conference on World Wide Web (WWW 2016)
Publication Date: April 2016
DOI: 10.1145/2872427.2874809