The Great Library of Source Code

Software Heritage is an archival project for source code and its development history. Its long-term mission is to collect, preserve, and share our entire Software Commons, that is, the body of knowledge expressed as publicly available software source code.

Software Heritage archives both source code and the associated development history, as captured by modern version control systems. The data model is a Merkle DAG, in which all source code artifacts (file contents, directories, commits, etc.) are thoroughly deduplicated, reducing storage requirements.
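To illustrate why a Merkle DAG deduplicates naturally, here is a minimal sketch of a content-addressed store. It is not the Software Heritage implementation: the `ObjectStore` class, the SHA-256 hash, and the serialization format are all assumptions chosen for brevity. The point it shows is that every node is identified by a hash of its own content, and a directory's content lists the identifiers of its children, so identical files or subtrees are stored only once.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self.objects = {}  # identifier -> serialized node

    def put(self, kind: str, payload: bytes) -> str:
        # Prefix the payload with its kind so that a file and a directory
        # with identical bytes never share an identifier.
        data = kind.encode() + b"\0" + payload
        key = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(key, data)  # storing the same node twice is a no-op
        return key

    def put_file(self, content: bytes) -> str:
        return self.put("content", content)

    def put_directory(self, entries: dict) -> str:
        # `entries` maps an entry name to the identifier of a child node.
        # Sorting makes the identifier independent of insertion order.
        lines = [f"{name} {child}" for name, child in sorted(entries.items())]
        return self.put("directory", "\n".join(lines).encode())


store = ObjectStore()
license_id = store.put_file(b"MIT License\n")  # shared by both projects
readme_a = store.put_file(b"project A\n")
readme_b = store.put_file(b"project B\n")

dir_a = store.put_directory({"LICENSE": license_id, "README": readme_a})
dir_b = store.put_directory({"LICENSE": license_id, "README": readme_b})
dir_a_again = store.put_directory({"README": readme_a, "LICENSE": license_id})

print(len(store.objects))    # number of distinct nodes stored
print(dir_a == dir_a_again)  # same content yields the same identifier
```

In this sketch the two directories share a single copy of the license file, and re-adding an identical directory adds nothing to the store. Commits could be modeled the same way, as nodes whose content references a root directory and parent commits.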

The Software Heritage archive is already the largest of its kind, having archived more than 5 billion unique source code files and more than 1 billion unique commits from more than 80 million software projects. The archive periodically crawls forges like GitHub and GitLab.com, distributions like Debian, and package managers like PyPI. It is accessible via a Web UI as well as a Web API.

The archive serves a variety of use cases, ranging from preserving our cultural heritage for posterity to scientific research on "big code" analysis, and from business needs such as tracking software provenance to educational purposes in computer science curricula.

Software Heritage is a non-profit endeavor committed to transparency; all the source code developed for the needs of the project itself is free software available from the project forge.


Technologies

  • python
  • javascript
  • postgres
  • django
  • git

Topics

  • Data and Databases
  • digital preservation
  • source code archive
  • free and open source software
  • big data
  • big code

Software Heritage 2019 Projects

  • Thibault Allançon
    Graph compression on the development history of software
    Software Heritage is an ambitious research project whose goal is to collect, preserve in the very long term, and share the whole publicly accessible...
  • Archit Agrawal
    Increase archive coverage
    Increase archive coverage As Software Heritage works on archiving and sharing source code, one of the major tasks is to ingest the latest source code...
  • Kalpit Kothari
    Software Heritage - Web UI Improvements - kalpitk
    Improve the Web UI of the archive. Software Heritage can be accessed through a beautiful and rich Web UI, developed in Django. Since the web portal is...

2019