The Verifier Alliance dataset now has 8M+ verified contracts! @ethereum has a strong verification culture, but the data is siloed and hard to access. We’ve opened it all up in Parquet format for researchers, analysts, and AI engineers. Explore the schema & download the dataset👇
What is this dataset good for?
- Compiler testing
- Identifying vulnerability patterns in contracts
- Training models
- Data analysis of EVM contracts
...and many other use cases that weren't possible without an open dataset.
Brought together by @blockscout @routescan_io @SourcifyEth
What does the data look like? VerA is a PostgreSQL DB in which each verification couples a "deployment" with a "compilation". Bytecodes and sources are deduplicated into separate tables. See the schema:
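A rough sketch of that coupling in Python, under a simplified model: the class and field names (`Deployment`, `Compilation`, `Store`, `put_code`) are hypothetical and are not VerA's actual tables or columns, only runtime code is modeled, and SHA-256 is used here just as an illustrative dedup key. The point it shows is that two deployments of the same bytecode share one code row and one source row:

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Deployment:
    chain_id: int
    address: str
    runtime_code_hash: str  # key into the deduplicated code table


@dataclass(frozen=True)
class Compilation:
    compiler: str
    version: str
    runtime_code_hash: str
    source_hashes: tuple[str, ...]  # keys into the deduplicated sources table


@dataclass
class Store:
    # Hypothetical in-memory stand-in for the DB tables
    code: dict[str, bytes] = field(default_factory=dict)
    sources: dict[str, str] = field(default_factory=dict)
    verifications: list[tuple[Deployment, Compilation]] = field(default_factory=list)

    def put_code(self, bytecode: bytes) -> str:
        key = hashlib.sha256(bytecode).hexdigest()
        self.code.setdefault(key, bytecode)  # identical bytecode is stored once
        return key

    def put_source(self, content: str) -> str:
        key = hashlib.sha256(content.encode()).hexdigest()
        self.sources.setdefault(key, content)  # identical source is stored once
        return key

    def verify(self, deployment: Deployment, compilation: Compilation) -> None:
        # A verification is a deployment coupled with a compilation
        self.verifications.append((deployment, compilation))


store = Store()
runtime = bytes.fromhex("6080604052")
compilation = Compilation(
    "solc", "0.8.26", store.put_code(runtime), (store.put_source("contract A {}"),)
)
# Same contract deployed on two chains: two verifications, one code row, one source row
for chain_id in (1, 10):
    deployment = Deployment(chain_id, "0x" + "00" * 19 + "01", store.put_code(runtime))
    store.verify(deployment, compilation)

print(len(store.verifications), len(store.code), len(store.sources))  # 2 1 1
```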
EVM bytecode is unstructured, so a recompiled bytecode can differ from the onchain one even when the sources are correct. "Transformations" record the changes needed to turn the recompiled bytecode into the onchain bytecode: the positions and values of immutables, libraries, cborAuxdata, and constructorArguments.
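A minimal sketch of applying transformations, assuming each one is a hypothetical (kind, reason, offset, value) record where "replace" overwrites bytes in place and "insert" splices new bytes in (e.g. constructor arguments appended after the creation code). The field names are illustrative, not VerA's exact format:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Transformation:
    # Hypothetical record shape, not the exact VerA format
    kind: str     # "replace" overwrites bytes, "insert" splices bytes in
    reason: str   # e.g. "immutable", "library", "cborAuxdata", "constructorArguments"
    offset: int   # byte offset into the recompiled bytecode
    value: bytes  # bytes observed onchain at that position


def apply_transformations(recompiled: bytes, transformations: list[Transformation]) -> bytes:
    """Rebuild the expected onchain bytecode from a recompiled bytecode."""
    out = bytearray(recompiled)
    # Highest offset first, so an insert never shifts offsets still to be applied
    for t in sorted(transformations, key=lambda tr: tr.offset, reverse=True):
        if t.kind == "replace":
            end = t.offset + len(t.value)
            if end > len(out):
                raise ValueError(f"{t.reason} replacement runs past end of bytecode")
            out[t.offset:end] = t.value
        elif t.kind == "insert":
            out[t.offset:t.offset] = t.value
        else:
            raise ValueError(f"unknown transformation kind: {t.kind}")
    return bytes(out)


def matches_onchain(recompiled: bytes, transformations: list[Transformation], onchain: bytes) -> bool:
    # A verification holds if the transformed recompiled bytecode equals the onchain bytecode
    return apply_transformations(recompiled, transformations) == onchain


# Toy example: a 2-byte immutable placeholder plus appended constructor arguments
recompiled = bytes.fromhex("60806040" "0000" "fe")
transformations = [
    Transformation("replace", "immutable", 4, bytes.fromhex("beef")),
    Transformation("insert", "constructorArguments", 7, bytes.fromhex("0001")),
]
rebuilt = apply_transformations(recompiled, transformations)
print(rebuilt.hex())  # 60806040beeffe0001
```

Applying from the highest offset down is a design choice of this sketch: it keeps an insert from shifting the offsets of transformations that haven't been applied yet.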
The whole DB is exported daily to Parquet, a modern columnar data format that can be queried directly. Head over to the docs to see how to download it: