Data Collection

Empyr organizes data into modular collections that represent verifiable, structured knowledge extracted from real web interactions. Each collection is standardized through a public schema, linked to a content-addressable store, and governed by token-weighted curation and verification.

Data Collection Types

1. Interaction Data

Captures safe, anonymized behavioral patterns that help agents learn how users navigate and act on the web.

  • Click and scroll heatmaps

  • Page dwell time metrics

  • Hover, focus, and intent signals

  • Interaction chains (action → outcome traces)
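An interaction chain pairs a sequence of actions with the outcome they produced. A minimal sketch of such a record and one derived metric, assuming illustrative field names (this is not a published Empyr schema):

```python
# Hypothetical shape of an anonymized interaction-chain record.
# All field names are illustrative assumptions.
interaction_chain = {
    "session_hash": "a3f1c9",          # anonymized session identifier
    "steps": [
        {"action": "click",  "target": "button#search", "ts_offset_ms": 0},
        {"action": "scroll", "target": "div.results",   "ts_offset_ms": 850},
        {"action": "click",  "target": "a.result-0",    "ts_offset_ms": 2100},
    ],
    "outcome": {"type": "page_load", "success": True},
}

def chain_duration_ms(chain):
    """Elapsed time from the first to the last recorded step."""
    offsets = [s["ts_offset_ms"] for s in chain["steps"]]
    return max(offsets) - min(offsets)

print(chain_duration_ms(interaction_chain))  # 2100
```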

2. Structural Data

Provides machine-readable web layouts that enable agents to interpret complex or dynamic pages.

  • DOM and CSS structure maps

  • Element labeling and hierarchy graphs

  • Accessibility metadata

  • Script and dependency inventories

3. Functional Data

Describes web forms, tool calls, and dynamic workflows that browser agents must understand to execute actions.

  • Form schemas and validation rules

  • API call traces from approved extensions

  • Function parameter mapping

  • Tool invocation sequences and success metrics
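Form schemas with validation rules can be sketched as a declarative field list plus a checker an agent runs before submission. The structure below is an assumption for illustration, not the on-chain FormSchema-v1 definition:

```python
import re

# Illustrative form schema with per-field validation rules (assumed shape).
form_schema = {
    "form_id": "checkout-address",
    "fields": [
        {"name": "email",    "type": "email",  "required": True},
        {"name": "zip",      "type": "string", "required": True, "pattern": r"\d{5}"},
        {"name": "gift_msg", "type": "string", "required": False},
    ],
}

def validate(submission, schema):
    """Return a list of field-level validation errors for a submission."""
    errors = []
    for field in schema["fields"]:
        value = submission.get(field["name"])
        if value is None:
            if field["required"]:
                errors.append(f"{field['name']}: missing required field")
            continue
        if "pattern" in field and not re.fullmatch(field["pattern"], value):
            errors.append(f"{field['name']}: does not match pattern")
    return errors

print(validate({"email": "a@b.co", "zip": "1234"}, form_schema))
# ['zip: does not match pattern']
```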

4. Content and Product Data

Normalizes structured page content for agents working in commerce, research, or discovery.

  • Product and service listings with attribute maps

  • Pricing and stock updates

  • Embedded schema.org and microdata extraction

  • Summarized text and structured snippets

5. Contextual Data

Provides the situational metadata needed to evaluate reliability and freshness.

  • Timestamped source provenance

  • Domain reputation scoring

  • Temporal decay and refresh rate tracking

  • Geographic and categorical tagging

6. Synthetic and Enriched Data

Combines verified original data with synthetic augmentation for model training and benchmarking.

  • AI-generated fill-ins for sparse fields

  • Semantic clustering outputs

  • Ground truth alignment with external datasets

  • Benchmark sets for tool and agent evaluation

Schema Registry

Each collection is defined through a machine-readable schema registered on chain.

  • Schemas define allowed fields, datatypes, privacy class, and license.

  • Changes require DAO approval through governance.

  • Curators and verifiers use the schema to automate validation and reward distribution.

  • Buyers use schemas for seamless integration with their models or pipelines.
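The curator-side automation described above amounts to checking each record against the registered schema. A minimal sketch, assuming an illustrative schema layout (fields, datatypes, privacy class, license) rather than the actual on-chain registry format:

```python
# Sketch of a registered collection schema; field names and the
# Python-type encoding of datatypes are assumptions for illustration.
schema = {
    "collection": "ClickFlow-v1",
    "license": "commercial",
    "privacy_class": "anonymized",
    "fields": {
        "session_hash": str,
        "page_id": str,
        "dwell_ms": int,
    },
}

def conforms(record, schema):
    """A record conforms when every schema field is present with the
    declared type and no undeclared fields appear."""
    fields = schema["fields"]
    if set(record) != set(fields):
        return False
    return all(isinstance(record[k], t) for k, t in fields.items())

print(conforms({"session_hash": "ab", "page_id": "p1", "dwell_ms": 420}, schema))  # True
print(conforms({"session_hash": "ab", "page_id": "p1"}, schema))                   # False
```

Because the same schema drives validation, reward distribution, and buyer-side integration, a single registry entry keeps all three parties in agreement about what a valid record looks like.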

Access and Monetization

Empyr supports multiple access modes and pricing models to serve diverse users.

  • Live Streams: Continuous updates for training or real-time inference.

  • Snapshot Packages: Fixed datasets for offline training or auditing.

  • Query-on-Demand: Pay-per-query API with caching and pagination.

Access is purchased through data credits, which are converted from tokens and burned when used. Pricing per dataset and access mode is set dynamically by governance based on demand, cost, and verification complexity.
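The credit flow above can be sketched as a small accounting model: tokens convert to credits at a governance-set rate, and spent credits are burned rather than recycled. The rate and the purchase price below are assumed placeholders, not protocol constants:

```python
# Assumed governance-set conversion rate (credits per token); illustrative only.
RATE_CREDITS_PER_TOKEN = 100

class CreditAccount:
    """Minimal sketch of token -> credit conversion with burn-on-use."""

    def __init__(self):
        self.credits = 0
        self.burned = 0

    def convert(self, tokens):
        # Tokens are exchanged for data credits at the current rate.
        self.credits += tokens * RATE_CREDITS_PER_TOKEN

    def spend(self, price):
        # Credits are burned when used; they never re-enter circulation.
        if price > self.credits:
            raise ValueError("insufficient credits")
        self.credits -= price
        self.burned += price

acct = CreditAccount()
acct.convert(5)      # 5 tokens -> 500 credits
acct.spend(120)      # one dataset access at an assumed price
print(acct.credits, acct.burned)  # 380 120
```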

Quality Assurance

To maintain reliability across collections:

  • Producers sign submissions with key-based attestations.

  • Curators run automated linting, deduplication, and semantic checks.

  • Verifiers conduct random challenge audits backed by stake.

  • Failed audits trigger stake slashing and public dispute records.

A rolling leaderboard and reputation score incentivize long-term honest behavior.
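The first step of this pipeline, key-based attestations, can be sketched with stdlib primitives. A production system would use asymmetric signatures; HMAC is used here only to keep the sketch dependency-free, and the key and record are illustrative:

```python
import hashlib
import hmac
import json

PRODUCER_KEY = b"producer-secret-key"  # assumed key material for the sketch

def sign_submission(payload: dict, key: bytes) -> str:
    """Canonicalize the payload and attest to it with a keyed digest."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()

def verify_submission(payload: dict, signature: str, key: bytes) -> bool:
    """Verifier-side check: recompute and compare in constant time."""
    return hmac.compare_digest(sign_submission(payload, key), signature)

record = {"collection": "PageMap-v1", "page_id": "example.com/home"}
sig = sign_submission(record, PRODUCER_KEY)
print(verify_submission(record, sig, PRODUCER_KEY))                       # True
print(verify_submission({**record, "page_id": "x"}, sig, PRODUCER_KEY))   # False
```

Any tampering with the payload after signing makes verification fail, which is what lets challenge audits attribute a bad record to a specific producer's stake.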

Example Collections

| Collection Name | Description | Update Interval | Access Type | Governance Status |
|---|---|---|---|---|
| FormSchema-v1 | Browser form structures, field names, input types | Daily | Snapshot, Stream | Active |
| ToolTrace-v1 | Function and API call sequences from browser extensions | Realtime | Stream | Beta |
| PageMap-v1 | DOM and layout extraction for major commerce and news sites | Weekly | Snapshot | Active |
| ClickFlow-v1 | Anonymized interaction chains with page identifiers | Daily | Stream | Draft |

Licensing and Provenance

Each dataset object embeds:

  • License fingerprint (CC0, commercial, or private)

  • Contributor signature

  • Hash pointer to raw source bundle

  • Timestamp and collection ID

License compliance and provenance trails are enforced at the protocol level, creating transparent traceability for buyers and auditors.
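The four embedded items above can be sketched as a provenance envelope wrapped around each raw source bundle. Field names here are illustrative assumptions, not the protocol's wire format:

```python
import hashlib
import json
import time

def wrap_with_provenance(raw_bundle: bytes, collection_id: str,
                         license_id: str, contributor_sig: str) -> dict:
    """Attach the provenance fields listed above to a dataset object."""
    return {
        "license": license_id,                               # e.g. "CC0", "commercial"
        "contributor_sig": contributor_sig,                  # producer's signature
        "raw_hash": hashlib.sha256(raw_bundle).hexdigest(),  # hash pointer to raw bundle
        "timestamp": int(time.time()),
        "collection_id": collection_id,
    }

obj = wrap_with_provenance(b"<html>...</html>", "PageMap-v1", "CC0", "sig-abc")
print(json.dumps({k: obj[k] for k in ("license", "collection_id")}))
```

Because the hash pointer is derived from the raw bundle itself, any auditor holding the bundle can recompute the digest and confirm the object's provenance without trusting the seller.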
