PRISM
Benchmarks and scaling
Benchmark families, resource strategy, sharding, caching, and scaling behavior for validator workers.
prism, benchmarks, scaling, gpu
Benchmark types
PRISM should combine architecture-sensitive tests with training-variant tests.
- Held-out language or reasoning tasks.
- Long-context tasks for context claims.
- Stability runs across seeds or shards.
- Efficiency checks for memory, latency, active parameters, and GPU hours.
- Regression tasks that prevent overfitting to one benchmark family.
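As an illustration of the stability item above, the following is a minimal sketch of how scores from repeated runs might be summarized across seeds. The helper names (`stability_summary`, `is_stable`) and the `max_stdev` threshold are assumptions made for this example and are not part of PRISM.

```python
import statistics


def stability_summary(scores_by_seed: dict[int, float]) -> dict[str, float]:
    """Summarize one benchmark's scores across seeds (hypothetical helper)."""
    values = list(scores_by_seed.values())
    if len(values) < 2:
        raise ValueError("need at least two seeds to measure stability")
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "spread": max(values) - min(values),
    }


def is_stable(summary: dict[str, float], max_stdev: float = 0.02) -> bool:
    """Return True when seed-to-seed variation stays under an assumed threshold."""
    return summary["stdev"] <= max_stdev


if __name__ == "__main__":
    summary = stability_summary({0: 0.70, 1: 0.72, 2: 0.71})
    print(summary, is_stable(summary))
```

The same summary could be computed per shard instead of per seed; the threshold would need to be chosen per benchmark family rather than fixed globally.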
Scaling controls
Validators need predictable resource bounds so one submission cannot starve the queue.
| Control | Purpose |
|---|---|
| GPU class labels | Route jobs to compatible worker pools. |
| Wall-clock timeout | Stop runaway builds or benchmarks. |
| Artifact cache | Reuse verified artifacts by checksum. |
| Shard scheduler | Parallelize tasks without compromising hidden-set integrity. |