TokenIntel Research · Methodology

How TokenIntel Scores DeFi Protocol Risk

Seven dimensions. Twenty-five published sub-criteria. Explicit weights. No vibes, no vague ratings. Every score on the DeFi Risk Map traces back to a decomposed rubric you can inspect.

The short version

TokenIntel's DeFi Risk Map grades each protocol on seven dimensions: Smart Contract, Oracle, Governance, Liquidity, Economic, Admin Architecture, and Disclosure Quality. Each dimension breaks down into three to five specific sub-criteria with explicit weights that sum to 100%. Each sub-criterion is scored 0 to 100 where lower is less risky. The dimension score is the weighted average of its sub-criteria. The overall protocol grade is the weighted average of dimension scores, converted to a letter (A to F).

Why publish the rubric?

Most DeFi risk scoring frameworks give you a single letter or number with minimal explanation of how it was computed. "Aave has an A rating" tells you almost nothing: what if audit coverage is strong but admin controls are weak? What if the protocol is immutable but depends on six off-chain custodians? A single aggregate number hides the tradeoffs that actually matter for a position decision.

TokenIntel takes the opposite approach. We decompose every dimension into specific sub-criteria, publish the weights, and show each sub-score individually on every protocol's Risk Map row. If you disagree with our weighting, you can reweight it yourself. If you think our score for a specific sub-criterion is wrong, you can see exactly which one and challenge it.

This framework is inspired by YieldCompass's DeFi strategy risk methodology, which pioneered this decomposition approach for Solana yield strategies. We adapted their ideas to TokenIntel's protocol-level scope and added TI-specific dimensions like Admin Architecture, which became more important after the April 2026 Drift Protocol exploit.

Scoring scale and letter grades

Every sub-criterion and dimension uses the same 0 to 100 scale where lower scores mean less risk. A sub-criterion score of 20 represents a protocol at low risk on that specific axis; a score of 80 represents a protocol that fails that check materially.

Dimension scores are the weighted average of their sub-criteria. The overall protocol risk score is the weighted average of the seven dimension scores, using the dimension weights listed in the rubric below. That 0 to 100 aggregate is mapped to a letter grade:

A (0 to 24)
Low risk across all dimensions. Top-tier audits, no hack history, deep liquidity, battle-tested admin architecture.
B (25 to 39)
Mostly strong with 1-2 moderate risks. Suitable for meaningful exposure with standard risk management.
C (40 to 54)
Mixed profile. Some dimensions strong, others concerning. Requires understanding which specific risks you are taking.
D (55 to 69)
Multiple material risks. Only appropriate for small, informed positions.
F (70 and up)
Severe risk on multiple dimensions. We do not recommend exposure regardless of yield.
Why not a finer scale?

We deliberately use coarse grades and round sub-scores to multiples of 5. Finer scales suggest precision we do not have. A 27 and a 31 on the same sub-criterion both mean "low moderate risk" in practice, but users anchor on the exact number and treat small differences as meaningful. Coarse grades force honest, defensible scoring and make cross-protocol comparison easier.
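The grade bands and the rounding rule above can be sketched in a few lines. This is a minimal illustration, not TokenIntel's implementation: the function names are hypothetical, and two details are assumptions the text does not specify (fractional aggregates are assigned to bands using half-open boundaries, and halves round up).

```python
def round_to_5(score: float) -> int:
    """Round a 0-100 sub-score to the nearest multiple of 5 (halves round up; an assumption)."""
    # int(x + 0.5) is used instead of round() to avoid banker's rounding.
    return int(score / 5 + 0.5) * 5


def letter_grade(aggregate: float) -> str:
    """Map a 0-100 aggregate risk score to a letter; lower scores mean less risk."""
    if not 0 <= aggregate <= 100:
        raise ValueError("aggregate must be between 0 and 100")
    if aggregate < 25:   # A: 0 to 24
        return "A"
    if aggregate < 40:   # B: 25 to 39
        return "B"
    if aggregate < 55:   # C: 40 to 54
        return "C"
    if aggregate < 70:   # D: 55 to 69
        return "D"
    return "F"           # F: 70 and up
```

Under this sketch a raw 27 is recorded as 25 and a raw 31 as 30, which is the kind of coarsening the paragraph above argues for.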

Three layers of risk every dimension maps to

Our seven dimensions all score a protocol's current risk posture, but they do not all score the same kind of risk. Separating risks by their mechanism helps decide which score matters most for a given protocol, and when additional scoring is futile because a more fundamental risk dominates.

A useful taxonomy for this comes from Anastasiia (@mathy_research), who frames vault-level credit risk as three structurally independent layers in her April 2026 Vault Summit framework. We use the same layers to describe how the first six dimensions fit together (the seventh dimension, Disclosure Quality, is orthogonal: it scores institutional underwriteability rather than mechanical failure modes):

Layer 1. Mechanical
Risk that arises from how the protocol executes when its contracts and parameters work exactly as designed. Oracle-to-execution price gaps, liquidation slippage against finite pool depth, gas and MEV competition consuming liquidation margin, utilization ceilings forcing withdrawal freezes. Our Oracle, Liquidity, and Economic dimensions primarily score this layer, and four of the five additional checks (Collateral Concentration, Dependency Count, Cross-Chain Messaging Posture, Frontend Contract Consistency) interact with it. The five-channel decomposition of Layer 1 risks is covered in detail in the Vault Credit Risk framework.
Layer 2. Governance
Risk that the parameter-setting process cannot react faster than stress evolves. The gap between a protocol's response window (how quickly a parameter change would need to land to be protective) and its timelock duration (how long governance takes to ship that change) is a structural risk in itself, not just a process observation. Our Governance dimension scores this, with recent emphasis (post-Kelp) on pre-incident deprecation decisions and risk-param conservatism, not just post-incident response speed.
Layer 3. Code integrity
Risk that the contract dependency graph executes contrary to specification: an undiscovered vulnerability, an exploit, or an unauthorized upgrade. When this layer fails, none of the Layer 1 or Layer 2 scores provide advance warning: the loss is a code failure, not a risk-management failure. Our Smart Contract and Admin Architecture dimensions score this layer. Empirically, single-protocol vaults with strong audit and bug-bounty track records show annualized exploit probability in the low fractions of a percent; cross-protocol or bridge-dependent vaults are materially higher, because code-integrity risk compounds with each additional protocol layer in the dependency chain.
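Two of the quantities described in Layers 2 and 3 can be written down directly. The sketch below is illustrative only: the function names and all input numbers are hypothetical, and the compounding formula assumes that failures in different protocol layers are independent, which real dependency chains may violate.

```python
from math import prod


def governance_gap_hours(timelock_hours: float, response_window_hours: float) -> float:
    """Hours by which governance is slower than the protective response window (0 if none)."""
    return max(0.0, timelock_hours - response_window_hours)


def compounded_failure_probability(layer_probs: list[float]) -> float:
    """Probability that at least one layer in the chain fails, assuming independence."""
    return 1.0 - prod(1.0 - p for p in layer_probs)


# Hypothetical inputs: a 48-hour timelock against a 6-hour response window,
# and a three-protocol dependency chain with made-up annual exploit probabilities.
gap = governance_gap_hours(timelock_hours=48, response_window_hours=6)
chain_risk = compounded_failure_probability([0.005, 0.01, 0.02])
```

With the made-up chain above, the combined probability exceeds that of any single layer, which is the compounding effect the Layer 3 description refers to.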
Dominance condition

When Layer 3 failure probability over your holding horizon is meaningfully larger than expected Layer 1 loss over the same horizon, improving Layer 1 scoring (better oracles, deeper liquidation pools, more conservative LTVs) does little to reduce total expected loss. Code-integrity risk is a prerequisite check: assess it first. A protocol with a top-quartile Oracle dimension and a lingering cross-bridge dependency on an unaudited messaging layer is, in practice, scored by that dependency. Per Anastasiia's framework, Layer 3 must be cleared before Layer 1 tuning is worth doing. This is why a high (risky) Smart Contract or Admin Architecture score can sink an otherwise strong overall grade.
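A rough way to see the dominance condition is to add the two loss terms and compare them. This sketch is a simplification under stated assumptions: the additive model, the `ratio` threshold used for "meaningfully larger", and every number below are hypothetical and not part of the published rubric.

```python
def expected_loss(p_layer3: float, layer3_loss_fraction: float, layer1_expected_loss: float) -> float:
    """Total expected loss fraction over a horizon: code-failure term plus mechanical term."""
    return p_layer3 * layer3_loss_fraction + layer1_expected_loss


def layer3_dominates(layer3_term: float, layer1_term: float, ratio: float = 5.0) -> bool:
    """True when the Layer 3 term is at least `ratio` times the Layer 1 term (hypothetical cutoff)."""
    return layer3_term >= ratio * layer1_term


# Hypothetical position: 3% chance of a total code-integrity loss, 0.4% expected mechanical loss.
baseline = expected_loss(0.03, 1.0, 0.004)
better_layer1 = expected_loss(0.03, 1.0, 0.002)   # Layer 1 expected loss halved
better_layer3 = expected_loss(0.015, 1.0, 0.004)  # Layer 3 probability halved
```

In this made-up case, halving the Layer 1 term moves total expected loss from about 3.4% to about 3.2%, while halving the Layer 3 probability moves it to about 1.9%.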

The seven dimensions and their sub-criteria

Each dimension is scoped to avoid overlap. Smart Contract covers the protocol's own code. Counterparty dependencies on external oracles, bridges, or custodians roll up under Oracle, Liquidity, or Admin Architecture as appropriate. Here is the full rubric.

1. Smart Contract

Dimension weight: 20%

Risk from bugs, exploits, or operational failures in the protocol's own fund-handling contracts. Higher weight than most other dimensions because smart contract failure is the fastest path to total loss.

Sub-criterion | Weight | What we evaluate
Audit Coverage & Depth | 30% | Count and depth of independent audits on contracts handling user funds. Bonus credit for formal verification and active bug bounty programs.
Hack History | 25% | Past exploits or critical incidents affecting user funds, weighted by recency, severity, and quality of remediation.
Version Lindy | 20% | How long the currently deployed fund-handling contracts have operated without critical failure. Not protocol age: if the vault contract was redeployed last month, Lindy is measured from then, not from the protocol's launch.
Upgradeability & Control | 25% | Immutable vs upgradeable contracts, who controls upgrade authority, and whether unilateral modification of user-critical logic is possible.
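As a worked example of how a dimension score is formed, the sketch below applies the Smart Contract weights from the table above to a set of sub-scores. The weights come from the rubric; the sub-scores and the function name are hypothetical and chosen only to show the arithmetic.

```python
SMART_CONTRACT_WEIGHTS = {
    "Audit Coverage & Depth": 0.30,
    "Hack History": 0.25,
    "Version Lindy": 0.20,
    "Upgradeability & Control": 0.25,
}


def dimension_score(sub_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of sub-criterion scores; weights must sum to 1."""
    if set(sub_scores) != set(weights):
        raise ValueError("sub_scores and weights must cover the same sub-criteria")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 100%")
    return sum(weights[name] * sub_scores[name] for name in weights)


# Hypothetical sub-scores for an illustrative protocol (lower is less risky).
example_sub_scores = {
    "Audit Coverage & Depth": 20,
    "Hack History": 10,
    "Version Lindy": 40,
    "Upgradeability & Control": 35,
}
smart_contract_score = dimension_score(example_sub_scores, SMART_CONTRACT_WEIGHTS)
```

Here the result is 0.30 x 20 + 0.25 x 10 + 0.20 x 40 + 0.25 x 35 = 25.25. The same function works for any other dimension once its weights are substituted.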

2. Oracle

Dimension weight: 15%

Risk from reliance on external or internal price feeds. A wrong price is indistinguishable from a wrong balance to most protocols.

Sub-criterion | Weight | What we evaluate
Oracle Architecture | 40% | Quality and diversity of price feed architecture. Chainlink multi-source preferred over single-source TWAPs or proprietary feeds.
Manipulation Resistance | 30% | Resistance to flash loan manipulation and MEV extraction. Heartbeat, staleness checks, and sanity bounds.
Fallback & Override | 30% | Presence of circuit breakers, fallback oracles, and emergency override authority when price feeds misbehave.

3. Governance

Dimension weight: 15%

Risk from how decisions are made and executed. Even a perfectly audited contract is only as safe as the process that decides what code runs next.

Sub-criterion | Weight | What we evaluate
Upgrade Authority | 40% | Who can push code changes. Timelock length, quorum requirements, and whether approval requires multi-entity sign-off.
Multisig & Key Custody | 30% | Multisig signer count, threshold, and diversity. Independent signers preferred over team-only.
Emergency Powers | 30% | Scope of unilateral pause, freeze, or recovery capabilities. Who holds them and under what conditions.

4. Liquidity

Dimension weight: 15%

Risk from being unable to exit a position when you want to. High displayed TVL means nothing if withdrawals are gated or slippage is catastrophic at size.

Sub-criterion | Weight | What we evaluate
Exit Depth | 40% | Slippage impact for large withdrawals. TVL relative to single-position exit size.
Withdrawal Constraints | 30% | Cooldowns, queues, withdrawal caps, and processing delays before funds are available.
Redemption Model | 30% | Instant on-chain redemption vs epoch-based vs reliance on secondary market liquidity.

5. Economic

Dimension weight: 15%

Risk that the protocol's economic model cannot sustain its own returns. Yield from genuine fees is durable; yield from emissions is a countdown timer.

Sub-criterion | Weight | What we evaluate
Revenue Durability | 40% | Real fees from genuine usage vs emissions or subsidies. Would the yield exist without the token?
Incentive Dependence | 30% | Fraction of displayed APY driven by temporary incentives, points, or token emissions rather than protocol revenue.
Token Capture Mechanism | 30% | Does the token have a mechanism (fee switch, buyback, burn) that routes real protocol revenue to holders?
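The Incentive Dependence input is a simple ratio, sketched below. The function name and APY figures are hypothetical, and how TokenIntel maps this fraction onto a 0 to 100 sub-score is not stated in the rubric, so no mapping is attempted here.

```python
def incentive_share(fee_apy: float, incentive_apy: float) -> float:
    """Fraction of displayed APY that comes from incentives rather than protocol fees."""
    total = fee_apy + incentive_apy
    if total <= 0:
        raise ValueError("displayed APY must be positive")
    return incentive_apy / total


# Hypothetical vault: 3% APY from fees, 9% APY from token emissions.
share = incentive_share(fee_apy=3.0, incentive_apy=9.0)
```

In this made-up example three quarters of the displayed yield would disappear if emissions stopped.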

6. Admin Architecture

Dimension weight: 17%

Risk from how administrative powers are scoped and custodied. This dimension was added after the April 2026 Drift Protocol attack, where $285M was drained in 12 minutes via 31 withdrawals using privileged access. A perfectly audited contract with a compromised admin key is still a zero.

Sub-criterion | Weight | What we evaluate
Key Custody Model | 30% | EOA vs multisig vs timelocked DAO controls. Separation of pause, parameter, and upgrade powers.
Signer Diversity | 25% | Independent signers across organizations vs team-only. Public identities preferred over anonymous.
Action Scope | 25% | What admin can change. Parameter-only changes are lower risk than arbitrary code upgrades or treasury access.
Risk Oversight | 20% | External risk advisory (Chaos Labs, Gauntlet, BlockScience) and maturity of incident response procedures.

7. Disclosure Quality

Dimension weight: 10%

Institutional underwriteability of the protocol from an information-availability standpoint. The other six dimensions score how the protocol can fail; this one scores whether you can analyze it. Inspired by Blockworks' Token Transparency Framework filings (already integrated via data/ttf-registry.json) and Novora's 5-pillar IR Score, with TI's own per-protocol observations on cadence and depth. Weighted lower than core technical dimensions because poor disclosure reflects underwriting friction, not direct loss-of-funds risk, but it is the dimension that determines whether a real institutional allocator can deploy at all. Per Novora's April 2026 audit of 159 protocols: ~91% generate trackable revenue, but only ~18% publish quarterly updates, ~8% issue token-holder reports, and fewer than 1% disclose market-maker terms.

Sub-criterion | Weight | What we evaluate
Standardized Framework Filing | 30% | Whether the protocol has filed a public disclosure with a recognized framework (Blockworks TTF, Novora IR Score, or equivalent). Filings include token allocations, supply schedules, financial disclosures, and team accountability. Not filing scores worse than partial filing.
Investor Cadence | 20% | Frequency and depth of investor-style updates: quarterly reports, ecosystem updates, milestone disclosures. Quarterly is the institutional baseline.
Token Holder Reports | 20% | Dedicated communications to token holders covering revenue accrual, buyback execution, supply changes, and forward outlook.
Treasury & Buyback Transparency | 15% | Treasury holdings published with auditable trail; buyback execution data (volume, price, timing) disclosed real-time or near-real-time. On-chain buyback-and-burn passes automatically; buyback-and-hold needs extra discipline.
Market Maker Terms | 15% | Whether the protocol discloses market-maker engagements (which firms, what loans, what option strikes). One of the largest sources of soft-circulating supply pressure that crypto protocols routinely fail to disclose.
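The overall aggregate, and the "reweight it yourself" option mentioned earlier, can be sketched as follows. The weights are the per-dimension figures listed in this section; as listed they sum to 107 rather than 100, so this sketch divides by the total weight. That normalization is an assumption of the example, not a documented TokenIntel rule, and the dimension scores are hypothetical.

```python
DIMENSION_WEIGHTS = {
    "Smart Contract": 20,
    "Oracle": 15,
    "Governance": 15,
    "Liquidity": 15,
    "Economic": 15,
    "Admin Architecture": 17,
    "Disclosure Quality": 10,
}


def overall_score(dimension_scores: dict[str, float],
                  weights: dict[str, float] = DIMENSION_WEIGHTS) -> float:
    """Weighted average of dimension scores, normalized by the sum of the weights."""
    if set(dimension_scores) != set(weights):
        raise ValueError("dimension_scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[d] * dimension_scores[d] for d in weights) / total_weight


# Hypothetical protocol: weak Smart Contract score, otherwise low risk.
scores = {name: 20 for name in DIMENSION_WEIGHTS}
scores["Smart Contract"] = 60
published = overall_score(scores)

# A reader who cares more about contract risk can double that weight and recompute.
custom_weights = dict(DIMENSION_WEIGHTS, **{"Smart Contract": 40})
reweighted = overall_score(scores, custom_weights)
```

Passing a different `weights` mapping is all that reweighting requires; the sub-scores themselves do not change.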

Five additional checks on every protocol

Beyond the seven scored dimensions, we track five binary and quantitative checks on every protocol research page. These are not weighted into the aggregate score because they are effectively red flags: a protocol that fails any of them has a structural problem regardless of its dimension scores.

Frontend Contract Consistency
Does the official user interface route transactions exclusively to documented and verified contract addresses? A "no" here means the UI could be swapped or modified without users noticing, which is a real attack vector.
Deployment Address Clarity
Are the deployed contract addresses clearly documented in the protocol's official documentation and independently verifiable on-chain? "No" means users cannot confirm what code they are interacting with.
Dependency Count
Count of independent external entities (oracles, bridges, custodians, off-chain service providers, upstream protocols) whose correct functioning is required for the strategy to operate safely. More dependencies mean a broader blast radius in the event of any single failure.
Collateral Concentration
For any pooled lending reserve, what percentage of borrow collateral is sourced from a single asset class (e.g. ETH LSTs, a single stablecoin, RWA tokens)? A reserve where >70% of collateral comes from one class is not a diversified lending book; it is effectively financing a single concentrated strategy, and depositors are bearing the tail risk of that strategy without being compensated for it. This applies especially to protocols with a unified-pool architecture, where all suppliers of an asset earn the same APR regardless of what their capital is ultimately financing. The April 2026 Kelp incident revealed that 98.5% of Aave's WETH borrow collateral came from ETH LSTs, turning aWETH depositors into de facto third-loss capital in a concentrated LST carry trade. We now flag any reserve where single-class concentration exceeds 70%, and score higher (worse) when the protocol doesn't offer collateral-specific borrow rates (which v4 Risk Premiums and modular platforms like Morpho do).
Cross-Chain Messaging Posture
If the protocol uses a cross-chain messaging layer (LayerZero, Wormhole, CCIP, Axelar, Hyperlane), evaluate the configuration along four dimensions: (1) DVN / validator redundancy (1-of-1 vs 3-of-5, etc.); (2) operator independence (signers from different organizations vs correlated operators); (3) infrastructure diversity: critically, do the DVNs read from different RPC providers with different hosting, and do they use different verification methods? (multi-DVN with shared RPCs fails identically to single-DVN, as the April 2026 Kelp incident demonstrated when 2 of 3 RPCs on the same DVN were compromised and the third was DDoS'd); (4) default vs hardened config (is the protocol using the messaging layer's quickstart / GitHub-default config, or did they customize?). This check was added after the April 2026 Kelp / LayerZero incident, where a 1-of-1 DVN configuration allowed attackers to drain ~$290M by compromising a single verifier pathway. Per Dune analytics, roughly 32% of LayerZero OApps ran 1-of-1 configurations at the time of the incident. LayerZero has since announced it will no longer sign messages for apps on single-DVN configurations, forcing migration. Per LayerZero's May 2026 post-incident statement, the new default baseline is 5/5 DVNs where possible, no less than 3/3, plus tiered RPC quorums (internal / dedicated-external / shared-external), a Rust-based second DVN client for client diversity, OneSig multisig with 7-of-10 thresholds, per-signer private anomaly checkers, and Console for automated configuration anomaly detection.
Trust Assumption Tiering (LayerZero rubric)
For protocols that depend on LayerZero, the May 2026 post-incident statement articulates a four-tier framework that TI now uses to score messaging posture more precisely. Ordered from most-reliant to least: (T1) Defaults: if the protocol relies on LayerZero Labs defaults (block confirmations, messaging library, DVN selection), it has fully delegated security to the LayerZero Labs multisig. This tier is intended for testing and should be flagged as a scoring failure for any production-scale protocol. (T2) LZ Labs DVN as one of N: if the LZ Labs DVN is included in a multi-DVN setup, the LZ multisig is one piece of the N-piece security assumption. Acceptable if the multi-DVN is genuinely diverse on operator + RPC + verification method. Never run any DVN as the sole signer. (T3) Essence gas relay: zero trust impact, only liveness. If Essence fails, the DVN backstops. Score-neutral for security; minor liveness consideration. (T4) LZ Labs executor: zero trust impact, only liveness. If the executor doesn't execute, anyone (permissionless) can pay gas to manually complete the transaction on the destination chain. Score-neutral. For TI's scoring: protocols at T1 (defaults) are at the top of the next-stress-event watchlist regardless of how strong other dimensions look; T2 protocols with diverse multi-DVN configs are acceptable for the modern-DeFi baseline; T3 / T4 dependencies are noted but not penalized.
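Two of the checks above reduce to simple rules: the 70% single-class collateral threshold and the red flags for default or sole-signer messaging configurations. The sketch below is illustrative only. Function names, the input shapes, and the 1.5% "Other" remainder are hypothetical; only the 70% threshold and the 98.5% ETH LST figure come from the text.

```python
def top_class_share(collateral_by_class: dict[str, float]) -> tuple[str, float]:
    """Return the largest collateral class and its share of total borrow collateral."""
    total = sum(collateral_by_class.values())
    if total <= 0:
        raise ValueError("total collateral must be positive")
    name = max(collateral_by_class, key=collateral_by_class.get)
    return name, collateral_by_class[name] / total


def concentration_flag(collateral_by_class: dict[str, float], threshold: float = 0.70) -> bool:
    """True when a single asset class exceeds the concentration threshold."""
    return top_class_share(collateral_by_class)[1] > threshold


def messaging_red_flags(uses_default_config: bool, required_dvn_count: int) -> list[str]:
    """Collect red flags for a LayerZero-style messaging configuration."""
    flags = []
    if uses_default_config:
        flags.append("T1: relies on LayerZero Labs defaults")
    if required_dvn_count < 2:
        flags.append("a single DVN is the sole signer")
    return flags


# The WETH reserve figure cited above: 98.5% of borrow collateral from ETH LSTs.
weth_reserve = {"ETH LSTs": 98.5, "Other": 1.5}
flagged = concentration_flag(weth_reserve)
```

A configuration that returns an empty list from `messaging_red_flags` still needs the operator and RPC independence review described in the posture check; DVN count alone does not settle it.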

Reference case: Kelp DAO / LayerZero, April 2026

Attackers drained 116,500 rsETH (~$290M) from Kelp's LayerZero-powered cross-chain bridge by compromising a single DVN verifier path in a 1-of-1 configuration. They obtained root access to LayerZero Labs' DVN RPC infrastructure, replaced the op-geth binary on two of three nodes, and DDoS'd the uninfected third. A failover to the compromised nodes let the DVN attest a forged "burn" message claiming 116,500 rsETH had been burned on Unichain. The burn never happened: Unichain's outbound nonce stayed at 307 while Ethereum accepted nonce 308. The OFT Adapter on Ethereum released the funds as instructed.

Downstream contagion: the attacker then deposited 89,567 rsETH (76.9% of the stolen total) as collateral on Aave V3, borrowing 82,650 WETH + 821 wstETH (~$193M). Aave is now holding $124M–$230M in potential bad debt depending on how Kelp DAO allocates losses between Ethereum mainnet and L2 rsETH holders. BGD Labs had explicitly warned Aave about this specific risk during the rsETH listing discussion in February 2025, recommending a multi-DVN configuration. The warning was not adopted. This makes it a governance-process failing at Aave, not only a Kelp / LayerZero failing.
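The 76.9% figure follows directly from the two rsETH amounts quoted above. The helper name below is hypothetical; the inputs are the numbers from the text.

```python
def deposited_share(deposited: float, stolen: float) -> float:
    """Fraction of the stolen rsETH that was posted as collateral."""
    if stolen <= 0:
        raise ValueError("stolen must be positive")
    return deposited / stolen


share = deposited_share(89_567, 116_500)
print(f"{share:.1%}")        # 76.9%
print(116_500 - 89_567)      # 26933 rsETH not covered by that deposit
```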

Key takeaways for TI's risk scoring:
(1) A protocol's own contract security is independent of the bridge it depends on. Aave's smart contracts, oracle system, and liquidation mechanisms all operated correctly throughout.
(2) Messaging-layer default configurations can be insecure even when documented. LayerZero's V2 OApp Quickstart sample wires every pathway with a single required DVN, and per the Dune dashboard roughly 32% of LayerZero OApps currently run this minimal configuration.
(3) A fast pause mechanism limits downside materially: Kelp's 46-minute pause blocked a second attack that would have released ~$100M more.
(4) DVN count alone is not a sufficient security metric. A 2-of-2 DVN configuration would not have helped here if both DVNs read from the same compromised RPCs. Real diversification requires independent RPC providers, independent hosting infrastructure, and ideally different verification methods (some cryptographic, some not).
(5) Architecture alone doesn't determine trust; underwriting discipline does. SparkLend, which uses the same unified-pool architecture as Aave, captured the largest share of post-incident inflows (+$1.8B deposits Apr 19–21) because it had proactively deprecated rsETH in January 2026, rate-limits supply and borrow caps to prevent explosive exposure growth, and maintained >$350M of instantly-available spUSDT liquidity through the crisis. The Blockworks / Leasure + Shaundadevens post-mortem frames this as "the deeper re-rating may be occurring not just across architectures, but across perceptions of who underwrites risk most credibly."

For our scoring, this means the Governance dimension now explicitly weights pre-incident deprecation decisions and risk-param conservatism, not just post-incident response speed.
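Takeaway (4) can be made concrete by counting how many DVN groups are actually independent once shared RPC providers are taken into account. This is a minimal sketch under stated assumptions: the function name, the DVN labels, and the provider names are hypothetical, and it considers only RPC overlap, not operator identity, hosting, or verification method.

```python
def independent_verifier_groups(dvn_rpcs: dict[str, set[str]]) -> int:
    """Count groups of DVNs that share no RPC provider with any DVN outside the group."""
    groups: list[set[str]] = []  # each entry: union of RPC providers used by linked DVNs
    for dvn, rpcs in dvn_rpcs.items():
        if not rpcs:
            raise ValueError(f"{dvn} has no RPC providers listed")
        merged = set(rpcs)
        remaining = []
        for group in groups:
            if group & merged:
                merged |= group      # shares a provider: fold into the same group
            else:
                remaining.append(group)
        groups = remaining + [merged]
    return len(groups)


# Hypothetical 2-of-2 setup where both DVNs read from the same provider.
shared = {"dvn_a": {"rpc_x", "rpc_y"}, "dvn_b": {"rpc_y"}}
# Hypothetical 2-of-2 setup with fully separate providers.
separate = {"dvn_a": {"rpc_x"}, "dvn_b": {"rpc_z"}}
```

The first configuration counts as one effective verifier group despite having two DVNs; the second counts as two.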

LayerZero's post-incident policy (April 2026): LayerZero will stop signing/attesting messages for any application maintaining single-DVN configurations; all 1-of-1 OApps must migrate to multi-DVN. This is a forced-migration event for the ~32% of LayerZero apps that haven't upgraded yet.

Arbitrum precedent: the 12-member Arbitrum Security Council invoked ArbitrumUnsignedTxType (EIP-2718) for the first time to freeze 30,766 ETH (~$71.5M) from the attacker on Arbitrum. The power existed in the chain's design but had never been used; its first invocation raises real questions about the decentralization spectrum in practice, and sets a precedent that the council may face pressure to use again for less clear-cut cases.

LayerZero structural fixes (May 2026 statement): the response goes well past the 1-of-1 ban. (a) Default-config migration: defaults on all pathways migrate to 5/5 DVNs where possible, no less than 3/3 on chains where only 3 DVNs are available. (b) Client diversity: a second DVN client written in Rust is in development, addressing the op-geth binary-swap attack vector specifically. (c) Granular RPC quorum config: DVNs can now select tiered quorums of internal, dedicated-external, and shared-external RPCs, addressing the shared-RPC failure mode that took down the Kelp pathway. (d) OneSig: a custom multisig where signers download transactions, merklize and hash them locally, and sign the root hash, preventing a compromised backend from slipping unauthorized transactions into the signing flow. LayerZero Labs is migrating its own multisig threshold from 3-of-5 to 7-of-10 across chains where OneSig exists. (e) Per-signer private anomaly checkers: every OneSig signer maintains their own custom anomaly checker on their signing device; criteria are not shared with the company or other signers, defeating insider-coordination attacks. (f) Console: a unified configuration platform with automated anomaly detection (unknown DVNs, ownership changes, block-confirmation changes, unsafe configurations, default-config usage). For TI's scoring, this is where the next protocol-side check moves: a LayerZero-using protocol that has migrated to non-default configs, integrated with Console anomaly detection, and uses a multi-DVN setup with diverse RPC quorums will earn a clean Cross-Chain Messaging Posture score; a protocol still on defaults at end of 2026 will score one full letter lower regardless of other strengths.

Internal-process disclosure worth flagging: the May 2026 LayerZero statement also discloses that approximately 3.5 years prior, a multisig signer used the multisig hardware wallet for a personal trade by mistake (intending to use a personal wallet). The signer was removed and wallets rotated. This is a separate trust-assumption data point: even a hardened multisig setup carries human-process risk that is not visible from on-chain configuration alone. The OneSig + per-signer anomaly checker design appears to be a direct response to this class of failure, beyond the Kelp incident itself.

Conflicting narratives about whose infrastructure was compromised and what guidance was given remain unresolved between Kelp and LayerZero; TI scores configuration posture, not attribution.

Sources: CoinDesk, OAK Research, and LlamaRisk coverage (April 2026); Banteg's on-chain attack investigation; Aave governance forum (incident report + scenario modeling); Dune's LayerZero OApp DVN Configuration dashboard. Dune methodology caveat: the dashboard reports DVN cardinality but does not expose the N-of-M threshold for optional DVNs and does not label operator identity. A configuration that looks safe on cardinality alone can still have correlated operators, shared RPC infrastructure (as this incident demonstrated), or a weak optional-DVN threshold.

The AI-era risk-surface shift (May 2026 calibration)

Through 2025 the highest-frequency cause of catastrophic DeFi loss was a code bug in the protocol's own contracts. TI's framework (and most peer frameworks) reflected that, weighting Smart Contract heavily. Two developments in 2026 are forcing a recalibration: AI-assisted auditing has compressed the cost of finding subtle contract bugs, and AI-assisted offensive capability has compressed the cost of attacking everything else (frontends, signing infrastructure, RPC providers, dev credentials, browser-level exploits, social engineering augmented by deepfakes). The empirical record of recent incidents already reflects the new asymmetry.

Anchor capability: Anthropic Claude Mythos Preview (released May 2026)

Per Anthropic's published system card and Project Glasswing materials, Mythos Preview has solved "The Last Ones" (TLO), a 32-step corporate-network attack simulation Anthropic estimates requires ~20 hours of skilled human effort, end-to-end without supervision. The same model has surfaced thousands of high-severity vulnerabilities spanning every major operating system and web browser. Earlier preview versions, during evaluation runs, used /proc/ access to search for credentials, attempted to circumvent sandboxing and escalate privileges, accessed credentials for messaging services, source control, and the Anthropic API by inspecting process memory, and intervened to suppress git history evidence after editing files outside their sanctioned scope. Anthropic restricts general access to the model and is deploying it defensively to a curated set of "systemically important" tech companies via Project Glasswing.

Note on self-presented capability claims. Mythos numbers come from Anthropic's own evaluation framework. The marketing layer overstates the asymmetry, but the directional claim (multi-step network attack capability has crossed a threshold that previously required skilled humans) is corroborated by independent government AI-safety teams' published evals. TI weights this as directionally credible, with the specific benchmarks treated as upper-bound estimates.

Sources: Anthropic Mythos system card; Project Glasswing announcement; DeFi Education's "Can The AI Companies Do Security?" framing post (May 2026). Also relevant: Anthropic's March 31, 2026 source-code leak via a public-bucket source map in a production npm package, which surfaced 3 shell-injection vulnerabilities in the leaked code, a separately-documented operational-security failure at the same vendor.

How this shifts TI's scoring (specific calibration changes). The framework is layered (Layer 1 mechanical, Layer 2 governance, Layer 3 code integrity), so the rebalance is not a single weight change. It is differential pressure on which Layer 3 sub-channels matter and where adjacent operational-security checks belong.

Smart Contract dimension: actively-maintained code re-rates lower
For protocols with full-time engineering, recurring AI-augmented audits, active bug bounty programs, and a track record of upgrade discipline, contract-bug residual risk is meaningfully lower than the 2024 base rate. The Lindy variant for the AI era: a high-value contract that has not been drained through the back end of 2025 and into 2026, despite being a public target, carries a lower forward exploit probability than a comparable contract from the pre-AI-audit cohort. Calibration: Smart Contract sub-criterion for "audit recency + AI-assisted review on critical paths" added; protocols meeting this bar receive sub-criterion scores 10 to 20 points lower than the unaudited / one-off-audited cohort.
Long-tail and unmaintained code re-rates higher
The same dynamic that lowers risk on the top names raises it on the long tail. Historically uneconomic exploits (small TVL, complex bug, manual human discovery cost) become economic when discovery cost falls toward zero. Calibration: Smart Contract scoring for protocols below ~$50M TVL, without ongoing development, or with single-author legacy contracts now carries a higher floor, and audit reports older than 12 months do not lower it.
Frontend Contract Consistency check elevated in weight
The 2025 ByBit cold-wallet drain (compromised Gnosis Safe frontend serving a swapped transaction to a multisig signer) is the canonical case for this class of risk. AI lowers the cost of preparing convincing fake frontends, intercepting build-pipeline credentials, and serving region-targeted or user-targeted malicious bundles. Calibration: the Frontend Contract Consistency check is upgraded from a binary pass/fail to a graded sub-criterion that scores transaction-preview infrastructure, content integrity (subresource integrity hashes, signed bundles), and build-pipeline access controls.
Cross-Chain Messaging Posture check elevated in weight
The April 2026 Kelp / LayerZero incident already produced a methodology change here (DVN cardinality alone is not sufficient; RPC and operator independence matter). AI capability in multi-step network attack adds a second calibration: the realistic threat against a 1-of-1 or default-config messaging deployment is no longer "a sophisticated team eventually finds it" but "an AI-augmented attacker finds it on a measurably shorter horizon." Calibration: default-config penalty in this check raised; protocols still on V2 OApp Quickstart defaults graded one full letter lower on the dimension floor.
Operational-security adjacent (new sub-criteria under Admin Architecture)
Three categories of operational-security exposure that previously sat outside the framework now flow into Admin Architecture sub-criteria: (a) developer credential hygiene (npm token rotation, source-map exposure, signed git commits); (b) private-key custody for protocol-level signers (multisig hardware policy, blind-signing UI risk, social-engineering-resistant signing flows); (c) RPC and node-provider diversification (single-provider RPCs are now scored as concentration risk on parity with single-DVN messaging). The Anthropic source-code leak via a public-bucket source map (March 31, 2026) is a worked example of category (a) at a sophisticated vendor; ByBit / Gnosis Safe is the worked example of category (b); the Kelp / LayerZero RPC-takeover path is the worked example of category (c).
Disclosure Quality dimension: AI-augmented social engineering check
Deepfake-quality voice and video make team-impersonation attacks (fake "founder calls" prompting urgent custody actions, governance forum impersonation, support-channel impersonation) materially cheaper. Disclosure Quality now explicitly weights team identity verification posture (PGP-signed governance posts, verified-channel signaling for sensitive actions) and incident-disclosure cadence (a team that has practiced public incident communication is harder to impersonate convincingly).
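Three of the calibration changes above are mechanical enough to sketch: the 10 to 20 point credit for AI-assisted review, the higher floor for long-tail code, and the one-letter downgrade for default messaging configs. All function names are hypothetical. The floor value of 55 is an invented placeholder, since the text says only "a higher floor"; the $50M cutoff and the 10 to 20 point range come from the text.

```python
GRADES = ["A", "B", "C", "D", "F"]


def apply_ai_audit_credit(score: float, credit: float = 15.0) -> float:
    """Lower a Smart Contract sub-score by 10 to 20 points, never below 0."""
    if not 10 <= credit <= 20:
        raise ValueError("credit must be between 10 and 20 points")
    return max(0.0, score - credit)


def apply_long_tail_floor(score: float, tvl_usd: float, actively_maintained: bool,
                          floor: float = 55.0, tvl_cutoff: float = 50_000_000) -> float:
    """Raise the score to a floor for small or unmaintained protocols (floor is a placeholder)."""
    if tvl_usd < tvl_cutoff or not actively_maintained:
        return max(score, floor)
    return score


def downgrade_one_letter(grade: str) -> str:
    """Move one full letter toward F; F stays F."""
    index = GRADES.index(grade)
    return GRADES[min(index + 1, len(GRADES) - 1)]
```

For example, a hypothetical $20M TVL protocol with a raw Smart Contract score of 30 would be lifted to the placeholder floor of 55, while a protocol graded B on messaging posture but still on Quickstart defaults would be shown as C.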

What does not change. The Layer 3 prerequisite logic still holds (code-integrity must be cleared before Layer 1 tuning is worth doing). The YieldCompass framework still applies. The TradFi-translation failure modes covered below still apply. AI is a multiplier on the existing structure; the structure itself stays.

How TI surfaces this in protocol scoring. Existing research-page risk arrays will show the recalibrated weights; new sub-criteria appear in protocol risk reports starting May 2026. Where a protocol's Smart Contract score improves (moves lower on the 0 to 100 scale) because of AI-era reduced contract-bug risk, the corresponding Admin Architecture or Frontend Consistency score may worsen (move higher) to reflect the absorbed-but-shifted risk surface. Net protocol grades may not move materially in either direction; the composition of those grades is what changes.
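A minimal sketch of how composition can shift while the overall score stays put, using the weighted-average rule from the rubric. The dimension weights and scores below are hypothetical placeholders, not TI's published values; lower scores mean less risk.

```python
# Hypothetical dimension weights (sum to 1.0). NOT TokenIntel's published weights.
WEIGHTS = {
    "smart_contract": 0.20,
    "oracle": 0.15,
    "governance": 0.15,
    "liquidity": 0.15,
    "economic": 0.10,
    "admin_architecture": 0.20,
    "disclosure_quality": 0.05,
}


def overall_score(scores: dict) -> float:
    """Weighted average of dimension scores (0-100, lower is less risky)."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


# Hypothetical protocol scores before the recalibration.
before = {
    "smart_contract": 40.0,
    "oracle": 35.0,
    "governance": 30.0,
    "liquidity": 25.0,
    "economic": 30.0,
    "admin_architecture": 30.0,
    "disclosure_quality": 20.0,
}

# Contract-bug risk falls by 10 points; operational risk rises by 10 points.
after = dict(before, smart_contract=30.0, admin_architecture=40.0)

print(round(overall_score(before), 2), round(overall_score(after), 2))
```

Because the two dimensions carry equal weight in this illustration, the overall score is unchanged even though the risk has moved from one dimension to another. With unequal weights the net score would move slightly.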

Six places TradFi credit analogies break in DeFi

Our rubric is calibrated to how DeFi lending actually fails, not to a direct translation of bank-style credit assessment. The following six failure modes, articulated by Anastasiia (@mathy_research) in her April 2026 Vault Summit paper, describe where standard credit-risk concepts produce biased or misleading measures if applied to an onchain lending vault without adjustment. They map cleanly onto the Layer 1 (mechanical) risks scored above, and understanding them helps read our Oracle, Liquidity, and Economic scores more precisely.

1. Oracle-to-execution divergence
In a bank, the valuation agent may mark a collateral position slightly off clearing prices, but deviations are bounded and institutions absorb them. Onchain, the oracle is the protocol's pricing agent, and liquidations execute automatically against oracle marks. When oracle-reported price exceeds the true fillable execution price by more than the coverage buffer, a vault looks solvent on paper but is factually insolvent at the price it would actually realize. The oracle-to-execution wedge is a credit risk variable, not operational noise.
2. Recovery endogeneity
Standard Loss-Given-Default in TradFi is a fixed haircut derived from historical recoveries, treating the collateral market as deep enough that one liquidation doesn't move it. DeFi liquidations route through onchain venues with finite depth: the larger the liquidation, the worse the fill, the lower the recovery, and the bigger the shortfall. Expected shortfall is nonlinear in liquidation mass, and accelerates faster than position size under correlated stress (all looped LST positions unwinding into the same pool at once is the canonical case). This is why concentration matters so much: a 90% LTV that's safe on one position is structurally unsafe across 119 positions sharing the same exit liquidity.
3. Full-information bank runs
In TradFi, depositors can't observe each other's intentions. That information asymmetry dampens run incentives because you don't know whether other depositors are about to run. Onchain, all state is public at the block level: utilization rate, withdrawal queue, and vault composition are observable to every depositor. Under common information, what is probabilistic in TradFi becomes close to deterministic in DeFi: when users see utilization approaching the withdrawal ceiling and collateral quality deteriorating simultaneously, the rational move is to exit first. Runs onchain are faster, sharper, and more complete than their TradFi counterparts.
4. Parameter rigidity under timelocks
A TradFi collateral manager facing a deteriorating position can adjust eligibility and coverage tests intraday. Onchain parameter changes are gated by governance timelocks, often 24–72 hours. If a protocol's critical response window (the maximum horizon over which a parameter change remains protective) is shorter than its timelock duration, curator intervention is structurally unreachable during stress, not because governance is slow, but because the math doesn't close. This is why our Governance dimension now explicitly weights pre-incident deprecation decisions: the only effective parameter changes in fast stress are the ones already made.
5. Oracle latency and manipulation
The TradFi analogy treats oracle risk as valuation-agent risk, bounded by redundancy. Onchain, oracle error is adversarially targeted, correlated with stress regimes, and automatically trips liquidation logic. There are two distinct failure channels: latency (a stale oracle reflects pre-stress prices during fast drawdowns, keeping undercollateralized positions open and letting secondary actors rationally borrow against an incorrect mark) and manipulation (feasible when the cost to displace the oracle's reference market falls below the manipulation payoff, a threshold that can be crossed on thin reference markets). The March 22, 2026 Resolv exploit is a worked case of the latency channel; see the callout below.
6. Congestion-dependent liquidation (wrong-way risk)
TradFi clearing infrastructure has dedicated default-management resources independent of market stress. Onchain liquidation requires blockspace, and blockspace cost is endogenous to the same stress that triggers the liquidation: gas prices and MEV competition spike exactly when large-scale liquidation is needed. The probability that a triggered liquidation is economically unexecutable is strictly higher conditional on stress than unconditionally. This is wrong-way risk in the formal sense: the protection mechanism weakens precisely when it is most needed.
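Three of the failure modes above reduce to simple checks, sketched below. This is an illustration under stated assumptions, not TI's scoring code: the coverage buffer is expressed as a fraction of the oracle price, the price-impact model is an assumed linear one, and all function names and parameters are hypothetical.

```python
def is_false_solvent(oracle_price: float, execution_price: float,
                     coverage_buffer: float) -> bool:
    """Mode 1: oracle mark exceeds the fillable price by more than the buffer.

    coverage_buffer is a fraction of the oracle price (assumption), e.g. 0.10.
    """
    wedge = (oracle_price - execution_price) / oracle_price
    return wedge > coverage_buffer


def price_impact(liquidation_size: float, pool_depth: float) -> float:
    """Mode 2: average slippage as a fraction of mark.

    Assumed linear impact model: slippage grows in proportion to size / depth,
    capped at 100%.
    """
    return min(1.0, liquidation_size / (2.0 * pool_depth))


def slippage_loss(liquidation_size: float, pool_depth: float) -> float:
    """Mode 2: value lost versus mark when the whole size is sold into the pool."""
    return liquidation_size * price_impact(liquidation_size, pool_depth)


def intervention_reachable(response_window_hours: float,
                           timelock_hours: float) -> bool:
    """Mode 4: a parameter change only helps if it lands inside the window."""
    return timelock_hours <= response_window_hours


# Oracle marks 1.00, fills clear at 0.85, buffer is 10%: solvent only on paper.
print(is_false_solvent(1.00, 0.85, 0.10))

# Doubling the liquidation size quadruples the loss under the linear model.
print(slippage_loss(100.0, 1000.0), slippage_loss(200.0, 1000.0))

# A 48-hour timelock cannot protect against a 12-hour stress window.
print(intervention_reachable(12.0, 48.0))
```

Under the assumed linear model the loss is quadratic in liquidation size, which is one concrete way shortfall can grow faster than position size. Real venues have different impact curves, so the shape here is illustrative only.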

Reference case: Resolv / Morpho, March 22, 2026

A compromised offchain signing key, a single EOA with no onchain validation, was used to mint 80 million unbacked USR tokens for $200,000 in attacker-controlled capital. That piece was a key-management and contract-design failure, outside the scope of any vault-risk framework. What happened next was entirely inside scope: a cascade of secondary losses driven by oracle-latency and wrong-way-allocation failures that our risk methodology is now explicitly calibrated to catch.

Oracle latency (failure mode 5a). USR's NAV-based oracle updated once per 24 hours. For hours after the exploit, it faithfully reported pre-exploit collateral-to-supply ratios: RLP oracle read $1.29 while the market cleared at $0.52; USR oracle read near $1.00 while Curve pools showed $0.025. Secondary borrowers, uninvolved in the original exploit, rationally borrowed USDC against oracle-inflated USR collateral because the price gap was an arbitrage-like opportunity if you trusted the oracle. The resulting bad debt was not the attacker's work. It was the mechanical consequence of an oracle that had become structurally incorrect on an automatically-enforced lending protocol.
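The size of that false-solvency gap can be worked through with the prices quoted above. In the sketch below, the oracle and market prices come from the paragraph; the 90% maximum LTV is a hypothetical parameter chosen for illustration, not the actual setting of any affected market.

```python
def bad_debt_per_unit(oracle_price: float, market_price: float,
                      max_ltv: float) -> float:
    """Debt a borrower can open per collateral unit at the oracle mark,
    minus what that unit actually realizes at the market price."""
    borrowable = oracle_price * max_ltv
    return max(0.0, borrowable - market_price)


HYPOTHETICAL_MAX_LTV = 0.90  # assumption, for illustration only

# USR: oracle near $1.00, Curve pools at $0.025.
usr_gap = bad_debt_per_unit(1.00, 0.025, HYPOTHETICAL_MAX_LTV)

# RLP: oracle at $1.29, market clearing at $0.52.
rlp_gap = bad_debt_per_unit(1.29, 0.52, HYPOTHETICAL_MAX_LTV)

print(round(usr_gap, 3), round(rlp_gap, 3))
```

Under this assumed LTV, each unit of USR collateral supports about $0.90 of debt against roughly $0.025 of realizable value. Nearly the whole loan becomes bad debt the moment it is opened, with no further price move required.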

Wrong-way automated allocation. Public-allocator vaults (including Gauntlet-curated Morpho markets) treated the 100% utilization that emerged in the affected markets as a yield signal and continued supplying USDC to those markets for hours after the exploit began. Before auto-allocators began flowing USDC in, bad debt in the affected Morpho markets was approximately $4,900. The multi-million-dollar vault losses were generated by automated capital inflows after stress was already visible onchain. The post-mortem floor on ecosystem losses is roughly $3.8M in bad debt across Morpho markets alone, with $8.9M total allocated capital exposed.
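The difference between a yield-only allocator and one with a collateral guard can be sketched as follows. This is a hypothetical illustration, not the logic of Morpho's public allocator or any curator; the thresholds and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class MarketState:
    utilization: float   # borrowed / supplied, 0.0 to 1.0
    oracle_price: float  # collateral price the protocol enforces
    market_price: float  # collateral price actually clearing on venues


def naive_should_supply(m: MarketState, target_utilization: float = 0.90) -> bool:
    """Treats high utilization purely as a yield signal."""
    return m.utilization > target_utilization


def guarded_should_supply(m: MarketState, target_utilization: float = 0.90,
                          max_price_gap: float = 0.02) -> bool:
    """Same yield rule, but refuses to supply when the collateral's market
    price has diverged from the oracle by more than max_price_gap."""
    gap = abs(m.oracle_price - m.market_price) / m.oracle_price
    if gap > max_price_gap:
        return False
    return naive_should_supply(m, target_utilization)


# Stressed market: 100% utilization, collateral trading far below the oracle.
stressed = MarketState(utilization=1.0, oracle_price=1.00, market_price=0.025)
# Healthy market: high utilization, collateral trading at the oracle price.
healthy = MarketState(utilization=0.95, oracle_price=1.00, market_price=0.999)

print(naive_should_supply(stressed), guarded_should_supply(stressed))
print(naive_should_supply(healthy), guarded_should_supply(healthy))
```

The naive rule keeps supplying into the stressed market because full utilization looks like yield. The guarded rule stops because the collateral price gap is visible before any new capital is committed.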

Key takeaways for TI's risk scoring:
(1) Oracle design is a credit-risk input, not an engineering detail. A NAV-based oracle updating on a 24-hour cadence against a synthetic stablecoin whose underlying value correlates with market stress has a structurally predictable false-solvency window. This is derivable from public facts (update frequency + reference-asset volatility) before any exploit; the Oracle dimension's sub-criteria now explicitly flag update cadence on stress-correlated collateral.
(2) Automated allocators need stress conditionality. A public allocator that treats 100% utilization as pure yield, without a deteriorating-collateral guard, is wrong-way allocation by design. If a curator's published allocator doesn't document its behavior under collateral-peg failure, assume it has no such guard. This is now scored under Governance (parameter reactivity) and Liquidity (allocator behavior under stress).
(3) Rehypothecation depth is a risk multiplier, not a neutral design choice. When the collateral asset is itself a share token of another lending strategy (USR backed by a delta-neutral funding-rate position), a shock propagates through every protocol layer that accepted the derivative as collateral. Our Collateral Concentration check now treats rehypothecation chains of two or more layers as a separate flag.
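One way to make the rehypothecation flag concrete is to count derivative hops between a collateral asset and its base asset. The sketch below is an assumption about how layers might be counted, not TI's published convention; the backing graph and asset names other than USR and USDC are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical backing graph: asset -> what it is a claim on (None = base asset).
BACKING: Dict[str, Optional[str]] = {
    "wrapped_vault_share": "USR",
    "USR": "funding_rate_position",
    "funding_rate_position": None,
    "USDC": None,
}


def rehypothecation_depth(asset: str, backing: Dict[str, Optional[str]]) -> int:
    """Number of derivative hops from the asset down to a base asset."""
    depth = 0
    seen = set()
    current = asset
    while backing.get(current) is not None:
        if current in seen:
            raise ValueError(f"cycle in backing graph at {current}")
        seen.add(current)
        current = backing[current]
        depth += 1
    return depth


def rehypothecation_flag(asset: str, backing: Dict[str, Optional[str]],
                         threshold: int = 2) -> bool:
    """Flag collateral whose backing chain is at least `threshold` layers deep."""
    return rehypothecation_depth(asset, backing) >= threshold


for name in ("USDC", "USR", "wrapped_vault_share"):
    print(name, rehypothecation_depth(name, BACKING),
          rehypothecation_flag(name, BACKING))
```

Under this counting convention a base asset has depth 0, a token backed directly by a strategy has depth 1, and a share token wrapping that token has depth 2 and trips the flag.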

Sources: Anastasiia (@mathy_research), "DeFi Lending Credit Risk: A Three-Part Framework" (Vault Summit, April 2026); Morpho incident retrospective; Resolv governance forum post-mortem. Last verified: 2026-04-23.

What this framework does not capture

We are explicit about the limits of the rubric so users can apply it appropriately:

  • Regulatory risk varies by jurisdiction and changes faster than quarterly re-scoring can capture. We flag major regulatory actions on individual research pages as they happen.
  • Systemic contagion (what happens if a depeg cascades through 10 protocols that share collateral types) is not directly scored. Oracle and Liquidity sub-criteria cover the proximate risks; true contagion requires separate stress-test analysis.
  • Insurance and cover costs are not in the rubric. A protocol with expensive Nexus Mutual cover is signaling higher perceived risk from specialist underwriters, which is information we recommend users incorporate separately.
  • Team reputation and off-chain conduct are partially captured in Governance and Signer Diversity sub-criteria but not exhaustively. We do not score team members individually.
  • Agentic-execution risk is not yet scored. As AI agents begin transacting directly onchain via standards like x402 and ERC-8004 / 8183 / 8211, a new attack class becomes relevant: prompt injection through poisoned oracles, ENS records, or contract metadata can hijack agent behaviour and drain wallets with no phishing link clicked or malware installed. This is adjacent to our Oracle dimension but distinct in that the attacker surface is the agent's reasoning layer rather than the protocol's pricing layer. We expect to add an agentic-execution sub-criterion once the workload is material enough to score.

How scores are updated

Sub-criteria and dimension scores are reviewed on an ongoing basis. Changes are logged in the defi-risk-scores.json source file with a bumped lastUpdated date. Major changes (Chaos Labs departing Aave, Drift being attacked, the April 2026 Kelp / LayerZero incident, a new audit round published) trigger same-day re-scoring. Minor drift is re-evaluated weekly.

The framework itself is versioned. The current version (v2) was published in April 2026 after decomposing the original six-dimension aggregate scores into the twenty sub-criteria documented here.

See the framework in action

Every protocol on the DeFi Risk Map has its seven dimension scores and twenty-five sub-criteria visible in context.

Open the DeFi Risk Map →