Foundations

Where the Snapshot Lives Changes Everything Else

Playwright now ships two modes for agent-driven browser automation. One streams page snapshots into the LLM's context window. The other writes them to disk as files. The token difference is 4x. The difference in what each mode leaves behind after the session ends is harder to quantify and harder to ignore. One produces artifacts you can diff and inspect independently of the agent's own account. The other stores observation and inference in the same stream, inseparable after the fact. Where the snapshot lives turns out to determine what you can verify.

The Context Window Fills Up Long Before It's Full

A model with a 200K-token context window starts degrading at 50K. Selective retrieval using a quarter of the tokens beats full-context loading by nearly 20 accuracy points. That finding gives "context engineering" its teeth: the work is keeping the context window clean. But teams choosing a context strategy today are choosing a reliability envelope with no data on what that envelope actually contains. The research to measure it hasn't been done.
