Add synthetic readers for lexical context (closes #497) #670
Templates collect synthetic read-only Tools for non-Tool symbols in their lexical scope, alongside the existing real-Tool collection. The LLM can call these readers to inspect lexical state on demand instead of having the entire scope dumped into the system prompt.

Two reader flavors via singledispatch on the value's type:

- Definition-readers for classes and functions return text via pydoc.render_doc (level="short", the default; byte-equivalent to help(obj)) or inspect.getsource (level="full"). They bypass Encodable and just return str.
- Value-readers for everything else return the live value, encoded through the existing Encodable pipeline. Probe is TypeAdapter(Encodable[T]).json_schema(); on any failure (Pydantic schema error, unencodable types like Term/Operation/TypeVar) the symbol is silently skipped.

_collect_synthetic_readers is wired in two places: call_assistant (sees template.__context__ + bound args, mirroring Python call semantics) and Template.tools (sees template.__context__ only). Real Tools collected by _collect_tools take precedence; synthetic readers fill the gap.

A short static preface sentence is appended to Template.__system_prompt__ so the LLM knows the read-only-readers category exists. The structured tools array carries per-tool semantics; the preface does not enumerate them.

Two existing assertions in test_handlers_llm_template.py flip from 'local_variable not in a.f.tools' to 'in', reflecting the new behavior. 19 new unit tests cover the singledispatch matrix, the probe contract, live-read semantics, the BaseModel-via-metaclass dispatch case, the Box-via-TypeError-chain skip path, and the system-prompt preface. One recorded-fixture integration test exercises the end-to-end LLM-reads-lexical-value path.

The hide=/expose= knob is deferred to a follow-up.
Adds an explicit instruction to generate_good_poem to ignore any read-only lexical reader tools that may appear in the tool list. With synthetic readers now exposing module-level imports/classes as inspectable tools, the LLM was exploring those instead of finishing the task, exceeding max_calls=4.
* Filtering: we currently don't do any filtering, so, for example, an agent could fetch API keys through an environment variable if it is bound in the same lexical context:

    API_KEY = ...

    @Template.define
    def generate_some_thing(...) -> str:
        raise NotHandled

Other than that, we are also including:
Thinking of adding |
eb8680
left a comment
Thanks for taking a pass at this. I think it can be quite a bit simpler if you defer most behavior to Encodable. That even includes types like type and types.ModuleType that are currently missing Encodable implementations - they should still trigger Pydantic schema generation errors, which you can catch and use to skip tool generation.
    return env[name]

    body.__name__ = name
    body.__doc__ = f"Read the value of lexical variable `{name}` (type `{inferred}`)."
The type should already be part of the tool schema via the return type annotation, you shouldn't need to repeat it in the docstring.
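A small sketch of the point above: when the reader function carries a return annotation, the type is recoverable from the function itself, so the docstring does not have to repeat it. `make_reader` is a hypothetical helper, and using `type(value)` for the annotation is a simplification of whatever the library infers.

```python
import typing


def make_reader(env: dict, name: str) -> typing.Callable[[], typing.Any]:
    """Hypothetical helper: build a zero-argument reader for env[name]."""

    def reader():
        return env[name]

    reader.__name__ = name
    reader.__qualname__ = name
    # The docstring says what the tool does; it does not mention the type.
    reader.__doc__ = f"Read the value of lexical variable `{name}`."
    # The type travels in the return annotation instead (simplified to
    # type(value) here; this is an assumption, not the library's inference).
    reader.__annotations__["return"] = type(env[name])
    return reader


reader = make_reader({"threshold": 0.5}, "threshold")
hints = typing.get_type_hints(reader)  # {'return': <class 'float'>}
```

Anything that builds a schema from the function signature can read `hints["return"]` directly.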
    return result

    def _build_definition_reader(
This seems like a duplicate of Encodable for Callable.
    kind = "class" if inspect.isclass(value) else "function"

    def body(level: typing.Literal["short", "full"] = "short") -> str:
This level toggle is an interesting idea, but it should probably enter as part of Encodable, if at all. I would suggest removing this distinction and the new "short" path from the PR for now since it seems like an optimization and instead deferring to whatever the current behavior is for Encodable.
    @functools.singledispatch
    def _build_synthetic_reader(
Making this a singledispatch function seems like it's going to duplicate a lot of logic that should really just be in Encodable. I'd maybe also make this a special internal-only Tool subclass in case it's useful elsewhere in the code to distinguish these special tools from regular ones:
    class _LexicalVariableTool[T](Tool[[], T]):
        """A Tool wrapper for a variable captured from the lexical context. This allows
        variables to be automatically available as tools without explicit wrapping.
        The tool takes no arguments and returns the value of the variable when called.
        """

        @classmethod
        def define(cls, value: T, *, name: str, **kwargs) -> Tool[[], T]:
            assert name.isidentifier()
            assert not isinstance(value, Tool)
            typ: type[T] = nested_type(value).value  # or maybe just type(value)?
            assert pydantic.TypeAdapter(Encodable[typ]).json_schema()
            tool_fn = lambda: value
            tool_fn.__name__ = name
            tool_fn.__qualname__ = name
            tool_fn.__module__ = value.__module__
            tool_fn.__doc__ = f"""Reads value of lexical variable `{name}`"""
            tool_fn.__annotations__.update({"return": typ})
            return super().define(tool_fn, name=name, **kwargs)
Another benefit of doing this is that the class docstring is a natural home for information about how to use such tools (e.g. during synthesis, #497) that you'd like to inject into the system prompt.
    return Tool.define(body)

    _build_synthetic_reader.register(type, _build_definition_reader)
Seems like this should be subsumed by Encodable[type]
    `_collect_tools` or intentionally excluded).
    """
    try:
        inferred: typing.Any = nested_type(value).value
nested_type is supposed to return Box(type(value)) rather than raise a TypeError whenever it gets a value it doesn't understand - if it raises an error that's most likely a bug in nested_type and it should fail loudly.
    except Exception:
        # The probe chains through several third-party libraries
        # (nested_type, inspect.signature, typing.get_overloads,
        # Pydantic schema generation). Any failure means "this symbol
I think schema generation is the only thing we'd want to fail silently, and I think it should raise a more specific Pydantic error type that we can catch here rather than just catching any Exception.
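The difference between the broad and the narrow catch can be shown in a few lines. In this sketch `SchemaGenerationError` and `probe_schema` are hypothetical stand-ins for the specific Pydantic error type and the Encodable schema probe; neither is the library's real API.

```python
from typing import Any, Callable, Dict


class SchemaGenerationError(Exception):
    """Hypothetical stand-in for the specific schema error the probe raises."""


def probe_schema(value: Any) -> dict:
    """Stand-in probe: pretend only JSON-like values have a schema."""
    if isinstance(value, (bool, int, float, str, list, dict, type(None))):
        return {"type": type(value).__name__}
    raise SchemaGenerationError(f"no schema for {type(value).__name__}")


def collect_readable(
    env: Dict[str, Any],
    probe: Callable[[Any], dict] = probe_schema,
) -> Dict[str, dict]:
    """Keep the symbols whose schema can be generated."""
    readable: Dict[str, dict] = {}
    for name, value in env.items():
        try:
            readable[name] = probe(value)
        except SchemaGenerationError:
            # Expected outcome for unencodable values: skip silently.
            continue
        # Any other exception is a bug somewhere in the probe chain and
        # propagates to the caller instead of being swallowed.
    return readable
```

With this shape, an unencodable symbol simply drops out of the result, while a `TypeError` raised inside the probe still reaches the caller and fails loudly.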
    # Module-level binding the LLM will be asked to inspect via a synthetic
    # reader. The reader's name in the tool list is `_known_data`.
    _known_data = [10, 20, 30, 40, 50]
nit: put this in the test function just above report_sum, there's no reason for it to be a global.
    from effectful.ops.types import Annotation, Operation

    _LEXICAL_READERS_PREFACE = (
This information seems like it should be part of the docstring of each generated lexical variable tool, rather than tacked onto the system prompt once. That way there's no ambiguity from the model's perspective about what each individual tool does.
    # bound-args layer that LiteLLMProvider._call adds to env, so the LLM
    # sees readers for the Template's arguments alongside other lexical
    # context.
    tools.update(_collect_synthetic_readers(env, set(tools)))
Rather than making _collect_synthetic_readers a separate step, I'd inline its body inside _collect_tools so that we benefit from the deduplication and sorting that already happens there.
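A rough sketch of what a single collection pass might look like. The real `_collect_tools` is not shown in this thread, so the rules used here (ordering by name, deduplicating a tool bound under two names by identity) are assumptions, and `StubTool`, `_make_reader`, and `collect_tools` are hypothetical names.

```python
from typing import Any, Dict


class StubTool:
    """Hypothetical stand-in for the library's Tool class."""

    def __init__(self, label: str) -> None:
        self.label = label


def _make_reader(env: Dict[str, Any], name: str):
    """Zero-argument reader that looks up env[name] when called."""

    def reader():
        return env[name]

    reader.__name__ = name
    return reader


def collect_tools(env: Dict[str, Any]) -> Dict[str, Any]:
    """One pass over env: real tools are kept, other values get a reader."""
    tools: Dict[str, Any] = {}
    seen_ids = set()
    for name in sorted(env):  # assumed ordering rule: by name
        value = env[name]
        if isinstance(value, StubTool):
            if id(value) in seen_ids:  # assumed dedup rule: by identity
                continue
            seen_ids.add(id(value))
            tools[name] = value
        elif isinstance(value, (bool, int, float, str, list, dict)):
            # Stand-in for "the schema probe succeeded".
            tools[name] = _make_reader(env, name)
        # Everything else is skipped.
    return tools
```

Because real tools and readers are produced in the same loop, there is no second step that has to be told which names are already taken.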
Also, I don't think this closes #497 in its current form. I don't see any tests for context-sensitivity during synthesis and there's currently nothing to indicate to the LLM that these are lexical variables that are available in generated code. We could leave that behavior for a followup PR to keep this one tractable, although I don't think including it would require much more library code, just more testing.
Closes #497.
Templates expose synthetic read-only tools for lexical symbols that are not already real tools. The LLM can call them to inspect lexical state on demand, instead of receiving the entire scope dumped into the system prompt. Picks up where #545 left off; #585 landed the system-prompt half partially.
Example. A Template defined with a module-global the LLM should be able to see:
The LLM sees a tool named `_known_data` whose description is "Read the value of lexical variable `_known_data` (type `list[int]`).". It calls the tool, receives the list, and returns the sum.

Dispatch. `functools.singledispatch` selects the reader flavor from the value's type:

- Definition-readers fire for classes and functions. The reader takes `level: Literal["short", "full"]` and returns `str`. Short uses `pydoc.render_doc` (byte-equivalent to `help(obj)`); full uses `inspect.getsource`. The probe is `inspect.getsource` reachability; symbols whose source is unreachable (builtin C, REPL lambdas) are skipped.
- Value-readers fire for everything else (the default branch). The reader is zero-arg and returns `env[name]` live. The probe is `pydantic.TypeAdapter(Encodable[T]).json_schema()`; any failure causes the symbol to be skipped silently. The catch is broad on purpose, because the probe chains through several libraries (`nested_type`, `inspect.signature`, `typing.get_overloads`, Pydantic schema generation), any of which can crash on third-party objects.
- `Tool`, `Agent`, and `ModuleType` values are registered to return None. `Tool` and `Agent` are already collected by the existing real-tool path; modules are too big to expose by default.

System prompt. A short static sentence is appended to `Template.__system_prompt__` so the LLM knows the read-only-readers category exists. The structured tools array carries per-tool semantics; the preface doesn't enumerate them.

Tests. Invariants pinned by the unit tests:

- `env[name]` is evaluated at call time, so mutations and rebinds are visible.
- Deleting `env[name]` after collection causes the reader to raise `KeyError` on invocation.
- `level="full"` uses `inspect.getsource` rather than `pydoc`.
- `BaseModel` subclasses route correctly through singledispatch despite having a non-`type` metaclass.
- `Box` values are filtered through their respective skip paths.
- `Template.__system_prompt__` contains the preface unconditionally.

Invariant pinned by the integration test: a Template defined alongside a module-global value, prompted to call the synthetic reader and report a derived value, produces the correct answer end-to-end through a real LLM. The fixture was recorded against gpt-4o-mini and replays cleanly.
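The live-read invariants listed above follow from looking the name up inside the reader body rather than capturing the value when the reader is built. A minimal sketch, with `make_reader` as a hypothetical helper and a plain dict standing in for the lexical environment:

```python
def make_reader(env: dict, name: str):
    """Build a zero-argument reader that looks up env[name] on every call."""

    def reader():
        return env[name]  # evaluated at call time, not at construction

    reader.__name__ = name
    return reader


env = {"_known_data": [10, 20, 30, 40, 50]}
read = make_reader(env, "_known_data")

first_sum = sum(read())          # 150

env["_known_data"].append(60)    # in-place mutation is visible
mutated_sum = sum(read())        # 210

env["_known_data"] = [1, 2]      # rebinding is visible too
rebound = read()                 # [1, 2]

del env["_known_data"]           # removing the binding after collection
try:
    read()
    raised_key_error = False
except KeyError:
    raised_key_error = True      # the reader fails only when invoked
```

Capturing `value` directly in the closure would freeze the first list object: in-place mutation would still show through, but the rebind and the deletion would go unnoticed.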