fix(browser): make get_text extraction method and get_html truncation configurable by danielferreira-dias · Pull Request #400 · strands-agents/tools

danielferreira-dias · 2026-02-13T23:46:17Z

Summary

get_text: Default text extraction changed from text_content() to inner_text(),
which excludes script/style/hidden content. The old behavior is available via
method: "text_content" on GetTextAction.
get_html: Removed the hard-coded 1,000-character truncation. Full HTML is now returned
by default. An optional max_length field on GetHtmlAction lets callers opt into truncation.
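Here is a minimal sketch of how the two action models might expose the new fields. The class names, the `method` and `max_length` fields, and the `inner_text` default come from this PR. Everything else is an assumption for illustration: using dataclasses (the real tool may use a different schema library), the `selector` fields, and the validation rules.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Allowed extraction strategies; "inner_text" is the new default per this PR.
TextMethod = Literal["inner_text", "text_content"]


@dataclass
class GetTextAction:
    selector: str  # hypothetical field: CSS selector of the element to read
    method: TextMethod = "inner_text"

    def __post_init__(self) -> None:
        # Literal is not enforced at runtime, so validate explicitly.
        if self.method not in ("inner_text", "text_content"):
            raise ValueError(f"unsupported method: {self.method!r}")


@dataclass
class GetHtmlAction:
    selector: Optional[str] = None  # hypothetical: None means the whole page
    max_length: Optional[int] = None  # None means no truncation (new default)

    def __post_init__(self) -> None:
        if self.max_length is not None and self.max_length <= 0:
            raise ValueError("max_length must be a positive integer")
```

For example, `GetTextAction(selector="body")` gets `method == "inner_text"`. `GetTextAction(selector="body", method="text_content")` opts back into the old behavior.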

Motivation

text_content() includes the text of <script>, <style>, and hidden elements, which pollutes the LLM's context when agents read pages. inner_text() is style-aware (it returns only rendered text) and produces much cleaner output.
The 1,000-char HTML truncation made get_html unusable for reading full page content, and the LLM was never told that the output had been cut off. Meanwhile, get_text had no truncation at all, so the two actions applied limits inconsistently.
get_html returns real markup with its tags intact (<script>, <style>, <nav>, etc.), so downstream processing such as Markdown conversion can strip unwanted elements effectively. Without the full HTML, the only alternative is get_text, where JavaScript and CSS appear as untagged noise that cannot be reliably separated from page text.
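The standard-library parser below sketches the kind of downstream cleanup that full HTML enables. It drops the contents of noise elements and keeps the remaining text. It is an illustrative stand-in, not the tool's code and not the markdownify-based pipeline described in the follow-up comment. The `NOISE_TAGS` set and the names `NoiseStrippingParser` and `strip_noise` are hypothetical.

```python
from html.parser import HTMLParser

# Elements whose entire contents are treated as noise for an LLM reader.
# The exact set is an assumption; adjust to taste.
NOISE_TAGS = {"script", "style", "noscript", "nav"}


class NoiseStrippingParser(HTMLParser):
    """Collects text while skipping everything inside NOISE_TAGS."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a noise element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside every noise element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def strip_noise(html: str) -> str:
    parser = NoiseStrippingParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

This works only because the tags survive. Given the same page as get_text output, the script and style bodies would already be merged into the surrounding text.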

References

Playwright: innerText vs textContent (text_content() includes scripts/styles; inner_text() excludes them)
Playwright Issue #18894 (edge case where inner_text can still include script content; the Playwright team recommends stripping <script> elements as a workaround)
Playwright itself imposes no character limit on content(), inner_html(), or text_content(); the 1,000-char truncation was an arbitrary application-level constraint
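For the #18894 edge case, the suggested workaround of stripping `<script>` elements before extraction might look roughly like the sketch below. It rests on several assumptions. `PageLike` is a hypothetical protocol standing in for Playwright's `Page`, and only the real `evaluate()` and `inner_text()` methods are used. The helper name is illustrative. Removing `<style>` and `<noscript>` as well goes beyond what the issue recommends.

```python
from typing import Any, Protocol

# JavaScript evaluated in the page: remove <script>, <style> and <noscript>
# nodes from the live DOM before reading text. This mutates the page, so
# later get_html calls will no longer see those elements.
REMOVE_NOISE_JS = (
    "() => document.querySelectorAll('script, style, noscript')"
    ".forEach(el => el.remove())"
)


class PageLike(Protocol):
    """Hypothetical protocol covering the two Playwright Page methods used."""

    def evaluate(self, expression: str) -> Any: ...

    def inner_text(self, selector: str) -> str: ...


def inner_text_without_scripts(page: PageLike, selector: str = "body") -> str:
    # Strip in place on the rendered page; innerText on a detached clone
    # would fall back to textContent and bring the script text right back.
    page.evaluate(REMOVE_NOISE_JS)
    return page.inner_text(selector)
```

The nodes are removed from the live page rather than from a copy. Per the HTML spec, `innerText` on an element that is not being rendered returns its descendant text content, so a cloned subtree would reintroduce the script text.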

Test plan

Verify get_text with default (no method field) uses inner_text and excludes scripts/styles
Verify get_text with method: "text_content" returns raw text including scripts
Verify get_html with no max_length returns full HTML without truncation
Verify get_html with max_length: 500 truncates and appends "..."
Existing browser tests still pass
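The first four checks above could be written as pytest-style tests against a stub page, so they run without a browser. Everything in this sketch is hypothetical: `get_text`, `get_html`, and `StubPage` are stand-ins rather than the tool's real handlers. The truncation rule (keep the first `max_length` characters, then append "...") is an assumption about how the new field is applied.

```python
from typing import Optional


# Hypothetical stand-ins for the tool's handlers; the real dispatch code is
# not shown in this PR description.
def get_text(page, selector: str, method: str = "inner_text") -> str:
    if method == "inner_text":
        return page.inner_text(selector)
    if method == "text_content":
        # Playwright's text_content() can return None, so normalise to "".
        return page.text_content(selector) or ""
    raise ValueError(f"unsupported method: {method!r}")


def get_html(page, max_length: Optional[int] = None) -> str:
    html = page.content()
    # Assumed rule: keep the first max_length characters, then append "...".
    if max_length is not None and len(html) > max_length:
        return html[:max_length] + "..."
    return html


class StubPage:
    """Imitates the rendered-vs-raw difference without launching a browser."""

    HTML = (
        "<html><body><script>track()</script><p>Hello</p>"
        + "<p>filler</p>" * 200
        + "</body></html>"
    )

    def inner_text(self, selector: str) -> str:
        return "Hello"  # rendered text only

    def text_content(self, selector: str) -> str:
        return "track()Hello"  # raw text, script body included

    def content(self) -> str:
        return self.HTML


def test_get_text_defaults_to_inner_text():
    assert get_text(StubPage(), "body") == "Hello"


def test_get_text_text_content_includes_scripts():
    assert "track()" in get_text(StubPage(), "body", method="text_content")


def test_get_html_returns_full_html_by_default():
    page = StubPage()
    assert get_html(page) == page.HTML


def test_get_html_max_length_truncates_and_appends_ellipsis():
    out = get_html(StubPage(), max_length=500)
    assert len(out) == 503
    assert out.endswith("...")
```

The last item in the plan, keeping the existing browser tests green, is not covered here because it depends on the repository's own test suite.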

… configurable

get_text now defaults to inner_text (excludes script/style/hidden content) instead of text_content, with an opt-in method field to restore the old behavior.

get_html removes the hard-coded 1000-char truncation in favor of an optional max_length field that defaults to no truncation.

Co-Authored-By: Daniel Dias <DDias@euronext.com>

danielferreira-dias · 2026-02-18T15:53:20Z

We have a browser automation agent that navigates pages and processes their content. We built an AfterToolCallEvent hook that intercepts get_html output, strips noise (<script>, <style>, hidden elements, etc.) and converts the HTML to clean markdown using markdownify.

The problem is the 1,000-char hard truncation on get_html — by the time our hook receives the output, it's already an incomplete HTML fragment. This makes the entire get_html → markdown conversion pipeline useless.
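A small stdlib parser can test the claim that a truncated get_html result is an incomplete fragment by listing the elements still open after the cut. This is an illustrative diagnostic, not part of the PR. The `unclosed_tags` helper, the `VOID` set, and the sample page used to exercise it are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not tracked.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class OpenTagTracker(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <tag/> opens and closes in one token, so nothing stays open

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def unclosed_tags(html: str) -> list[str]:
    tracker = OpenTagTracker()
    tracker.feed(html)  # no close(): a tag cut off mid-token stays unparsed
    return tracker.stack
```

On a page whose `<head>` holds a long inline script, cutting at 1,000 characters typically lands inside that script. The result then leaves `html`, `head`, and `script` open, and no body content survives for a Markdown converter to work with.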

Removing the truncation would let downstream hooks and tools properly process the full page HTML. That is cleaner than relying on get_text with text_content(), whose output includes unrendered script and CSS noise.

danielferreira-dias · 2026-02-27T10:23:33Z

Fixing this would be extremely valuable, even if only the get_html truncation is removed: that limit currently makes downstream HTML processing (e.g., HTML-to-markdown conversion via hooks) non-functional.

Feature Issue
Bug Issue


This was referenced Feb 27, 2026:

[BUG] get_html hard-coded 1,000-character truncation returns broken HTML fragments #407 (Open)
[FEATURE] Make browser tool get_text extraction method configurable #408 (Open)

danielferreira-dias wants to merge 1 commit into strands-agents:main from danielferreira-dias:fix/browser-get-text-and-get-html