feat: Checkboxes and Radio Button Mapping for PDFs (GSoC)#407

Open
Dotify71 wants to merge 5 commits into fireform-core:main from Dotify71:fix/startup-and-tests

Conversation

@Dotify71

@Dotify71 Dotify71 commented Mar 31, 2026

This PR implements Checkboxes and Radio Button Mapping (Deliverable 1 from my GSoC Proposal).

Advanced PDF Form Mapping (src/filler.py)

  • Previously, filler.py indiscriminately assigned the LLM string output to the Value (/V) property of all Widget types. This breaks when a PDF contains checkboxes or radio buttons (Field Type /Btn).
  • Implementation: Added conditional logic inside the loop that iterates over sorted_annots:
    • Detects if an annotation is a Button (/Btn).
    • Checks the generated LLM response for truthy values (yes, true, 1, x).
    • Dynamically extracts the correct internal ON state identifier directly from the PDF Appearance Dictionary (annot.AP.N).
    • Assigns both the Value (/V) and Appearance State (/AS) to accurately check the box on the final PDF layout, while assigning /Off when falsy.
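A minimal sketch of the mapping steps listed above. It uses plain dictionaries in place of real PDF annotation objects so it runs without a PDF library; the helper names `find_on_state` and `set_button_state` are hypothetical and are not taken from filler.py. The dictionary keys mirror the PDF names /FT, /AP, /N, /V and /AS.

```python
OFF_STATE = "/Off"


def find_on_state(normal_appearances):
    """Return the first appearance-state name that is not /Off, or None."""
    for state_name in normal_appearances:
        if state_name != OFF_STATE:
            return state_name
    return None


def set_button_state(annot, checked):
    """Set /V and /AS on a button annotation dict; return True if handled."""
    if annot.get("/FT") != "/Btn":
        return False  # not a checkbox/radio button; caller fills it as text
    on_state = find_on_state(annot.get("/AP", {}).get("/N", {}))
    if on_state is None:
        return False  # no usable ON appearance found
    state = on_state if checked else OFF_STATE
    annot["/V"] = state
    annot["/AS"] = state
    return True


# Stand-in for a checkbox widget whose ON state happens to be named /Yes.
checkbox = {"/FT": "/Btn", "/AP": {"/N": {"/Off": None, "/Yes": None}}}
set_button_state(checkbox, True)
print(checkbox["/V"], checkbox["/AS"])  # /Yes /Yes
```

Looking up the ON name instead of hard-coding it matters because each PDF chooses its own state names. Real documents add further wrinkles this sketch ignores: /FT can be inherited from a parent field, and for radio groups the /V entry typically lives on the parent rather than on each widget.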

Strict Boolean Typing (per maintainer feedback)

  • The fields dict now accepts Python types as values (e.g. {"is awake": bool})
  • build_prompt() detects bool fields and explicitly instructs the LLM to return only the literal string 'True' or 'False'.
  • add_response_to_json() strictly coerces LLM output to Python bool for boolean fields.
  • filler.py uses isinstance(answer, bool) to activate a checkbox/radio button.
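As an illustration of the typed fields dict, here is a rough sketch of how a prompt builder might emit a stricter instruction for bool fields. The function name `build_field_instructions` and the exact wording of each line are assumptions; the real build_prompt() in this PR may be structured differently.

```python
def build_field_instructions(fields):
    """Return one instruction line per field; bool fields get a strict rule."""
    lines = []
    for name, expected_type in fields.items():
        if expected_type is bool:
            # Strict wording so the model does not answer "Yes" or "Checked".
            lines.append(
                f'- "{name}": respond with ONLY the literal word True or False.'
            )
        else:
            lines.append(f'- "{name}": respond with the value found in the text.')
    return "\n".join(lines)


# Hypothetical fields dict: values are Python types, as described above.
fields = {"is awake": bool, "incident name": str}
print(build_field_instructions(fields))
```

Keeping the type next to the field name means the same dict can drive both the prompt wording and any later check of the response.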

(Note: The previous redundant fixes for #135 and #380 were dropped via a merge with upstream/main)

Dushyant Acharya and others added 3 commits March 31, 2026 23:51
Implemented PDF /Btn dictionary parsing in filler.py to extract and dynamically map truthy LLM outputs to their specific 'ON' Appearance Mode instead of blindly appending strings. Also resolved broken backend pipeline in main.py by initializing the base Controller instead of the removed Fill class.
@Dotify71
Author

Hi @marcvergees @juanalvv! I've resolved the merge conflicts in Makefile and src/main.py. The PR is now up to date with the latest main branch and ready for review/merge. It includes the fixes for #135 and #380, along with the PDF checkbox/radio mapping feature. Thanks!

Member

@marcvergees marcvergees left a comment

There's one thing you need to take care of. You've implemented it so that if the LLM output is something like "yes", "true", etc., and the fillable field is a radio button, you activate it. But how do you know that the LLM is going to return something like that? There should be something in the pipeline and the dict that marks which fields are booleans. E.g. the fields dict is {"is awake": boolean}, and we tell the LLM to return a boolean, specifically True or False; that way we can verify that the filled field holds that kind of value. Please fix that and we'll be able to test and merge it.

Per maintainer feedback in fireform-core#407:
- The fields dict now accepts Python types as values (e.g. {'is awake': bool})
- build_prompt() detects bool fields and explicitly instructs the LLM to
  return only the literal string 'True' or 'False', not fuzzy values
- add_response_to_json() strictly coerces LLM output to Python bool for
  bool fields, logging a warning if an unexpected value is returned
- filler.py now uses isinstance(answer, bool) instead of string matching
  so only a guaranteed Python True activates a checkbox/radio button
- Updated example in main.py to demonstrate the new typed fields dict
@Dotify71
Author

Dotify71 commented May 15, 2026

Hi @marcvergees! Thanks for the clear feedback. I've addressed your concern in the latest commit (9feeb78).

The fields dict now carries Python type annotations (e.g. {"is awake": bool}). When build_prompt() sees a bool field, it explicitly instructs the LLM: "You MUST respond with ONLY the literal word True or False." The response is then strictly coerced to a Python bool in add_response_to_json() — any unexpected value logs a warning and defaults to None. Finally, filler.py uses isinstance(answer, bool) instead of fuzzy string matching, so only a guaranteed Python True activates a checkbox/radio button. Ready for re-review!
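The coercion and gating described in this comment could look roughly like the sketch below. `coerce_bool` and `should_activate` are hypothetical helper names, not the actual functions in the PR; the behaviour (accept only the literal words, warn and fall back to None otherwise, activate only on a real Python True) follows the description above.

```python
import logging

logger = logging.getLogger(__name__)


def coerce_bool(field_name, raw_value):
    """Map the literal words 'True'/'False' to bool; anything else becomes None."""
    if isinstance(raw_value, bool):
        return raw_value
    text = str(raw_value).strip()
    if text == "True":
        return True
    if text == "False":
        return False
    logger.warning("Field %r expected True/False, got %r", field_name, raw_value)
    return None


def should_activate(answer):
    """Only a genuine Python True turns a checkbox/radio button on."""
    return isinstance(answer, bool) and answer


print(coerce_bool("is awake", "True"))     # True
print(coerce_bool("is awake", "Checked"))  # None (and a warning is logged)
print(should_activate("Yes"))              # False
```

Returning None rather than guessing keeps fuzzy answers such as "Checked" visible to later steps instead of silently treating them as True or False.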

@Dotify71 Dotify71 changed the title from "fix: import Union in main.py and correct pytest directory in Makefile" to "feat: Checkboxes and Radio Button Mapping for PDFs (GSoC)" May 24, 2026
@Dotify71 Dotify71 requested a review from marcvergees May 24, 2026 08:21
@Dotify71
Author

Hi @marcvergees! I've updated the PR. I pulled in the latest changes from main to resolve the merge conflicts and dropped my old redundant fixes for #135 and #380 (since you guys already took care of those!).

The PR is now strictly focused on the GSoC Checkbox and Radio Button mapping feature, and it incorporates your feedback regarding strict boolean typing. It's fully up to date and ready for you to take another look whenever you have a moment. Thanks!

Member

@marcvergees marcvergees left a comment

Okay, I've been going through your PR changes and they look good. There are just a couple more things we should take care of. I've been testing and ran into a few problems with checkboxes/radio buttons because of the LLM output. With this document, the output of the LLM is this:

{
"Date To": "05/25/2026",
"Date From": "05/24/2026",
"Time From": "05/24/2026 07:00",
"1. Incident Name:": "Oakridge Mall Commercial Fire",
"Time To": "05/25/2026 07:00",
"3. Objective(s):": "1. Ensure public and responder safety via strict accountability, 2. Isolate and suppress structural fire in Wing B, and 3. Minimize hazardous runoff into municipal storm drains",
"4. Operational Period Command Emphasis:": "defensive exterior operations on Wing B and maintaining a 50-foot collapse zone around structural steel elements",
"General Situational Awareness": "High ambient temperatures (reaching 92\u00b0F) leading to rapid responder heat exhaustion, and high-voltage panels on the north wall",
"Site Safety Plan Required: Yes": "Yes",
"Site Safety Plan Required: No": "Yes",
"5. Site Safety Plan Required? Yes No Approved Site Safety Plan(s) Located at:": "Command Post - Battalion 2 Vehicle",
"ICS 203": "ICS 203",
"ICS 207": "ICS 207",
"Other Attachments [1]": "Other Attachments [1]",
"ICS 208": "ICS 208",
"ICS 204": "ICS 204",
"Check 1": "Checked",
"Other Attachments [2]": "Utility Shutoff Verification Log",
"Map/Chart": "Structure Layout/Divisions",
"ICS 205": "ICS 205",
"Check 2": "Checked",
"Other Attachments [3]": "Air Monitoring Log",
"Weather Forecast/Tides/Current": "-1 (No value for the specified JSON field Weather Forecast/Tides/Current was found in the provided text)",
"ICS 205A": "ICS 205A",
"Check 3": "Checked",
"Other Attachments [4]": null,
"Check 4": "Checked",
"ICS 206": "ICS 206",
}

You can see that the fields which are checkboxes/radio buttons are filled with things like "Checked" or "Yes", so the checkbox/radio button is not selected accordingly.

We need to solve this with better prompting in prompt.txt. This ties into an ongoing discussion about schema validation of the LLM output. Since you've already implemented everything so that types are matched, by adding just a few lines you should be able to check whether the LLM output matches the expected type. If it doesn't, there should be a retry (maybe using another, more specific prompt). These last points are just ideas, but schema validation is definitely a way to solve the problem you have right now: it lets you double-check that the LLM output is reliable, and therefore that our code doesn't break in later steps.
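One possible shape for the validate-then-retry idea, as a sketch only. `find_type_mismatches` and `extract_with_retry` are hypothetical names, and `ask_llm` stands in for whatever callable actually queries the model; here it is assumed to take the fields dict plus the list of fields that failed validation, so that a more specific prompt can be built for them.

```python
def find_type_mismatches(response, fields):
    """Return names of fields whose value does not match the declared type."""
    mismatches = []
    for name, expected_type in fields.items():
        value = response.get(name)
        # None means "no value found" and is allowed for any field.
        if value is not None and not isinstance(value, expected_type):
            mismatches.append(name)
    return mismatches


def extract_with_retry(ask_llm, fields, max_attempts=2):
    """Call ask_llm(fields, retry_fields) until the response validates."""
    retry_fields = []
    response = {}
    for _ in range(max_attempts):
        response = ask_llm(fields, retry_fields)
        retry_fields = find_type_mismatches(response, fields)
        if not retry_fields:
            return response
    # Give up: blank out the fields that never validated.
    for name in retry_fields:
        response[name] = None
    return response


def fake_llm(fields, retry_fields):
    # Stand-in model: fuzzy value on the first call, real boolean on the retry.
    if retry_fields:
        return {"Check 1": True}
    return {"Check 1": "Checked"}


print(extract_with_retry(fake_llm, {"Check 1": bool}))  # {'Check 1': True}
```

With the output shown above, a field such as "Check 1" declared as bool would fail the check because "Checked" is a string, which is what triggers the retry. A fuller version would probably merge retried fields into the first response rather than replacing it, and might use a dedicated schema library instead of isinstance.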

P.S.: You're definitely now running into real-world problems, which most of this repo's issues don't cover (because GSoC applicants used AI to raise issues and submit PRs).

P.P.S.: If you need help with this, ask as many questions as you need. We also have our Discord channel, where you can ask anything and somebody from our community will surely be glad to help you.
