Add recursive JS endpoint extraction to --crawl #1494

@J1W0N-1209

Description

What is the feature?

This feature extends dirsearch’s --crawl capability by sending additional HTTP requests to fetch JavaScript files referenced in HTML via <script src="">, and then extracting API endpoints or URL patterns contained within those JS files.

Currently, the --crawl option only parses URLs directly from HTML tags and attributes. However, many modern web applications define their primary API routes inside JavaScript bundle files (e.g., React, Vue, Angular). By analyzing JS files as part of a second-stage crawling process, dirsearch can significantly improve its endpoint discovery capabilities.

The feature would work as follows:

Extract <script src="..."> paths from the HTML response.
Examples: /static/js/main.js, /assets/app.chunk.js, etc.

Send HTTP GET requests to retrieve the referenced JS files.
(Handled in memory; no need to save files)
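The first two steps could be sketched as below, using only the standard library; the function name, regex, and example URLs are illustrative assumptions, not a proposed dirsearch API (in practice the resolved URLs would be fetched in memory by dirsearch's existing requester):

```python
import re
from urllib.parse import urljoin

# Hypothetical helper for illustration -- not dirsearch's actual API.
SCRIPT_SRC_RE = re.compile(r'<script[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)

def extract_script_urls(base_url: str, html: str) -> list[str]:
    """Resolve every <script src="..."> reference against the page URL."""
    return [urljoin(base_url, src) for src in SCRIPT_SRC_RE.findall(html)]

html = (
    '<html><script src="/static/js/main.js"></script>'
    '<script type="module" src="https://cdn.example.com/app.chunk.js"></script></html>'
)
# The resolved URLs would then be requested in memory (no files saved).
print(extract_script_urls("https://target.example/", html))
# → ['https://target.example/static/js/main.js', 'https://cdn.example.com/app.chunk.js']
```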

Analyze the JS code to extract strings or patterns that look like API endpoints.
Example patterns:

  • /api/v1/...
  • /auth/login
  • fetch("/…")
  • axios.get("…"), axios.post("…")
  • "/v1/user/info"
  • Regex-based URL candidates (e.g., strings matching /[a-zA-Z0-9/_-]+)
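A minimal sketch of this analysis step, assuming a simple string-literal heuristic (the regex and function name are illustrative, not a final design — real bundles would need extra filtering for MIME types, asset paths, and false positives):

```python
import re

# Heuristic only: quoted strings that begin with "/" -- an assumption for
# illustration, not dirsearch's actual matching rules. This catches literals
# passed to fetch()/axios.*() as well as bare path constants.
ENDPOINT_RE = re.compile(r'["\'](/[A-Za-z0-9/_\-.]+)["\']')

def extract_endpoints(js_code: str) -> list[str]:
    """Return endpoint-like string literals, deduplicated, in order of appearance."""
    seen: set[str] = set()
    endpoints: list[str] = []
    for path in ENDPOINT_RE.findall(js_code):
        if path not in seen:
            seen.add(path)
            endpoints.append(path)
    return endpoints

js = 'fetch("/api/v1/users"); axios.post("/auth/login", d); const p = "/v1/user/info";'
print(extract_endpoints(js))
# → ['/api/v1/users', '/auth/login', '/v1/user/info']
```

Each discovered path could then be enqueued like any other crawled URL.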

Automatically push the discovered URLs into dirsearch’s scanning queue
so that additional requests can be made to those endpoints.

By adding this feature, dirsearch would go beyond simple HTML-based crawling and gain the ability to automatically identify hidden API endpoints defined within JavaScript files, greatly enhancing its crawling and endpoint enumeration capabilities.

What is the use case?

Modern web applications—especially those built with React, Vue, and Angular—often store important API routes and internal endpoints inside JavaScript bundle files, not in the HTML itself.

In SPA (Single Page Application) architectures, it is extremely common for the following to exist only as hard-coded strings within JS code, typically inside large bundle files such as main.js, app.js, or hashed chunks:

  • API endpoints (e.g., /api/v1/..., /auth/login)
  • Route paths (e.g., "/v1/user/info")
  • Authentication/authorization endpoints
  • Admin or internal-only routes
  • Other sensitive or hidden URLs

Because these endpoints are not referenced in HTML, the current --crawl behavior cannot discover them at all.

Adding JS parsing to --crawl is important because:

  • It enables dirsearch to identify hidden API endpoints that are completely invisible in HTML.

  • It dramatically improves endpoint discovery for modern, JS-heavy and SPA-based applications.

  • It expands dirsearch’s crawling coverage to match how real-world front-end frameworks structure their code.

  • It reduces the need for manual inspection of JS bundles during recon or security assessments.
