Add recursive JS endpoint extraction to --crawl #1494

@J1W0N-1209

Description

What is the feature?

This feature extends dirsearch’s --crawl capability by sending additional HTTP requests to fetch JavaScript files referenced in HTML via <script src="">, and then extracting API endpoints or URL patterns contained within those JS files.

Currently, the --crawl option only parses URLs directly from HTML tags and attributes. However, many modern web applications define their primary API routes inside JavaScript bundle files (e.g., React, Vue, Angular). By analyzing JS files as part of a second-stage crawling process, dirsearch can significantly improve its endpoint discovery capabilities.

The feature would work as follows:

Extract <script src="..."> paths from the HTML response.
Examples: /static/js/main.js, /assets/app.chunk.js, etc.

Send HTTP GET requests to retrieve the referenced JS files.
(Handled in memory; no need to save files)
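The first two steps could be sketched as below, using only the standard library; the function name, regex, and example URLs are illustrative assumptions, not a proposed dirsearch API (in practice the resolved URLs would be fetched in memory by dirsearch's existing requester):

```python
import re
from urllib.parse import urljoin

# Hypothetical helper for illustration -- not dirsearch's actual API.
SCRIPT_SRC_RE = re.compile(r'<script[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)

def extract_script_urls(base_url: str, html: str) -> list[str]:
    """Resolve every <script src="..."> reference against the page URL."""
    return [urljoin(base_url, src) for src in SCRIPT_SRC_RE.findall(html)]

html = (
    '<html><script src="/static/js/main.js"></script>'
    '<script type="module" src="https://cdn.example.com/app.chunk.js"></script></html>'
)
# The resolved URLs would then be requested in memory (no files saved).
print(extract_script_urls("https://target.example/", html))
# → ['https://target.example/static/js/main.js', 'https://cdn.example.com/app.chunk.js']
```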

Analyze the JS code to extract strings or patterns that look like API endpoints.
Example patterns:

  • /api/v1/...
  • /auth/login
  • fetch("/…")
  • axios.get("…"), axios.post("…")
  • "/v1/user/info"
  • Regex-based URL candidates (e.g., strings matching /[a-zA-Z0-9/_-]+)
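A minimal sketch of this analysis step, assuming a simple string-literal heuristic (the regex and function name are illustrative, not a final design — real bundles would need extra filtering for MIME types, asset paths, and false positives):

```python
import re

# Heuristic only: quoted strings that begin with "/" -- an assumption for
# illustration, not dirsearch's actual matching rules. This catches literals
# passed to fetch()/axios.*() as well as bare path constants.
ENDPOINT_RE = re.compile(r'["\'](/[A-Za-z0-9/_\-.]+)["\']')

def extract_endpoints(js_code: str) -> list[str]:
    """Return endpoint-like string literals, deduplicated, in order of appearance."""
    seen: set[str] = set()
    endpoints: list[str] = []
    for path in ENDPOINT_RE.findall(js_code):
        if path not in seen:
            seen.add(path)
            endpoints.append(path)
    return endpoints

js = 'fetch("/api/v1/users"); axios.post("/auth/login", d); const p = "/v1/user/info";'
print(extract_endpoints(js))
# → ['/api/v1/users', '/auth/login', '/v1/user/info']
```

Each discovered path could then be enqueued like any other crawled URL.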

Automatically push the discovered URLs into dirsearch’s scanning queue
so that additional requests can be made to those endpoints.

By adding this feature, dirsearch would go beyond simple HTML-based crawling and gain the ability to automatically identify hidden API endpoints defined within JavaScript files, greatly enhancing its crawling and endpoint enumeration capabilities.

What is the use case?

Modern web applications—especially those built with React, Vue, and Angular—often store important API routes and internal endpoints inside JavaScript bundle files, not in the HTML itself.

In SPA (Single Page Application) architectures, it is extremely common for the following to exist only as hard-coded strings within JS code, typically inside large bundle files such as main.js, app.js, or hashed chunks:

  • API endpoints (e.g., /api/v1/..., /auth/login)
  • Route paths (e.g., "/v1/user/info")
  • Authentication/authorization endpoints
  • Admin or internal-only routes
  • Other sensitive or hidden URLs

Because these endpoints are not referenced in HTML, the current --crawl behavior cannot discover them at all.

Adding JS parsing to --crawl is important because:

  • It enables dirsearch to identify hidden API endpoints that are completely invisible in HTML.

  • It dramatically improves endpoint discovery for modern, JS-heavy and SPA-based applications.

  • It expands dirsearch’s crawling coverage to match how real-world front-end frameworks structure their code.

  • It reduces the need for manual inspection of JS bundles during recon or security assessments.
