Commit f3d9a79
fix: suppress info message for undefined maxRequestsPerCrawl (apify#3237)
### Issue
apify#3138
### Summary
I was able to recreate the issue and also observed `undefined` in the
info log:
```
INFO CheerioCrawler: The number of requests enqueued by the crawler reached the maxRequestsPerCrawl limit of undefined requests and no further requests will be added.
```
With this PR, the log message will reflect the limit it is reporting on.
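For reference, the fix boils down to choosing the message based on which limit actually fired. A minimal sketch of that guard (the helper name `limitReachedMessage` is illustrative, not the PR's actual code):

```
// Hypothetical helper illustrating the guard this PR introduces: only mention
// maxRequestsPerCrawl when it is actually defined; otherwise attribute the
// stop to the enqueueLinks limit instead of printing "undefined".
function limitReachedMessage(maxRequestsPerCrawl) {
    if (maxRequestsPerCrawl !== undefined) {
        return `The number of requests enqueued by the crawler reached the `
            + `maxRequestsPerCrawl limit of ${maxRequestsPerCrawl} requests `
            + `and no further requests will be added.`;
    }
    return 'The number of requests enqueued by the crawler reached the enqueueLinks limit.';
}

console.log(limitReachedMessage(undefined));
console.log(limitReachedMessage(1));
```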
### Example - `maxRequestsPerCrawl` is defined and triggered
Snippet:
```
import { CheerioCrawler, EnqueueStrategy } from 'crawlee';

const crawler = new CheerioCrawler({
    maxRequestsPerCrawl: 1,
    requestHandler: async ({ request, log, pushData, enqueueLinks }) => {
        log.info(`Handling url ${request.url}`);
        await pushData({ url: request.url });
        await enqueueLinks({
            selector: 'a[href*="apify.com"]',
            strategy: EnqueueStrategy.SameDomain,
        });
    },
});

await crawler.run(['https://www.apify.com']);
```
Logs:
```
INFO System info {"apifyVersion":"3.5.1","apifyClientVersion":"2.19.0","crawleeVersion":"3.15.2","osType":"Darwin","nodeVersion":"v22.17.0"}
INFO CheerioCrawler: Starting the crawler.
INFO CheerioCrawler: Handling url https://www.apify.com
INFO CheerioCrawler: Crawler reached the maxRequestsPerCrawl limit of 1 requests and will shut down soon. Requests that are in progress will be allowed to finish.
INFO CheerioCrawler: Earlier, the crawler reached the maxRequestsPerCrawl limit of 1 requests and all requests that were in progress at that time have now finished. In total, the crawler processed 1 requests and will shut down.
INFO CheerioCrawler: Final request statistics: {"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":861,"requestsFinishedPerMinute":58,"requestsFailedPerMinute":0,"requestTotalDurationMillis":861,"requestsTotal":1,"crawlerRuntimeMillis":1028}
INFO CheerioCrawler: Finished! Total 1 requests: 1 succeeded, 0 failed. {"terminal":true}
```
### Example - `enqueueLinks` limit is defined and triggered
Snippet:
```
import { CheerioCrawler, EnqueueStrategy } from 'crawlee';

const crawler = new CheerioCrawler({
    requestHandler: async ({ request, log, pushData, enqueueLinks }) => {
        log.info(`Handling url ${request.url}`);
        await pushData({ url: request.url });
        await enqueueLinks({
            selector: 'a[href*="apify.com"]',
            strategy: EnqueueStrategy.SameDomain,
            limit: 1,
        });
    },
});

await crawler.run(['https://www.apify.com']);
```
Logs:
```
INFO System info {"apifyVersion":"3.5.1","apifyClientVersion":"2.19.0","crawleeVersion":"3.15.2","osType":"Darwin","nodeVersion":"v22.17.0"}
INFO CheerioCrawler: Starting the crawler.
INFO CheerioCrawler: Handling url https://www.apify.com
INFO CheerioCrawler: The number of requests enqueued by the crawler reached the enqueueLinks limit.
... ^ repeated many times
INFO CheerioCrawler: Handling url https://console.apify.com/sign-up
INFO CheerioCrawler: All requests from the queue have been processed, the crawler will shut down.
INFO CheerioCrawler: Final request statistics: {"requestsFinished":2,"requestsFailed":0,"retryHistogram":[2],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":517,"requestsFinishedPerMinute":99,"requestsFailedPerMinute":0,"requestTotalDurationMillis":1034,"requestsTotal":2,"crawlerRuntimeMillis":1217}
INFO CheerioCrawler: Finished! Total 2 requests: 2 succeeded, 0 failed. {"terminal":true}
```
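The `enqueueLinks` `limit` option caps how many matched links are enqueued per call, which is what produces the repeated log line above. An illustrative sketch of that capping behavior (not the actual crawlee internals):

```
// Illustrative only: caps a batch of candidate URLs at `limit`, the way the
// enqueueLinks `limit` option caps how many matched links get enqueued.
function applyEnqueueLimit(urls, limit) {
    if (limit === undefined) return urls;
    return urls.slice(0, limit);
}

const candidates = ['https://apify.com/a', 'https://apify.com/b', 'https://apify.com/c'];
console.log(applyEnqueueLimit(candidates, 1));
console.log(applyEnqueueLimit(candidates, undefined).length);
```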
### Example - `maxRequestsPerCrawl` and `enqueueLinks` limit are both defined and triggered
Snippet:
```
import { CheerioCrawler, EnqueueStrategy } from 'crawlee';

const crawler = new CheerioCrawler({
    maxRequestsPerCrawl: 1,
    requestHandler: async ({ request, log, pushData, enqueueLinks }) => {
        log.info(`Handling url ${request.url}`);
        await pushData({ url: request.url });
        await enqueueLinks({
            selector: 'a[href*="apify.com"]',
            strategy: EnqueueStrategy.SameDomain,
            limit: 1,
        });
    },
});

await crawler.run(['https://www.apify.com']);
```
Logs:
```
INFO System info {"apifyVersion":"3.5.1","apifyClientVersion":"2.19.0","crawleeVersion":"3.15.2","osType":"Darwin","nodeVersion":"v22.17.0"}
INFO CheerioCrawler: Starting the crawler.
INFO CheerioCrawler: Handling url https://www.apify.com
INFO CheerioCrawler: Crawler reached the maxRequestsPerCrawl limit of 1 requests and will shut down soon. Requests that are in progress will be allowed to finish.
INFO CheerioCrawler: Earlier, the crawler reached the maxRequestsPerCrawl limit of 1 requests and all requests that were in progress at that time have now finished. In total, the crawler processed 1 requests and will shut down.
INFO CheerioCrawler: Final request statistics: {"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":801,"requestsFinishedPerMinute":62,"requestsFailedPerMinute":0,"requestTotalDurationMillis":801,"requestsTotal":1,"crawlerRuntimeMillis":968}
INFO CheerioCrawler: Finished! Total 1 requests: 1 succeeded, 0 failed. {"terminal":true}
```
Parent commit: cb5ded5
File tree (3 files changed, +9 -2 lines):
- packages/basic-crawler/src/internals
- packages/core/src/enqueue_links