Commit f3d9a79

fix: suppress info message for undefined maxRequestsPerCrawl (apify#3237)
### Issue

apify#3138

### Summary

I was able to recreate the issue and also observed `undefined` in the info log:

```
INFO CheerioCrawler: The number of requests enqueued by the crawler reached the maxRequestsPerCrawl limit of undefined requests and no further requests will be added.
```

With this PR, the log message reflects the limit it is actually reporting on.

### Example - `maxRequestsPerCrawl` is defined and triggered

Snippet:

```ts
const crawler = new CheerioCrawler({
    maxRequestsPerCrawl: 1,
    requestHandler: async ({ request, log, pushData, enqueueLinks }) => {
        log.info(`Handling url ${request.url}`);
        await pushData({ url: request.url });
        await enqueueLinks({
            selector: 'a[href*="apify.com"]',
            strategy: EnqueueStrategy.SameDomain,
        });
    },
});
```

Logs:

```
INFO System info {"apifyVersion":"3.5.1","apifyClientVersion":"2.19.0","crawleeVersion":"3.15.2","osType":"Darwin","nodeVersion":"v22.17.0"}
INFO CheerioCrawler: Starting the crawler.
INFO CheerioCrawler: Handling url https://www.apify.com
INFO CheerioCrawler: Crawler reached the maxRequestsPerCrawl limit of 1 requests and will shut down soon. Requests that are in progress will be allowed to finish.
INFO CheerioCrawler: Earlier, the crawler reached the maxRequestsPerCrawl limit of 1 requests and all requests that were in progress at that time have now finished. In total, the crawler processed 1 requests and will shut down.
INFO CheerioCrawler: Final request statistics: {"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":861,"requestsFinishedPerMinute":58,"requestsFailedPerMinute":0,"requestTotalDurationMillis":861,"requestsTotal":1,"crawlerRuntimeMillis":1028}
INFO CheerioCrawler: Finished! Total 1 requests: 1 succeeded, 0 failed.
```

### Example - `enqueueLinks` limit is defined and triggered

Snippet:

```ts
const crawler = new CheerioCrawler({
    requestHandler: async ({ request, log, pushData, enqueueLinks }) => {
        log.info(`Handling url ${request.url}`);
        await pushData({ url: request.url });
        await enqueueLinks({
            selector: 'a[href*="apify.com"]',
            strategy: EnqueueStrategy.SameDomain,
            limit: 1,
        });
    },
});
```

Logs:

```
INFO System info {"apifyVersion":"3.5.1","apifyClientVersion":"2.19.0","crawleeVersion":"3.15.2","osType":"Darwin","nodeVersion":"v22.17.0"}
INFO CheerioCrawler: Starting the crawler.
INFO CheerioCrawler: Handling url https://www.apify.com
INFO CheerioCrawler: The number of requests enqueued by the crawler reached the enqueueLinks limit.
... ^ repeated many times
INFO CheerioCrawler: Handling url https://console.apify.com/sign-up
INFO CheerioCrawler: All requests from the queue have been processed, the crawler will shut down.
INFO CheerioCrawler: Final request statistics: {"requestsFinished":2,"requestsFailed":0,"retryHistogram":[2],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":517,"requestsFinishedPerMinute":99,"requestsFailedPerMinute":0,"requestTotalDurationMillis":1034,"requestsTotal":2,"crawlerRuntimeMillis":1217}
INFO CheerioCrawler: Finished! Total 2 requests: 2 succeeded, 0 failed.
```

### Example - `maxRequestsPerCrawl` and `enqueueLinks` limit are both defined and triggered

Snippet:

```ts
const crawler = new CheerioCrawler({
    maxRequestsPerCrawl: 1,
    requestHandler: async ({ request, log, pushData, enqueueLinks }) => {
        log.info(`Handling url ${request.url}`);
        await pushData({ url: request.url });
        await enqueueLinks({
            selector: 'a[href*="apify.com"]',
            strategy: EnqueueStrategy.SameDomain,
            limit: 1,
        });
    },
});
```

Logs:

```
INFO System info {"apifyVersion":"3.5.1","apifyClientVersion":"2.19.0","crawleeVersion":"3.15.2","osType":"Darwin","nodeVersion":"v22.17.0"}
INFO CheerioCrawler: Starting the crawler.
INFO CheerioCrawler: Handling url https://www.apify.com
INFO CheerioCrawler: Crawler reached the maxRequestsPerCrawl limit of 1 requests and will shut down soon. Requests that are in progress will be allowed to finish.
INFO CheerioCrawler: Earlier, the crawler reached the maxRequestsPerCrawl limit of 1 requests and all requests that were in progress at that time have now finished. In total, the crawler processed 1 requests and will shut down.
INFO CheerioCrawler: Final request statistics: {"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":801,"requestsFinishedPerMinute":62,"requestsFailedPerMinute":0,"requestTotalDurationMillis":801,"requestsTotal":1,"crawlerRuntimeMillis":968}
INFO CheerioCrawler: Finished! Total 1 requests: 1 succeeded, 0 failed.
```
1 parent cb5ded5 commit f3d9a79

3 files changed (+9, −2 lines changed)


packages/basic-crawler/src/internals/basic-crawler.ts

Lines changed: 7 additions & 0 deletions
```diff
@@ -1144,6 +1144,13 @@ export class BasicCrawler<Context extends CrawlingContext = BasicCrawlingContext
             this.shouldLogMaxEnqueuedRequestsExceeded = false;
         }
 
+        if (options.reason === 'enqueueLimit') {
+            const enqueuedRequestLimit = this.calculateEnqueuedRequestLimit();
+            if (enqueuedRequestLimit === undefined || enqueuedRequestLimit !== 0) {
+                this.log.info('The number of requests enqueued by the crawler reached the enqueueLinks limit.');
+            }
+        }
+
         await this.onSkippedRequest?.(options);
     }
```
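The guard added above can be modeled in isolation. A minimal sketch, assuming `calculateEnqueuedRequestLimit()` derives the remaining request budget (possibly `undefined` when `maxRequestsPerCrawl` is unset); the standalone function name below is illustrative, not Crawlee's API:

```typescript
// Illustrative model of the new guard: the info line is emitted only for the
// dedicated 'enqueueLimit' reason, so an undefined maxRequestsPerCrawl can no
// longer leak into the message text.
type SkippedRequestReason = 'robotsTxt' | 'limit' | 'enqueueLimit' | 'filters' | 'redirect' | 'depth';

function shouldLogEnqueueLimitReached(
    reason: SkippedRequestReason,
    enqueuedRequestLimit: number | undefined,
): boolean {
    if (reason !== 'enqueueLimit') return false;
    // Mirrors `enqueuedRequestLimit === undefined || enqueuedRequestLimit !== 0`:
    // log unless the computed remaining budget is exactly 0.
    return enqueuedRequestLimit === undefined || enqueuedRequestLimit !== 0;
}

console.log(shouldLogEnqueueLimitReached('enqueueLimit', 1)); // true
console.log(shouldLogEnqueueLimitReached('enqueueLimit', 0)); // false: budget exhausted, stay quiet
console.log(shouldLogEnqueueLimitReached('limit', undefined)); // false: different reason
```

This is why the third example above logs no `enqueueLinks` message: once `maxRequestsPerCrawl: 1` is exhausted, the computed budget is `0` and the guard suppresses the line.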

packages/core/src/enqueue_links/enqueue_links.ts

Lines changed: 1 addition & 1 deletion
```diff
@@ -490,7 +490,7 @@ export async function enqueueLinks(
 
     let requests = await createFilteredRequests();
     if (typeof limit === 'number' && limit < requests.length) {
-        await reportSkippedRequests(requests.slice(limit), 'limit');
+        await reportSkippedRequests(requests.slice(limit), 'enqueueLimit');
         requests = requests.slice(0, limit);
     }
```
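The one-word change above matters because it tags the overflow requests with a reason dedicated to the `enqueueLinks` `limit` option. A hedged sketch of the surrounding logic (the function and field names here are illustrative, not the actual `enqueue_links.ts` internals):

```typescript
// Requests past `limit` are reported as skipped with the new 'enqueueLimit'
// reason, then trimmed from the batch that actually gets enqueued.
type Skipped = { url: string; reason: 'enqueueLimit' };

function applyEnqueueLimit(
    urls: string[],
    limit: number | undefined,
): { kept: string[]; skipped: Skipped[] } {
    if (typeof limit === 'number' && limit < urls.length) {
        const skipped = urls.slice(limit).map((url) => ({ url, reason: 'enqueueLimit' as const }));
        return { kept: urls.slice(0, limit), skipped };
    }
    return { kept: urls, skipped: [] };
}

const { kept, skipped } = applyEnqueueLimit(
    ['https://a.test', 'https://b.test', 'https://c.test'],
    1,
);
console.log(kept); // kept → ['https://a.test']
console.log(skipped.map((s) => s.url)); // skipped → ['https://b.test', 'https://c.test']
```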

packages/core/src/enqueue_links/shared.ts

Lines changed: 1 addition & 1 deletion
```diff
@@ -47,7 +47,7 @@ export type RegExpObject = { regexp: RegExp } & Pick<
 
 export type RegExpInput = RegExp | RegExpObject;
 
-export type SkippedRequestReason = 'robotsTxt' | 'limit' | 'filters' | 'redirect' | 'depth';
+export type SkippedRequestReason = 'robotsTxt' | 'limit' | 'enqueueLimit' | 'filters' | 'redirect' | 'depth';
 
 export type SkippedRequestCallback = (args: { url: string; reason: SkippedRequestReason }) => Awaitable<void>;
 
```
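Widening the union also benefits user code: a `SkippedRequestCallback` can now distinguish an `enqueueLinks` limit cut from the other reasons. A small sketch under the assumption that `'limit'` remains in use for crawler-level request limits; `describeSkip` is a hypothetical consumer, not part of Crawlee:

```typescript
// Mirrors the widened union from shared.ts; TypeScript narrows the literal
// type in each switch case.
type SkippedRequestReason = 'robotsTxt' | 'limit' | 'enqueueLimit' | 'filters' | 'redirect' | 'depth';

function describeSkip(args: { url: string; reason: SkippedRequestReason }): string {
    switch (args.reason) {
        case 'enqueueLimit':
            return `${args.url}: skipped because the enqueueLinks limit was reached`;
        case 'limit':
            return `${args.url}: skipped because a crawler-level request limit was reached`;
        default:
            return `${args.url}: skipped (${args.reason})`;
    }
}

console.log(describeSkip({ url: 'https://apify.com/pricing', reason: 'enqueueLimit' }));
// https://apify.com/pricing: skipped because the enqueueLinks limit was reached
```

Because the new member is additive, existing callbacks that only switch on the old reasons keep compiling; they simply fall through to their default branch for `'enqueueLimit'`.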
