{
"query": "Please summarize the whole context. It is important that you include a summary for each file. All files should be included, so please make sure to go through the entire context",
"namespace": "02aaa563-baaa-43bf-b820-2dddbf405a51",
"messages": [],
"stream": false,
"language_level": "",
"chat_channel": "",
"language": "German",
"tone": "neutral",
"writing_style": "standard",
"model": "gemini-1.5-flash",
"knowledgebase": "ki-dev-large",
"seed": 0,
"client_id": 0,
"all_context": true,
"follow_up_for": null,
"knowledgebase_files_count": 0,
"override_command": "",
"disable_clarity_check": true,
"high_consistency": false,
"custom_primer": "",
"logging": true,
"query_route": "",
"web_search": false
}
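For reference, a minimal sketch of how a request payload like the one above might be submitted, assuming a hypothetical HTTP endpoint and base URL (neither appears in this trace); only fields taken from the payload above are used:

```python
import requests

# Hypothetical base URL and route; the actual service address is not part of this trace.
BASE_URL = "https://rag-service.example.internal"

payload = {
    "query": (
        "Please summarize the whole context. It is important that you include "
        "a summary for each file. All files should be included, so please make "
        "sure to go through the entire context"
    ),
    "namespace": "02aaa563-baaa-43bf-b820-2dddbf405a51",
    "knowledgebase": "ki-dev-large",
    "model": "gemini-1.5-flash",
    "language": "German",
    "tone": "neutral",
    "writing_style": "standard",
    "all_context": True,   # presumably requests the full context rather than only the top-k matches
    "stream": False,
    "logging": True,
}

response = requests.post(f"{BASE_URL}/query", json=payload, timeout=300)
response.raise_for_status()
print(response.json())
```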
QUERY ROUTE
Query Route: summary
==================================================
**Elapsed Time: 2.04 seconds**
==================================================
RAG PARAMS
RAG Parameters: {'dynamically_expand': False, 'top_k': 120, 'actual_k': 120, 'satisfying_score': 0}
==================================================
**Elapsed Time: 0.00 seconds**
==================================================
VECTOR SEARCH RESULTS
Results: {'main_results': [{'id': 'e9692916-cddb-4f9b-b62a-3940420a1bcc',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%282%29%20-%20Copy%20-%20Copy.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '5f7830ee-054e-4360-8f52-115d00616401',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%283%29%20-%20Copy%281%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'ca5dbe5a-f448-4afb-839d-92d439e37f96',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%284%29%20-%20Copy%281%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'cace0fce-afd5-461c-801c-81355a90e16d',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%283%29%20-%20Copy.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'bbad564e-be05-4efa-b789-29dc6fd18b82',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%284%29%20-%20Copy.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '98c459bf-ab60-419d-8e69-9ef5eb9ff116',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%285%29%20-%20Copy.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'bed67768-922b-4c7d-8165-eee3163909a0',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%282%29%20-%20Copy%282%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '71442661-a788-473c-a84b-a7c9e41bae11',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%283%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '76094636-cc16-4d62-82b3-12553d03f819',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2825%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'b93ce251-56e3-4c7c-92e8-b9a2d1710560',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2826%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'b092b516-4e4c-4dba-80ab-f0e487b57742',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%282%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '4610d233-61c0-4f5c-87e6-2a30dae73ed7',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%282%29%20-%20Copy.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '517e3caa-34e0-4e9d-a7f4-6fc6b2feb700',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%283%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'a0f6eb82-8f33-4811-8c3a-645f729b6c4d',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%285%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'ba0f3419-15bb-414a-99cc-d760354dcd52',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%286%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '9bffa128-c3b9-4a3d-a3eb-4dcbd0bbdda5',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%282%29%20-%20Copy%281%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '67323980-5aca-46e6-80f7-7f19801ac759',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%284%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '98bb48d5-27e0-499b-959f-247b6a74aa67',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%287%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '74b81977-5f8f-4689-97fe-0e4ef9a7ee67',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2814%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'e8b50778-45cb-418f-9cc2-20639bb4a8c9',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2813%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '48e5deca-e635-43dc-85cc-260ef35d3b60',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2812%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '678d21ac-21ab-499b-8576-4244b9873985',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%288%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'fc331940-372e-440c-89f9-9915877e14eb',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%289%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'c592eebd-3f8f-415e-b1e0-e4d30770e85b',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2810%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '6d964ff6-217e-4614-8184-2d51ed4e8ae9',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2811%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'ab79ad77-cd4f-4305-a4dd-e3de020e2341',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2822%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as fail\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '90f12c01-525e-458d-833c-06db6855a5f5',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2823%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails statuses will never update, import '
"status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as failed\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'cd4abe04-426d-4568-9f45-3de5180de5a5',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2824%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails, statuses will never update and the '
"import status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording the last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails, CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails, the '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as failed\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '7a1c397d-0a2d-4689-9560-2baefa40d5a2',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2818%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails, statuses will never update and the '
"import status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording the last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails, CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails, the '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as failed\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '26cc55bd-a759-4b92-8636-aff67cf84a83',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%282%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails, statuses will never update and the '
"import status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording the last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails, CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails, the '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as failed\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '671736c2-3ad9-46de-9ef1-9755b80e26b7',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2815%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails, statuses will never update and the '
"import status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording the last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails, CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails, the '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as failed\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '21680b9e-654f-4981-b5de-2e9e2dbc7c3d',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%281%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails, statuses will never update and the '
"import status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording the last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails, CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails, the '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as failed\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'd1c5ef31-a72c-4b52-887d-d74c940e6448',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2819%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails, statuses will never update and the '
"import status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording the last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails, CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails, the '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as failed\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'c0d83539-fa99-4975-b7c1-f4fd1f677b28',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2817%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails, statuses will never update and the '
"import status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording the last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails, CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails, the '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as failed\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': '88c96501-c678-4566-9dd2-131b30d0948b',
'metadata': {'chunk': 0.0,
'file_name': 'crawler-issues-19MAR2025%20-%20Copy%20%2816%29.txt',
'is_dict': 'no',
'text': '- if CrawlerJob fails, statuses will never update and the '
"import status won't update\r\n"
'(add failed() method -> create CrawlerProcess with '
'failed status, record last process time??)\r\n'
'- if CrawlerProcessJob fails before recording the last '
'process time '
'("Cache::put($processCrawler->lastCrawlerProcessTimeCacheKey(), '
'now());") the status will never update\r\n'
'- importing failed Crawler pages still marked '
'success\r\n'
"- if CrawlerFilesJob fails, CrawlerProcess status won't "
'update\r\n'
'- if CrawlerPrepareKnowledgebaseTrainingJob fails, the '
"import status won't update\r\n"
'- CrawlerFilesProcessTrainingJob@handleProcessingError '
'-- failed items are marked as processed/success.\r\n'
'should be markItemAsFailed() same as in '
'CrawlerPageProcessTrainingJob?\r\n'
'\r\n'
'- Finalizing Logic Duplication\r\n'
'The completion checking and finalization logic is '
'duplicated across multiple jobs:\r\n'
'\r\n'
'CrawlerPageProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CrawlerFilesProcessTrainingJob::checkCompletionAndFinalize\r\n'
'CheckKnowledgebaseCrawlerImportCompletion::handle\r\n'
'\r\n'
'Each has subtle differences, creating opportunities for '
'inconsistent behavior.\r\n'
'\r\n'
'- Unreliable S3 File Operations\r\n'
'File operations on S3 have minimal error handling:\r\n'
'\r\n'
'$this->filesystem->put($s3Path, $newContent);\r\n'
'return $this->filesystem->url($s3Path);\r\n'
'\r\n'
'If the S3 put operation fails silently, subsequent code '
'would continue with a URL to a non-existent file.\r\n'
'\r\n'
'- try using knowledgebase_crawler_imports table instead '
"of cache for counting since it's already "
'implemented?\r\n'
'update counts every x seconds instead of realtime '
'updates?\r\n'
'\r\n'
'- CrawlerFileProcessTrainingJob and/or '
'CrawlerPageProcessTrainingJob failure not marking '
'KnowledgebaseCrawler as failed\r\n'
'- KnowledgebaseCrawlerImport fails getting deleted '
'after'},
'score': 0.0,
'values': []}, {'id': 'd39117c1-58d3-439c-aa3e-424b4b01a2d6',
'metadata': {'chunk': 0.0,
'file_name': 'apacare-primer%281%29.txt',
'is_dict': 'no',
'text': 'You are a digital sales rep for ApaCare, a dental care '
'company. Please assist clients with their '
'dental-related questions.\r\n'
'Use German in your responses.\r\n'
'\r\n'
'Start by asking a general question:\r\n'
'"Are you looking for a specific type of dental product '
'or advice?"\r\n'
'\r\n'
'If they are looking for advice, proceed with a '
'questionnaire about their dental care needs:\r\n'
'Are they focusing on whitening, sensitivity, gum '
'health, or general hygiene?\r\n'
'Use a questionnaire to have clients describe their '
'problems.\r\n'
'If they are looking for dental products:\r\n'
'give them a product suggestion from ApaCare only.\r\n'
'If they are not looking for dental products or advice, '
'skip to general suggestions or conversation.\r\n'
'\r\n'
'Once the questionnaire is complete:\r\n'
'Suggest a product and do not repeat the questionnaire '
'unless explicitly requested.\r\n'
'Format the questionnaire so it is readable for users, '
'for example as a list.\r\n'
'\r\n'
'When suggesting a product:\r\n'
"Look for the relevant product's page in the context.\r\n"
'Provide a detailed suggestion with an anchor tag link. '
'Ensure the target attribute is set to "_blank" and use '
'this format:\r\n'
'\r\n'
'[replace this with the product name]\r\n'
'\r\n'
'\r\n'
'All links should have the "_blank" target attribute.\r\n'
"Don't translate links href to German.\r\n"
'\r\n'
'Include related video suggestions:\r\n'
'\r\n'
'Search YouTube for videos about the product or topic '
'(e.g., how to use an electric toothbrush, flossing '
'techniques).\r\n'
'Embed the video in an iframe using this format:\r\n'
'\r\n'
'\r\n'
'For Google Drive videos, append /preview to the link '
'and embed it:\r\n'
'\r\n'
'\r\n'
'For public URL video links, use the