VOC_Monitor/.sisyphus/notepads/crawl-fixes/learnings.md

2026-02-19 Initialization

  • Notepad initialized for crawling fixes orchestration.

2026-02-19 Scraper Service Fixes

Fix 1: Unconditional List Refresh (Lines 168-172)

  • Before: A conditional check skipped navigation if the URL contained "act=searchList" and data existed.
  • After: Always navigate to refresh the list via _navigate_to_list_via_physical_click().
  • Reason: Cached pages cause new posts to be missed. An unconditional refresh ensures the latest state.
  • Code: Removed the nested if that checked "act=searchList" and _check_data_exists().
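
The change in Fix 1 can be sketched roughly as follows. This is a minimal illustration, not the actual service code: the class name ListScraperSketch and the navigation_count counter are hypothetical, and _navigate_to_list_via_physical_click() is stubbed so the example runs without a browser.

```python
class ListScraperSketch:
    """Hypothetical stand-in for the scraper service; only the refresh step is modeled."""

    def __init__(self):
        # Hypothetical counter, used here only to observe how often we navigate.
        self.navigation_count = 0

    def _navigate_to_list_via_physical_click(self):
        # Stub: the real method drives the browser; this one only counts calls.
        self.navigation_count += 1

    def fetch_list_pages(self):
        # Before the fix, this call sat behind a nested check on
        # "act=searchList" in the URL and _check_data_exists().
        # After the fix, it runs on every call so a cached page is never reused.
        self._navigate_to_list_via_physical_click()
        return []


scraper = ListScraperSketch()
scraper.fetch_list_pages()
scraper.fetch_list_pages()
```

With the guard removed, two calls to fetch_list_pages() produce two navigations, which is the behavior the testing notes describe.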

Fix 2: Preserve Non-Public Rows (Lines 231-236)

  • Before: Non-public rows were skipped entirely via if is_public == 0: continue.
  • After: All rows are kept in metadata; non-public rows carry the is_public=0 flag and are logged on discovery.
  • Reason: Metadata must be complete; detail access already has timeout handling.
  • Code: Replaced continue with a debug log: "비공개 게시글 수집: {voc_id} (상세 조회 스킵 예정)" (English: "Non-public post collected: {voc_id} (detail fetch will be skipped)").
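
A minimal sketch of the row handling in Fix 2, under stated assumptions: the function name collect_rows, the logger name, and the dict-shaped rows are hypothetical; only the is_public flag, the voc_id field, and the log message come from the notes above.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("scraper_sketch")


def collect_rows(rows):
    """Keep every row in the metadata, flagging non-public rows instead of dropping them."""
    metadata = []
    for row in rows:
        if row["is_public"] == 0:
            # Previously this branch was `continue`, which dropped the row.
            # Now the row is only logged and then falls through to the append.
            logger.debug("비공개 게시글 수집: %s (상세 조회 스킵 예정)", row["voc_id"])
        metadata.append(row)
    return metadata


rows = [
    {"voc_id": "A1", "is_public": 1},
    {"voc_id": "A2", "is_public": 0},
]
collected = collect_rows(rows)
```

Both rows end up in collected, and the non-public one keeps is_public set to 0 so later stages can still tell the two apart.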

Fix 3: Detail Attempt Policy (Lines 256-269)

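  • Before: Verbose comments, including the explicit note "관심 대상인 경우 상세까지 긁어서 정확도 높임" (English: "for targets of interest, scrape the detail page too to improve accuracy").
  • After: A clear policy comment: "관심 게시글: 공개/비공개 구분 없이 상세 조회 시도" (English: "posts of interest: attempt the detail fetch regardless of public/non-public status").
  • Reason: Supports related vehicle (차량) cases; errors in fetch_detail_content (timeout, permissions) are non-fatal.
  • Code: Updated the comments to explain that fetch_detail_content handles errors for both public and non-public posts.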
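
The policy in Fix 3 might look like the sketch below. Everything here is illustrative: attach_details and the "detail" key are hypothetical, and the fetch_detail_content stub simulates a permission failure for non-public rows purely as an assumption, to show the non-fatal path. The real method talks to the site and is not reproduced here.

```python
def fetch_detail_content(row):
    """Stub of the detail fetch: any failure is caught and reported as None."""
    try:
        if row["is_public"] == 0:
            # Assumption for this sketch only: non-public pages deny access.
            raise PermissionError("detail page not accessible")
        return f"detail for {row['voc_id']}"
    except Exception:
        # Timeout and permission errors are non-fatal; the caller gets None.
        return None


def attach_details(rows):
    """Attempt the detail fetch for every target row, public or not."""
    for row in rows:
        if row.get("is_target"):
            # No is_public check here: the policy is to always try.
            row["detail"] = fetch_detail_content(row)
    return rows


rows = attach_details([
    {"voc_id": "A1", "is_public": 1, "is_target": True},
    {"voc_id": "A2", "is_public": 0, "is_target": True},
    {"voc_id": "A3", "is_public": 1, "is_target": False},
])
```

In this sketch the public target row receives its detail text, the non-public target row receives None without raising, and the non-target row is left untouched.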

Non-Functional Changes

  • All modifications are isolated to fetch_list_pages() method
  • No dependency additions
  • Failures remain non-fatal (the try-except in fetch_detail_content catches all exceptions)
  • Architecture preserved (same control flow, same data structure)

Testing Notes

  • List refresh now runs on every fetch_list_pages() call
  • Metadata includes is_public=0 rows (complete state capture)
  • Detail attempts continue for is_target rows regardless of is_public value
  • Timeout/permission errors in fetch_detail_content are silently caught (returns None)