2026-02-19 Initialization
- Notepad initialized for crawling fixes orchestration.
2026-02-19 Scraper Service Fixes
Fix 1: Unconditional List Refresh (Lines 168-172)
Before: A conditional check skipped navigation if the URL contained "act=searchList" and data existed
After: Always navigate to refresh the list via _navigate_to_list_via_physical_click()
Reason: Cached pages cause new posts to be missed; an unconditional refresh ensures the latest state
Code: Removed the nested if that checked "act=searchList" and _check_data_exists()
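The change in Fix 1 can be sketched roughly as follows. This is an illustration only, not the actual service code: the class ScraperSketch, the refresh_list_old/refresh_list_new names, the example URL, and the stub bodies are all assumptions; only _navigate_to_list_via_physical_click, _check_data_exists, and the "act=searchList" check come from the notes above.

```python
class ScraperSketch:
    """Hypothetical stand-in for the scraper service; only the refresh logic is modeled."""

    def __init__(self):
        self.navigation_count = 0
        # Assumed URL, chosen so the old condition is satisfied.
        self.current_url = "https://example.invalid/board?act=searchList"

    def _navigate_to_list_via_physical_click(self):
        # Stub: the real method drives the browser; here we only count calls.
        self.navigation_count += 1

    def _check_data_exists(self):
        # Stub: pretend the cached list page already shows rows.
        return True

    def refresh_list_old(self):
        # Before: skip navigation when already on the list page with data.
        if "act=searchList" in self.current_url and self._check_data_exists():
            return
        self._navigate_to_list_via_physical_click()

    def refresh_list_new(self):
        # After: always navigate, so a cached page cannot hide new posts.
        self._navigate_to_list_via_physical_click()
```

With the stubbed state above, the old path performs no navigation while the new path always performs one.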
Fix 2: Preserve Non-Public Rows (Lines 231-236)
Before: Skip (continue) non-public rows entirely with if is_public == 0: continue
After: Keep all rows in metadata with is_public=0 flag, log discovery
Reason: Metadata completeness required; detail access already has timeout handling
Code: Replaced continue with a debug log: "비공개 게시글 수집: {voc_id} (상세 조회 스킵 예정)" (English: "Collecting non-public post: {voc_id} (detail lookup to be skipped)")
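A minimal sketch of the Fix 2 behavior, assuming each row is a dict with voc_id and is_public keys. The function name collect_metadata, the logger name, and the row shape are hypothetical; only the is_public flag and the log message text come from the notes.

```python
import logging

logger = logging.getLogger("scraper_sketch")  # hypothetical logger name


def collect_metadata(rows):
    """Keep every row; non-public rows are logged instead of skipped."""
    metadata = []
    for row in rows:
        if row["is_public"] == 0:
            # Previously: `continue` here dropped the row entirely.
            logger.debug(
                "비공개 게시글 수집: %s (상세 조회 스킵 예정)", row["voc_id"]
            )
        metadata.append({"voc_id": row["voc_id"], "is_public": row["is_public"]})
    return metadata
```

Non-public rows stay in the returned metadata with is_public=0, so downstream code can still see that the post exists.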
Fix 3: Detail Attempt Policy (Lines 256-269)
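Before: Verbose comments, including the explicit "관심 대상인 경우 상세까지 긁어서 정확도 높임" (English: "for targets of interest, scrape the detail as well to improve accuracy")
After: Clear policy comment: "관심 게시글: 공개/비공개 구분 없이 상세 조회 시도" (English: "posts of interest: attempt detail lookup regardless of public/non-public status")
Reason: Supports related vehicle cases; errors in fetch_detail_content (timeout, permissions) are non-fatal
Code: Updated comments to explain that fetch_detail_content handles errors for both public and non-public posts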
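The Fix 3 policy could look roughly like the sketch below. The signature of fetch_detail_content, the loader callback, and the attach_details helper are assumptions made for illustration; the notes only state that fetch_detail_content catches its errors and returns None, and that detail is attempted for is_target rows regardless of is_public.

```python
def fetch_detail_content(voc_id, loader):
    """Return detail content, or None on any failure (non-fatal by design)."""
    try:
        return loader(voc_id)  # hypothetical callback that loads the detail page
    except Exception:
        # Timeouts and permission errors are swallowed; the caller sees None.
        return None


def attach_details(rows, loader):
    """Attempt detail lookup for every target row, public or not."""
    for row in rows:
        if row.get("is_target"):
            # Policy: no is_public check here; failures simply yield None.
            row["detail"] = fetch_detail_content(row["voc_id"], loader)
    return rows
```

Under this sketch a non-public target row that times out ends up with detail set to None, and non-target rows are left untouched.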
Non-Functional Changes
- All modifications are isolated to fetch_list_pages() method
- No dependency additions
- Failures remain non-fatal (try-except in fetch_detail_content catches all)
- Architecture preserved (same control flow, same data structure)
Testing Notes
- List refresh now runs on every fetch_list_pages() call
- Metadata includes is_public=0 rows (complete state capture)
- Detail attempts continue for is_target rows regardless of is_public value
- Timeout/permission errors in fetch_detail_content are silently caught (returns None)