galaxis-po/docs/plans/2026-02-18-financial-statement-collector-design.md
zephyrdark 5422383fd8
feat: add FinancialCollector for FnGuide financial statement scraping
Port make-quant-py's FnGuide scraping logic into galaxy-po's
BaseCollector pattern. Collects annual and quarterly financial
statements (revenue, net income, total assets, etc.) and maps
Korean account names to English keys for FactorCalculator.
Scheduled weekly on Monday 19:00 KST since data updates quarterly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 22:38:05 +09:00

Financial Statement Collector Design

Date: 2026-02-18

Problem

galaxy-po has a Financial model and a FactorCalculator that depends on financial statement data (for ROE, GPA, and F-Score calculations), but no collector exists to populate the financials table.

make-quant-py already implements FnGuide scraping for financial statements in src/data/financial.py.

Solution

Implement FinancialCollector following the existing BaseCollector pattern, porting make-quant-py's FnGuide scraping logic to galaxy-po's architecture.

Data Source

FnGuide (https://comp.fnguide.com/SVO2/ASP/SVD_Finance.asp) provides:

  • Annual and quarterly financial statements
  • Income statement, balance sheet, cash flow statement
  • Free, no API key required
  • HTML table scraping via pd.read_html()

Account Name Mapping

FnGuide returns Korean account names. Map them to the English keys expected by FactorCalculator:

| FnGuide (Korean) | Financial.account (English) |
| --- | --- |
| 매출액 | revenue |
| 매출총이익 | gross_profit |
| 영업이익 | operating_income |
| 당기순이익 | net_income |
| 자산총계 | total_assets |
| 부채총계 | total_liabilities |
| 자본총계 | total_equity |
| 유동자산 | current_assets |
| 유동부채 | current_liabilities |
| 영업활동으로인한현금흐름 | operating_cash_flow |
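
As a rough illustration, the mapping above could be held in a plain dictionary. The helper name `map_account` and the choice to strip whitespace and return `None` for unmapped names are assumptions made for this sketch, not part of the design.

```python
from typing import Optional

# Korean FnGuide account names -> English keys, copied from the mapping above.
ACCOUNT_MAP = {
    "매출액": "revenue",
    "매출총이익": "gross_profit",
    "영업이익": "operating_income",
    "당기순이익": "net_income",
    "자산총계": "total_assets",
    "부채총계": "total_liabilities",
    "자본총계": "total_equity",
    "유동자산": "current_assets",
    "유동부채": "current_liabilities",
    "영업활동으로인한현금흐름": "operating_cash_flow",
}


def map_account(korean_name: str) -> Optional[str]:
    """Return the English key for a FnGuide account name, or None if unmapped.

    Hypothetical helper: stripping surrounding whitespace is an assumption
    about how scraped names might look.
    """
    return ACCOUNT_MAP.get(korean_name.strip())
```

Returning `None` rather than raising lets the caller decide whether accounts outside the mapping are skipped or logged.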

Architecture

FinancialCollector(BaseCollector)
├── collect() → iterate all tickers, call _fetch_financial_data for each
├── _fetch_financial_data(ticker) → scrape FnGuide, return list of record dicts
├── _clean_financial_data(df, ticker, report_type) → clean and normalize DataFrame
└── ACCOUNT_MAP (class constant) → Korean → English account mapping
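
A minimal skeleton of this layout might look as follows. The `BaseCollector` shown here is only a stand-in, since galaxy-po's real base class is not reproduced in this document; the constructor arguments (`tickers`, `delay_seconds`) and the `records` attribute are likewise hypothetical.

```python
import time
from abc import ABC, abstractmethod


class BaseCollector(ABC):
    """Stand-in for galaxy-po's BaseCollector; the real interface may differ."""

    @abstractmethod
    def collect(self) -> int:
        """Run the collection and return the number of records gathered."""


class FinancialCollector(BaseCollector):
    # Abbreviated for the sketch; the full mapping is listed under
    # "Account Name Mapping".
    ACCOUNT_MAP = {"매출액": "revenue", "당기순이익": "net_income"}

    def __init__(self, tickers, delay_seconds=2.0):
        # Assumption: tickers are injected here instead of read from the
        # stocks table, to keep the sketch self-contained.
        self.tickers = list(tickers)
        self.delay_seconds = delay_seconds
        self.records = []

    def collect(self) -> int:
        for i, ticker in enumerate(self.tickers):
            if i > 0:
                time.sleep(self.delay_seconds)  # pause between tickers
            self.records.extend(self._fetch_financial_data(ticker))
        return len(self.records)

    def _fetch_financial_data(self, ticker):
        # Placeholder: the real method would scrape FnGuide and return a
        # list of record dicts.
        return []
```

Keeping the scraping in `_fetch_financial_data` means a test can subclass the collector and override that one method without touching the network.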

Data Flow

  1. Get ticker list from stocks table
  2. For each ticker:
    • Fetch FnGuide page via pd.read_html(url, displayed_only=False)
    • Annual: concat data[0], data[2], data[4] (income, balance, cashflow)
    • Quarterly: concat data[1], data[3], data[5]
    • Parse fiscal year end month from page HTML
    • Clean: remove NaN rows, deduplicate accounts, melt wide→long
    • Map Korean account names to English
    • Sleep 2 seconds between tickers (rate limiting)
  3. Upsert all records to financials table (PostgreSQL ON CONFLICT)
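
The cleaning step in the flow above could be sketched with pandas roughly as below. It assumes the first column of the concatenated table holds the account name and every other column is a reporting period. The function name, the output keys (`period`, `value`, `report_type`), and the decision to drop accounts missing from the mapping are illustrative assumptions, not confirmed details of the collector.

```python
import pandas as pd

# Abbreviated mapping for the sketch.
ACCOUNT_MAP = {"매출액": "revenue", "당기순이익": "net_income"}


def clean_financial_data(df: pd.DataFrame, ticker: str, report_type: str) -> list:
    """Turn a wide FnGuide-style table into a list of long-format records."""
    # Assumption: the first column carries the account names.
    df = df.rename(columns={df.columns[0]: "account"})
    value_cols = [c for c in df.columns if c != "account"]

    # Remove rows with no values at all, then keep the first of any
    # duplicated account.
    df = df.dropna(subset=value_cols, how="all")
    df = df.drop_duplicates(subset=["account"], keep="first")

    # Melt wide -> long: one row per (account, period).
    long_df = df.melt(id_vars=["account"], var_name="period", value_name="value")
    long_df = long_df.dropna(subset=["value"])

    # Map Korean names to English keys; unmapped accounts are dropped here
    # (an assumption of this sketch).
    long_df = long_df.assign(account=long_df["account"].map(ACCOUNT_MAP))
    long_df = long_df.dropna(subset=["account"])

    long_df = long_df.assign(ticker=ticker, report_type=report_type)
    return long_df.to_dict("records")
```

For example, a table with a duplicated 매출액 row and an all-NaN row yields one record per remaining non-empty cell:

```python
wide = pd.DataFrame({
    "IFRS(연결)": ["매출액", "매출액", "당기순이익", "기타"],
    "2022/12": [100.0, 100.0, 10.0, None],
    "2023/12": [120.0, 120.0, None, None],
})
records = clean_financial_data(wide, "005930", "annual")
```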

Files to Change

  • New: backend/app/services/collectors/financial_collector.py
  • Modify: backend/app/services/collectors/__init__.py (add export)
  • Modify: backend/jobs/collection_job.py (add to daily collection)

Scheduler Integration

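Add FinancialCollector to run_daily_collection(). FnGuide's financial data changes only quarterly, so most daily runs re-fetch unchanged values, but because records are upserted, repeated runs are idempotent and create no duplicate rows.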
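
To illustrate why repeated runs are harmless, here is a small sketch of an upsert. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, since SQLite 3.24+ accepts a similar `ON CONFLICT ... DO UPDATE` clause. The column names and the unique key (`ticker`, `base_date`, `report_type`, `account`) are assumptions; the actual Financial model may differ.

```python
import sqlite3

# Hypothetical schema; the unique key is what makes the upsert idempotent.
CREATE_SQL = """
CREATE TABLE financials (
    ticker      TEXT NOT NULL,
    base_date   TEXT NOT NULL,
    report_type TEXT NOT NULL,
    account     TEXT NOT NULL,
    value       REAL,
    PRIMARY KEY (ticker, base_date, report_type, account)
)
"""

UPSERT_SQL = """
INSERT INTO financials (ticker, base_date, report_type, account, value)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (ticker, base_date, report_type, account)
DO UPDATE SET value = excluded.value
"""


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical financials table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_SQL)
    return conn


def upsert_financials(conn: sqlite3.Connection, records) -> None:
    """Insert records, overwriting the value when the key already exists."""
    rows = [
        (r["ticker"], r["base_date"], r["report_type"], r["account"], r["value"])
        for r in records
    ]
    conn.executemany(UPSERT_SQL, rows)
    conn.commit()
```

Running `upsert_financials` twice with the same records leaves one row per key, and a later run with a revised figure overwrites the stored value instead of adding a row.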