Port make-quant-py's FnGuide scraping logic into galaxy-po's BaseCollector pattern. Collects annual and quarterly financial statements (revenue, net income, total assets, etc.) and maps Korean account names to English keys for FactorCalculator. Scheduled weekly on Monday 19:00 KST since data updates quarterly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Financial Statement Collector Design
Date: 2026-02-18
Problem
galaxy-po has a Financial model and FactorCalculator that depends on financial statement data (ROE, GPA, F-Score calculations), but no collector exists to actually populate the financials table.
make-quant-py already implements FnGuide scraping for financial statements in src/data/financial.py.
Solution
Implement FinancialCollector following the existing BaseCollector pattern, porting make-quant-py's FnGuide scraping logic to galaxy-po's architecture.
Data Source
FnGuide (https://comp.fnguide.com/SVO2/ASP/SVD_Finance.asp) provides:
- Annual and quarterly financial statements
- Income statement, balance sheet, cash flow statement
- Free, no API key required
- HTML table scraping via pd.read_html()
Account Name Mapping
FnGuide returns Korean account names. Map to English keys expected by FactorCalculator:
| FnGuide (Korean) | Financial.account (English) |
|---|---|
| 매출액 | revenue |
| 매출총이익 | gross_profit |
| 영업이익 | operating_income |
| 당기순이익 | net_income |
| 자산총계 | total_assets |
| 부채총계 | total_liabilities |
| 자본총계 | total_equity |
| 유동자산 | current_assets |
| 유동부채 | current_liabilities |
| 영업활동으로인한현금흐름 | operating_cash_flow |
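The table above translates directly into a lookup dictionary. The sketch below shows one way ACCOUNT_MAP could look. The helper name map_account and the choice to return None for unmapped accounts are assumptions for illustration, not something this design specifies.

```python
from typing import Dict, Optional

# Korean FnGuide account name -> English key, copied from the table above.
ACCOUNT_MAP: Dict[str, str] = {
    "매출액": "revenue",
    "매출총이익": "gross_profit",
    "영업이익": "operating_income",
    "당기순이익": "net_income",
    "자산총계": "total_assets",
    "부채총계": "total_liabilities",
    "자본총계": "total_equity",
    "유동자산": "current_assets",
    "유동부채": "current_liabilities",
    "영업활동으로인한현금흐름": "operating_cash_flow",
}


def map_account(korean_name: str) -> Optional[str]:
    """Return the English key, or None if the account is not in the mapping.

    Hypothetical helper: returning None lets the caller decide whether to
    skip or log accounts that FactorCalculator does not use.
    """
    return ACCOUNT_MAP.get(korean_name.strip())
```

Stripping whitespace before the lookup guards against padded cell text from scraped HTML; whether FnGuide's cells actually need this is untested here.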
Architecture
FinancialCollector(BaseCollector)
├── collect() → iterate all tickers, call _fetch_financial_data for each
├── _fetch_financial_data(ticker) → scrape FnGuide, return list of record dicts
├── _clean_financial_data(df, ticker, report_type) → clean and normalize DataFrame
└── ACCOUNT_MAP (class constant) → Korean → English account mapping
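A minimal skeleton of the structure above might look like the following. The BaseCollector here is a stand-in, since galaxy-po's real base class is not shown in this document and its interface may differ. Passing tickers and a sleep function into the constructor, and skipping a ticker when its fetch fails, are likewise assumptions made to keep the sketch self-contained and testable.

```python
import time
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

Record = Dict[str, object]


class BaseCollector(ABC):
    """Stand-in for galaxy-po's BaseCollector; the real interface may differ."""

    @abstractmethod
    def collect(self) -> int:
        """Run the collection and return the number of records gathered."""


class FinancialCollector(BaseCollector):
    REQUEST_DELAY_SECONDS = 2.0  # pause between tickers (rate limiting)

    def __init__(
        self,
        tickers: List[str],
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        # Assumption: tickers are passed in; the design reads them from the stocks table.
        self.tickers = tickers
        self._sleep = sleep  # injectable so tests do not actually wait
        self.records: List[Record] = []

    def collect(self) -> int:
        for i, ticker in enumerate(self.tickers):
            if i > 0:
                self._sleep(self.REQUEST_DELAY_SECONDS)
            try:
                self.records.extend(self._fetch_financial_data(ticker))
            except Exception as exc:
                # Assumed policy: one failing ticker should not abort the run.
                print(f"skip {ticker}: {exc}")
        return len(self.records)

    def _fetch_financial_data(self, ticker: str) -> List[Record]:
        # Placeholder: the real method scrapes FnGuide and cleans the tables.
        raise NotImplementedError
```

Injecting the sleep function keeps the two-second delay out of unit tests while leaving production behaviour unchanged.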
Data Flow
- Get ticker list from stocks table
- For each ticker:
  - Fetch FnGuide page via pd.read_html(url, displayed_only=False)
  - Annual: concat data[0], data[2], data[4] (income, balance, cashflow)
  - Quarterly: concat data[1], data[3], data[5]
  - Parse fiscal year end month from page HTML
  - Clean: remove NaN rows, deduplicate accounts, melt wide→long
  - Map Korean account names to English
  - Sleep 2 seconds between tickers (rate limiting)
- Upsert all records to financials table (PostgreSQL ON CONFLICT)
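The table selection and the wide→long step can be illustrated without any dependencies. The sketch below uses plain Python dictionaries in place of the pandas DataFrames the collector would actually work with, so it shows the shape of the transformation rather than the real implementation. The function names, the "account" column name, and the record keys are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_tables(tables: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Split the six statement tables into (annual, quarterly).

    Positions 0, 2, 4 are annual; positions 1, 3, 5 are quarterly.
    """
    return list(tables[0:6:2]), list(tables[1:6:2])


def melt_rows(
    rows: List[dict],
    ticker: str,
    report_type: str,
    account_map: Dict[str, str],
) -> List[dict]:
    """Turn wide rows (one column per period) into long records."""
    seen = set()
    records = []
    for row in rows:
        account = str(row["account"]).strip()
        if account in seen:
            continue  # deduplicate: keep the first occurrence of an account
        seen.add(account)
        english = account_map.get(account)
        if english is None:
            continue  # account not used downstream
        for period, value in row.items():
            if period == "account" or value is None:
                continue
            if isinstance(value, float) and math.isnan(value):
                continue  # drop missing values
            records.append({
                "ticker": ticker,
                "report_type": report_type,
                "account": english,
                "period": period,
                "value": value,
            })
    return records
```

A row whose values are all missing produces no records, which has the same effect as removing NaN rows before melting.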
Files to Change
- New: backend/app/services/collectors/financial_collector.py
- Modify: backend/app/services/collectors/__init__.py (add export)
- Modify: backend/jobs/collection_job.py (add to daily collection)
Scheduler Integration
Add FinancialCollector to run_daily_collection(). Financial data updates quarterly, but upsert makes daily runs idempotent.
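The idempotency claim can be checked with a small experiment. The sketch below uses SQLite from the standard library as a stand-in for PostgreSQL, since SQLite 3.24+ accepts the same ON CONFLICT ... DO UPDATE form. The column names and the four-column unique key are assumptions; the actual financials schema is not shown in this document.

```python
import sqlite3
from typing import Dict, List

UPSERT_SQL = """
INSERT INTO financials (ticker, base_date, report_type, account, value)
VALUES (:ticker, :base_date, :report_type, :account, :value)
ON CONFLICT (ticker, base_date, report_type, account)
DO UPDATE SET value = excluded.value
"""


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per ticker, period, report type and account.
    conn.execute(
        """
        CREATE TABLE financials (
            ticker TEXT NOT NULL,
            base_date TEXT NOT NULL,
            report_type TEXT NOT NULL,
            account TEXT NOT NULL,
            value REAL,
            UNIQUE (ticker, base_date, report_type, account)
        )
        """
    )


def upsert_financials(conn: sqlite3.Connection, records: List[Dict]) -> None:
    # The connection context manager commits on success, rolls back on error.
    with conn:
        conn.executemany(UPSERT_SQL, records)
```

Running upsert_financials twice with the same records leaves the row count unchanged, and a changed value overwrites the stored one, so repeated runs between quarterly updates do not create duplicates.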