mirror of
https://github.com/whekin/household-bot.git
synced 2026-03-31 12:24:02 +00:00
HOUSEBOT-022: Hybrid Purchase Parser
Summary
Implement a rules-first purchase parser with optional LLM fallback for ambiguous Telegram purchase messages.
Goals
- Parse common RU/EN purchase text with deterministic regex rules first.
- Call LLM fallback only when rules cannot safely resolve a single amount.
- Persist raw + parsed fields + confidence + parser mode.
Non-goals
- Receipt OCR.
- Complex multi-item itemization.
Scope
- In: parser core logic, fallback interface, bot ingestion integration, DB fields for parser output.
- Out: settlement posting and command UIs.
Interfaces and Contracts
- parsePurchaseMessage({ rawText }, { llmFallback? })
- Parser result fields:
  - amountMinor
  - currency
  - itemDescription
  - confidence
  - parserMode (rules | llm)
  - needsReview
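The contract above could be typed roughly as follows. This is a sketch, not the repository's actual definitions: the type names (ParsedPurchase, LlmFallback, ParsePurchaseOptions), the field types, the 0..1 confidence scale, and the nullable async return are all assumptions; only the field names and the call shape come from this spec.

```typescript
// Hypothetical type names; only the field names and call shape are from the spec.
type ParserMode = "rules" | "llm";

interface ParsedPurchase {
  amountMinor: number;      // amount in minor units, e.g. 1250 for 12.50 (assumed integer)
  currency: string;         // assumed ISO 4217 code such as "GEL"
  itemDescription: string;
  confidence: number;       // assumed to be in the range 0..1
  parserMode: ParserMode;
  needsReview: boolean;
}

// Assumed fallback shape: receives the raw text, resolves to a result or null.
type LlmFallback = (rawText: string) => Promise<ParsedPurchase | null>;

interface ParsePurchaseOptions {
  llmFallback?: LlmFallback;
}

// Assumed signature for parsePurchaseMessage; null stands for "could not parse".
type ParsePurchaseMessage = (
  input: { rawText: string },
  options?: ParsePurchaseOptions,
) => Promise<ParsedPurchase | null>;

// Example value a rules-mode parse of "milk 12.50 GEL" might produce.
const exampleResult: ParsedPurchase = {
  amountMinor: 1250,
  currency: "GEL",
  itemDescription: "milk",
  confidence: 0.9,
  parserMode: "rules",
  needsReview: false,
};
```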
Domain Rules
- Rules parser attempts single-amount extraction first.
- Missing currency defaults to GEL and marks needsReview=true.
- Ambiguous text (multiple amounts) triggers LLM fallback if configured.
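A minimal sketch of the rules step described above, under stated assumptions: the function name parseWithRules, the RulesOutcome shape, the amount regex, and the currency alias list are illustrative and not taken from the repository. It extracts exactly one amount, defaults a missing currency to GEL with needsReview set, and reports multiple amounts as ambiguous so a caller can decide whether to invoke the LLM fallback.

```typescript
// Hypothetical outcome type for the rules step.
type RulesOutcome =
  | { kind: "parsed"; amountMinor: number; currency: string; itemDescription: string; needsReview: boolean }
  | { kind: "ambiguous" } // several amounts found: candidate for LLM fallback
  | { kind: "empty" };    // no amount found at all

// Assumed alias list; \b is only used around ASCII words because
// JavaScript word boundaries do not recognise Cyrillic letters.
const CURRENCY_ALIASES: Array<[RegExp, string]> = [
  [/(?:\bgel\b|\blari\b|лари|₾)/i, "GEL"],
  [/(?:\busd\b|\$|доллар)/i, "USD"],
  [/(?:\beur\b|€|евро)/i, "EUR"],
];

function parseWithRules(rawText: string): RulesOutcome {
  const text = rawText.trim();
  // Integers or decimals with "." or "," and up to two fraction digits.
  const amounts = text.match(/\d+(?:[.,]\d{1,2})?/g) ?? [];
  if (amounts.length === 0) return { kind: "empty" };
  if (amounts.length > 1) return { kind: "ambiguous" };

  const rawAmount = amounts[0];
  const amountMinor = Math.round(parseFloat(rawAmount.replace(",", ".")) * 100);

  const alias = CURRENCY_ALIASES.find(([pattern]) => pattern.test(text));
  const currency = alias ? alias[1] : "GEL"; // spec: missing currency defaults to GEL

  // Whatever remains after removing the amount and currency token is the description.
  let description = text.replace(rawAmount, " ");
  if (alias) description = description.replace(alias[0], " ");
  description = description.replace(/\s+/g, " ").trim();

  return {
    kind: "parsed",
    amountMinor,
    currency,
    itemDescription: description,
    needsReview: alias === undefined, // spec: a defaulted currency marks needsReview=true
  };
}
```

In this sketch a message such as "2 pizzas 30 gel" is treated as ambiguous because a quantity and a price are both numeric; telling them apart is exactly the kind of case the spec leaves to the fallback.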
Data Model Changes
- purchase_messages stores parsed fields:
  - parsed_amount_minor
  - parsed_currency
  - parsed_item_description
  - parser_mode
  - parser_confidence
  - needs_review
  - parser_error
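One way to picture how a parser result maps onto these columns is the sketch below. The column names come from this spec; the TypeScript types, the nullability, the ParseOutcome wrapper, the function name toParsedColumns, and the choice to flag failed parses for review are assumptions made for illustration.

```typescript
// Column names are from the spec; types and nullability are assumed.
interface PurchaseMessageParsedColumns {
  parsed_amount_minor: number | null;
  parsed_currency: string | null;
  parsed_item_description: string | null;
  parser_mode: "rules" | "llm" | null;
  parser_confidence: number | null;
  needs_review: boolean;
  parser_error: string | null;
}

interface ParsedPurchase {
  amountMinor: number;
  currency: string;
  itemDescription: string;
  confidence: number;
  parserMode: "rules" | "llm";
  needsReview: boolean;
}

// Hypothetical wrapper so that failures can carry an error message.
type ParseOutcome =
  | { ok: true; result: ParsedPurchase }
  | { ok: false; error: string };

function toParsedColumns(outcome: ParseOutcome): PurchaseMessageParsedColumns {
  if (outcome.ok === false) {
    // Failed parse: keep parsed fields empty and record why.
    return {
      parsed_amount_minor: null,
      parsed_currency: null,
      parsed_item_description: null,
      parser_mode: null,
      parser_confidence: null,
      needs_review: true, // assumption: failures are surfaced for manual review
      parser_error: outcome.error,
    };
  }
  const r = outcome.result;
  return {
    parsed_amount_minor: r.amountMinor,
    parsed_currency: r.currency,
    parsed_item_description: r.itemDescription,
    parser_mode: r.parserMode,
    parser_confidence: r.confidence,
    needs_review: r.needsReview,
    parser_error: null,
  };
}
```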
Security and Privacy
- LLM fallback sends only minimal raw text needed for parsing.
- API key required for fallback path.
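The two constraints above can be sketched as a small factory. Everything here is hypothetical: createLlmFallback, the ModelCall transport type, and the choice to send only the trimmed message text are illustrative, and no real LLM client API is assumed. Without an API key the factory returns undefined, so the fallback path is simply not configured.

```typescript
// Hypothetical transport: how the model is actually called is out of scope here.
type ModelCall = (apiKey: string, prompt: string) => Promise<unknown>;

// Assumed fallback shape: raw text in, unvalidated model payload out.
type LlmFallback = (rawText: string) => Promise<unknown>;

function createLlmFallback(
  apiKey: string | undefined,
  callModel: ModelCall,
): LlmFallback | undefined {
  // No API key means no fallback path at all.
  if (!apiKey) return undefined;

  // Only the trimmed message text is forwarded: no sender, chat, or message ids.
  return (rawText: string) => callModel(apiKey, rawText.trim());
}
```

A caller would pass the returned value straight through as the optional llmFallback argument, so a deployment without a key behaves as rules-only.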
Observability
- processing_status and parser_error capture parse outcomes.
Edge Cases and Failure Modes
- Empty message text.
- Multiple numeric amounts.
- Invalid LLM output payload.
- Missing API key disables LLM fallback.
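For the invalid LLM output case, a defensive validation step might look like the sketch below. The function name validateLlmPayload, the LlmParsed shape, and the specific checks (positive integer minor units, three-letter uppercase currency code, confidence within 0..1) are assumptions; the spec only says that an invalid payload is a failure mode to handle.

```typescript
// Hypothetical shape of a validated LLM answer.
interface LlmParsed {
  amountMinor: number;
  currency: string;
  itemDescription: string;
  confidence: number;
}

// Returns the validated fields, or null when the payload cannot be trusted.
function validateLlmPayload(payload: unknown): LlmParsed | null {
  if (typeof payload !== "object" || payload === null) return null;
  const p = payload as Record<string, unknown>;

  const amountMinor = p.amountMinor;
  if (typeof amountMinor !== "number" || !Number.isInteger(amountMinor) || amountMinor <= 0) return null;

  const currency = p.currency;
  if (typeof currency !== "string" || !/^[A-Z]{3}$/.test(currency)) return null;

  const itemDescription = p.itemDescription;
  if (typeof itemDescription !== "string") return null;

  const confidence = p.confidence;
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return null;

  // Copy only the known fields so extra keys from the model are dropped.
  return { amountMinor, currency, itemDescription, confidence };
}
```

A null result here would be a natural place to write parser_error and leave the message flagged for review rather than persisting a guessed amount.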
Test Plan
- Unit tests for rules parser and fallback behavior.
- Ingestion tests for topic filter remain valid.
Acceptance Criteria
- Rules parser handles common message patterns.
- LLM fallback is invoked only when rules are insufficient.
- Parsed result + confidence + parser mode persisted.
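The second criterion can be illustrated with a reduced version of the entry point. This is a sketch under assumptions: the rules step is collapsed to a single amount regex, the result type is trimmed to three fields, and marking LLM results with needsReview=true is an illustrative choice rather than something this spec states.

```typescript
type ParserMode = "rules" | "llm";

// Trimmed result type for this sketch only.
interface Parsed {
  amountMinor: number;
  parserMode: ParserMode;
  needsReview: boolean;
}

// Assumed fallback shape.
type LlmFallback = (rawText: string) => Promise<{ amountMinor: number } | null>;

async function parsePurchaseMessage(
  input: { rawText: string },
  options: { llmFallback?: LlmFallback } = {},
): Promise<Parsed | null> {
  const amounts = input.rawText.match(/\d+(?:[.,]\d{1,2})?/g) ?? [];

  if (amounts.length === 1) {
    // Rules resolved a single amount: the LLM is never called.
    const amountMinor = Math.round(parseFloat(amounts[0].replace(",", ".")) * 100);
    return { amountMinor, parserMode: "rules", needsReview: false };
  }

  if (amounts.length > 1 && options.llmFallback) {
    // Ambiguous text and a configured fallback: ask the LLM.
    const fromLlm = await options.llmFallback(input.rawText);
    if (fromLlm) {
      // Assumption: LLM-derived results are flagged for review.
      return { amountMinor: fromLlm.amountMinor, parserMode: "llm", needsReview: true };
    }
  }

  // Empty text, no fallback configured, or the fallback gave nothing usable.
  return null;
}
```

A unit test for this criterion could pass a counting stub as llmFallback and check that the counter stays at zero for single-amount messages and increments once for multi-amount ones.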
Rollout Plan
- Enable in the dev group and monitor the needs_review rate before introducing stricter auto-accept rules.