docs: add roadmap, ADRs, and initial HOUSEBOT specs

2026-03-31 12:04:02 +00:00 · 2026-03-05 00:47:18 +04:00
commit 768400214e
10 changed files with 653 additions and 0 deletions
--- a/docs/specs/HOUSEBOT-003-purchase-parser.md
+++ b/docs/specs/HOUSEBOT-003-purchase-parser.md
@@ -0,0 +1,74 @@
+# HOUSEBOT-003: Purchase Parser (Hybrid Rules + LLM Fallback)
+
+## Summary
+Parse free-form purchase messages (primarily Russian) from the Telegram topic `Общие покупки` into structured ledger entries.
+
+## Goals
+- High precision amount extraction with deterministic rules first.
+- Fallback to LLM for ambiguous or irregular message formats.
+- Persist raw input, parsed output, and confidence score.
+
+## Non-goals
+- Receipt image OCR.
+- Full conversational NLP.
+
+## Scope
+- In: parsing pipeline, confidence policy, parser contracts.
+- Out: bot listener wiring (separate ticket).
+
+## Interfaces and Contracts
+- `parsePurchase(input): ParsedPurchaseResult`
+- `ParsedPurchaseResult`:
+  - `amountMinor`
+  - `currency`
+  - `itemDescription`
+  - `confidence`
+  - `parserMode` (`rules` | `llm`)
+  - `needsReview`
+
+## Domain Rules
+- GEL is default currency when omitted.
+- Confidence threshold determines auto-accept vs review flag.
+- Never mutate original message text.
+
+## Data Model Changes
+- `purchase_entries` fields:
+  - `raw_text`
+  - `parsed_amount_minor`
+  - `currency`
+  - `item_description`
+  - `confidence`
+  - `parser_mode`
+  - `needs_review`
+
+## Security and Privacy
+- Sanitize prompt inputs for LLM adapter.
+- Do not send unnecessary metadata to LLM provider.
+
+## Observability
+- Parser mode distribution metrics.
+- Confidence histogram.
+- Error log for parse failures.
+
+## Edge Cases and Failure Modes
+- Missing amount.
+- Multiple possible amounts in one message.
+- Non-GEL currencies mentioned.
+- Typos and slang variants.
+
+## Test Plan
+- Unit:
+  - regex extraction fixtures in RU/EN mixed text
+  - confidence scoring behavior
+- Integration:
+  - LLM fallback contract with mocked provider
+- E2E: consumed in bot ingestion ticket.
+
+## Acceptance Criteria
+- [ ] Rules parser handles common RU message patterns.
+- [ ] LLM fallback adapter invoked only when rules are insufficient.
+- [ ] Confidence and parser mode stored in result.
+- [ ] Tests include ambiguous message fixtures.
+
+## Rollout Plan
+- Start with conservative threshold and monitor review rate.