mirror of
https://github.com/whekin/household-bot.git
synced 2026-03-31 12:04:02 +00:00
Codex/whe 15 bootstrap workspace (#1)
* feat(WHE-15): bootstrap bun workspace with app and package scaffolds
* chore(WHE-17): switch workspace typecheck to tsgo
* chore(WHE-16): configure oxlint and oxfmt no-semicolon style
* chore: address CodeRabbit review feedback
* chore: apply coderabbit fixes and add review script
* docs: add ADR decision metadata
@@ -1,22 +1,27 @@
# HOUSEBOT-003: Purchase Parser (Hybrid Rules + LLM Fallback)

## Summary

Parse free-form purchase messages (primarily Russian) from the Telegram topic `Общие покупки` into structured ledger entries.

## Goals

- High-precision amount extraction, with deterministic rules applied first.
- Fallback to LLM for ambiguous or irregular message formats.
- Persist raw input, parsed output, and confidence score.
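
The rules-first, fallback-second flow described in these goals could be sketched roughly as follows. Every name here (`ParserMode`, `ParseAttempt`, `runPipeline`, the `"rules"` and `"llm"` labels) is a hypothetical placeholder, and the parsers are passed in as plain synchronous functions to keep the sketch self-contained; a real LLM adapter would most likely be asynchronous.

```typescript
// Hypothetical names; the spec itself only fixes `parsePurchase` and a few result fields.
type ParserMode = "rules" | "llm"

interface ParseAttempt {
  amountMinor: number | null // null means "no amount found"
  confidence: number // assumed to lie in the range 0..1
}

interface PipelineResult extends ParseAttempt {
  rawText: string // original message, passed through untouched
  parserMode: ParserMode
}

type Parser = (text: string) => ParseAttempt

// Try the deterministic rules first; call the fallback only when the rules find no amount.
function runPipeline(rawText: string, rules: Parser, llmFallback: Parser): PipelineResult {
  const fromRules = rules(rawText)
  if (fromRules.amountMinor !== null) {
    return { rawText, parserMode: "rules", ...fromRules }
  }
  return { rawText, parserMode: "llm", ...llmFallback(rawText) }
}
```

Recording `parserMode` next to `confidence` keeps the raw input, the parsed output, and the score together in one value, which is what the third goal asks to persist.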

## Non-goals

- Receipt image OCR.
- Full conversational NLP.

## Scope

- In: parsing pipeline, confidence policy, parser contracts.
- Out: bot listener wiring (separate ticket).

## Interfaces and Contracts

- `parsePurchase(input): ParsedPurchaseResult`
- `ParsedPurchaseResult`:
  - `amountMinor`
@@ -27,11 +32,13 @@ Parse free-form purchase messages (primarily Russian) from the Telegram topic `
  - `needsReview`
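
As an illustration of what `amountMinor` might hold, the sketch below assumes it is an integer count of minor currency units (for example, 12.50 becomes 1250) and that two decimal places are enough. The `ParsedPurchaseResultSketch` type lists only the two fields visible in this excerpt, and `toMinorUnits` is a hypothetical helper, not part of the stated contract.

```typescript
// Only `amountMinor` and `needsReview` are visible in this excerpt; other fields are left out.
interface ParsedPurchaseResultSketch {
  amountMinor: number
  needsReview: boolean
}

// Convert a decimal string such as "12.50" or "12,5" into integer minor units.
// Assumes at most two decimal places; returns null for anything else.
function toMinorUnits(text: string): number | null {
  const match = /^(\d+)(?:[.,](\d{1,2}))?$/.exec(text.trim())
  if (!match) return null
  const major = Number(match[1])
  // Right-pad the fraction to two digits: "5" -> "50", "" -> "00".
  const fraction = Number(((match[2] ?? "") + "00").slice(0, 2))
  return major * 100 + fraction
}

const example: ParsedPurchaseResultSketch = {
  amountMinor: toMinorUnits("12,5") ?? 0,
  needsReview: false,
}
```

Working on the digit strings directly avoids floating-point rounding that `parseFloat("12.50") * 100` style arithmetic can introduce.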

## Domain Rules

- GEL is the default currency when none is specified.
- Confidence threshold determines auto-accept vs review flag.
- Never mutate original message text.
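
A minimal sketch of how these three rules might combine is shown below. The threshold value `0.8`, the `Candidate` shape, and the `applyDomainRules` name are assumptions made for illustration; the spec does not give a concrete threshold.

```typescript
const DEFAULT_CURRENCY = "GEL"
const REVIEW_THRESHOLD = 0.8 // assumed value; the spec does not state one

// Hypothetical intermediate shape produced by either parser.
interface Candidate {
  amountMinor: number
  currency?: string
  confidence: number
}

function applyDomainRules(rawText: string, candidate: Candidate) {
  return {
    rawText, // stored exactly as received, never rewritten
    amountMinor: candidate.amountMinor,
    currency: candidate.currency ?? DEFAULT_CURRENCY, // GEL when omitted
    confidence: candidate.confidence,
    needsReview: candidate.confidence < REVIEW_THRESHOLD, // below threshold -> review
  }
}
```

Keeping the threshold in a single constant makes it easy to tune later without touching the parsing logic.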

## Data Model Changes

- `purchase_entries` fields:
  - `raw_text`
  - `parsed_amount_minor`
@@ -42,21 +49,25 @@ Parse free-form purchase messages (primarily Russian) from the Telegram topic `
  - `needs_review`

## Security and Privacy

- Sanitize prompt inputs for LLM adapter.
- Do not send unnecessary metadata to LLM provider.
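
One possible reading of these two points is sketched below: clean the message text before it reaches the LLM adapter, and forward nothing but that text. The `IncomingMessage` shape, the `buildLlmInput` name, and the 500-character cap are all assumptions; this is a basic hygiene step, not a complete defense against prompt injection.

```typescript
const MAX_PROMPT_CHARS = 500 // assumed cap; tune to the provider's limits

// Hypothetical shape of what the bot listener hands to the parser.
interface IncomingMessage {
  text: string
  senderId: number
  chatId: number
}

// Build the only payload that leaves the service: cleaned text, no sender or chat identifiers.
function buildLlmInput(message: IncomingMessage): { text: string } {
  const cleaned = message.text
    .replace(/[\u0000-\u001f\u007f]+/g, " ") // replace control characters with a space
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim()
    .slice(0, MAX_PROMPT_CHARS) // cap the length
  return { text: cleaned }
}
```

Because strings are immutable, `message.text` itself is left as received, which fits the rule that the original message text is never mutated.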

## Observability

- Parser mode distribution metrics.
- Confidence histogram.
- Error log for parse failures.

## Edge Cases and Failure Modes

- Missing amount.
- Multiple possible amounts in one message.
- Non-GEL currencies mentioned.
- Typos and slang variants.
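
The first three cases above could be detected by a rules layer along these lines. The regular expression, the alias table, and the names `findAmountCandidates` and `classify` are illustrative assumptions rather than the project's actual rules, and the sketch does not attempt typo or slang handling.

```typescript
interface AmountCandidate {
  amountMinor: number
  currency: string | null // null when no known currency word follows the number
}

// Hypothetical alias table; a real one would be larger.
const CURRENCY_ALIASES = new Map<string, string>([
  ["gel", "GEL"],
  ["лари", "GEL"],
  ["₾", "GEL"],
  ["usd", "USD"],
  ["$", "USD"],
  ["eur", "EUR"],
  ["€", "EUR"],
])

// Collect every number in the message, with the currency word right after it if recognised.
function findAmountCandidates(text: string): AmountCandidate[] {
  const pattern = /(\d+)(?:[.,](\d{1,2}))?\s*([a-zа-яё₾$€]+)?/g
  const lowered = text.toLowerCase() // work on a copy; the original text is untouched
  const candidates: AmountCandidate[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(lowered)) !== null) {
    const major = Number(match[1])
    const fraction = Number(((match[2] ?? "") + "00").slice(0, 2))
    const currency = CURRENCY_ALIASES.get(match[3] ?? "") ?? null
    candidates.push({ amountMinor: major * 100 + fraction, currency })
  }
  return candidates
}

// Zero candidates means a missing amount; more than one means the rules cannot decide alone.
function classify(candidates: AmountCandidate[]): "missing" | "single" | "ambiguous" {
  if (candidates.length === 0) return "missing"
  return candidates.length === 1 ? "single" : "ambiguous"
}
```

Under this sketch, `"missing"` and `"ambiguous"` messages would be the natural inputs for the LLM fallback or the review flag, and a non-GEL `currency` on a single candidate can be surfaced instead of silently defaulting to GEL.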

## Test Plan

- Unit:
  - regex extraction fixtures in RU/EN mixed text
  - confidence scoring behavior
@@ -65,10 +76,12 @@ Parse free-form purchase messages (primarily Russian) from the Telegram topic `
- E2E: consumed in bot ingestion ticket.

## Acceptance Criteria

- [ ] Rules parser handles common RU message patterns.
- [ ] LLM fallback adapter invoked only when rules are insufficient.
- [ ] Confidence and parser mode stored in result.
- [ ] Tests include ambiguous message fixtures.

## Rollout Plan

- Start with conservative threshold and monitor review rate.