mirror of https://github.com/whekin/household-bot.git synced 2026-03-31 19:24:03 +00:00

Files

Stas 4a26ac81d6 Codex/whe 15 bootstrap workspace (#1 )

* feat(WHE-15): bootstrap bun workspace with app and package scaffolds

* chore(WHE-17): switch workspace typecheck to tsgo

* chore(WHE-16): configure oxlint and oxfmt no-semicolon style

* chore: address CodeRabbit review feedback

* chore: apply coderabbit fixes and add review script

* docs: add ADR decision metadata

2026-03-05 00:56:24 +03:00

2.0 KiB

Raw Blame History

HOUSEBOT-003: Purchase Parser (Hybrid Rules + LLM Fallback)

Summary

Parse free-form purchase messages (primarily Russian) from the Telegram topic Общие покупки into structured ledger entries.

Goals

High precision amount extraction with deterministic rules first.
Fallback to LLM for ambiguous or irregular message formats.
Persist raw input, parsed output, and confidence score.

Non-goals

Receipt image OCR.
Full conversational NLP.

Scope

In: parsing pipeline, confidence policy, parser contracts.
Out: bot listener wiring (separate ticket).

Interfaces and Contracts

parsePurchase(input): ParsedPurchaseResult
ParsedPurchaseResult:
- amountMinor
- currency
- itemDescription
- confidence
- parserMode (rules | llm)
- needsReview

Domain Rules

GEL is default currency when omitted.
Confidence threshold determines auto-accept vs review flag.
Never mutate original message text.

Data Model Changes

purchase_entries fields:
- raw_text
- parsed_amount_minor
- currency
- item_description
- confidence
- parser_mode
- needs_review

Security and Privacy

Sanitize prompt inputs for LLM adapter.
Do not send unnecessary metadata to LLM provider.

Observability

Parser mode distribution metrics.
Confidence histogram.
Error log for parse failures.

Edge Cases and Failure Modes

Missing amount.
Multiple possible amounts in one message.
Non-GEL currencies mentioned.
Typos and slang variants.

Test Plan

Unit:
- regex extraction fixtures in RU/EN mixed text
- confidence scoring behavior
Integration:
- LLM fallback contract with mocked provider
E2E: consumed in bot ingestion ticket.

Acceptance Criteria

Rules parser handles common RU message patterns.
LLM fallback adapter invoked only when rules are insufficient.
Confidence and parser mode stored in result.
Tests include ambiguous message fixtures.

Rollout Plan

Start with conservative threshold and monitor review rate.

2.0 KiB Raw Blame History