Back to Notes
·5 min read

When the Frontend Sends a Query as a String

#FastAPI#Python#QueryParsing#InternalTools
When the Frontend Sends a Query as a String

This post describes a real project I worked on. Names, specific systems, and identifying details have been changed. The technical problems and approaches are real, but code snippets are simplified illustrations not production code.


The Request That Kept Coming Back

Every week, the same pattern:

"Can you pull records where region is North and signal is below -85?"

"I need everything where contractor is ABC OR contractor is XYZ, but NOT status INACTIVE."

"Records where name contains 'Tower' AND region is one of North, South, East."

I was becoming a human SQL interface for 50+ engineers. Each request took 10-15 minutes. Five requests a day. Over an hour daily, just writing SELECT statements.

I needed to build something that let them query data themselves. Without learning SQL.


The Architecture Decision

First question: how does the frontend tell the backend what to filter?

Option A: Structured JSON Clean for machines, but representing (A AND B) OR (C AND D) in nested JSON gets ugly. And when engineers looked at the JSON preview, their eyes glazed over.

Option B: Human-readable string region = 'North' AND signal < -85

Engineers understood Option B instantly. They think in conditions, not data structures.

The trade-off: now I had to build a parser.


The Parsing Challenge

The frontend sends: region = 'North' AND (signal < -85 OR capacity > 90)

I needed to turn this into a safe database query not by string concatenation (SQL injection nightmare), but by actually understanding the structure.

Classic compiler problem: tokenize → parse → generate.


Step 1: Tokenizing (The Pattern Ordering Trap)

Breaking the string into tokens seemed straightforward. Until I hit this:

Input: "status NOT IN (A, B)"

My first tokenizer saw: status, NOT, IN, (A, B)

But I needed: status, NOT IN, (A, B)

The fix: order patterns from longest to shortest. Check for NOT IN before checking for NOT and IN separately.

# Order matters! Longer patterns first.
PATTERNS = [
    (r'\bIS\s+NOT\s+NULL\b', 'IS_NOT_NULL'),
    (r'\bIS\s+NULL\b', 'IS_NULL'),
    (r'\bNOT\s+IN\b', 'NOT_IN'),      # Before NOT and IN
    (r'\bSTARTS_WITH\b', 'STARTS_WITH'),
    (r'\bENDS_WITH\b', 'ENDS_WITH'),
    (r'\bCONTAINS\b', 'CONTAINS'),
    (r'\bAND\b', 'AND'),
    (r'\bNOT\b', 'NOT'),              # After NOT IN
    (r'\bOR\b', 'OR'),
    (r'\bIN\b', 'IN'),                # After NOT IN
    # ... operators, values, etc.
]

Same issue with IS NOT NULL vs IS NULL vs IS. Pattern ordering is everything.


Step 2: Parsing (The Precedence Problem)

Tokens aren't enough. A AND B OR C is ambiguous.

Is it (A AND B) OR C or A AND (B OR C)?

I needed operator precedence. AND binds tighter than OR (like multiplication before addition). Parentheses override everything.

The trick: recursive descent with precedence levels.

def parse_expression(self):
    return self.parse_or()  # Lowest precedence
 
def parse_or(self):
    left = self.parse_and()  # AND binds tighter
    while self.current_token() == 'OR':
        self.consume('OR')
        right = self.parse_and()
        left = BinaryOp(left, 'OR', right)
    return left
 
def parse_and(self):
    left = self.parse_unary()  # NOT binds tighter still
    while self.current_token() == 'AND':
        self.consume('AND')
        right = self.parse_unary()
        left = BinaryOp(left, 'AND', right)
    return left

Each function calls the next-higher precedence level. The structure falls out naturally.

Result: an Abstract Syntax Tree.

Input: "region = 'North' AND (signal < -85 OR capacity > 90)"
 
Tree:
        AND
       /   \
  region='North'    OR
                   /  \
           signal<-85  capacity>90

Once I had the tree, generating SQL was just walking it.


Step 3: Safe SQL Generation (The Parameterization)

Never concatenate user input into SQL. Ever.

def generate_condition(self, node):
    field = node.field.lower()
 
    # Whitelist check
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"Unknown field: {node.field}")
 
    # Add parameter, return placeholder
    self.params.append(node.value)
    placeholder = f"${len(self.params)}"
 
    return f"{field} {node.operator} {placeholder}"

The output: "region = $1 AND (signal < $2 OR capacity > $3)" plus ['North', -85, 90]

Database driver handles escaping. SQL injection impossible.


The Clever Bit: Fuzzy Field Matching

Here's where I almost made a mistake.

First instinct: reject unknown fields with an error. But when someone types regoin instead of region, a generic "unknown field" error is frustrating.

The fix: fuzzy matching with suggestions.

from difflib import get_close_matches
 
def validate_field(self, field):
    if field.lower() in ALLOWED_FIELDS:
        return field.lower()
 
    # Find close matches
    suggestions = get_close_matches(
        field.lower(),
        ALLOWED_FIELDS,
        n=1,
        cutoff=0.6
    )
 
    if suggestions:
        raise ValueError(f"Unknown field: '{field}'. Did you mean: {suggestions[0]}?")
    else:
        raise ValueError(f"Unknown field: '{field}'")

Now regoin = 'North' returns: "Unknown field: 'regoin'. Did you mean: region?"

This one feature eliminated 80% of support requests overnight. Users fixed their own typos instead of asking me.


What Surprised Me

They wanted to save queries. I built "run and done." They wanted to save complex queries and rerun them weekly. Added that in week 2.

They shared queries. One engineer built a useful filter, told teammates. Added shared queries in week 3.

They never read documentation. Wrote a 3-page guide. Nobody opened it. Added tooltips and inline hints instead. Usage jumped immediately.


The Results

MetricBeforeAfter
Data requests to me5/day0-1/day
Engineer wait time30 min - 2 hoursInstant
My time on data pulls1+ hour/day~10 min/week

The real win: engineers started finding insights I never would have looked for. They knew the domain. Now they had the tool.


Lessons Learned

The string format was worth the complexity. JSON would've been easier to parse but harder for users to understand. UX won over engineering convenience.

Never trust input, but be helpful about it. Don't just reject bad queries suggest fixes. The difference between "error" and "did you mean...?" is the difference between frustration and success.

Pattern ordering is everything in tokenizers. Longer patterns first. I learned this the hard way with NOT IN vs NOT.


Aamir Shahzad

Aamir Shahzad

Author

Software Engineer with 7+ years of experience building scalable data systems. Specializing in Django, Python, and applied AI.