Case-Insensitive Search using EF Core and PostgreSQL

PostgreSQL is a great database. It can do just about anything. But one area that has surprising gaps is case insensitive search. Surely, having the benefit of thirty years of database research, users can search for a customer without knowing the exact capitalization of their name? Turns out it’s not so easy..

Read on to find out how to implement this in EF Core.

Potential Approaches

PostgreSQL offers four main approaches, plus I’ll add one more related to how we map in EF Core. Others have blogged about this, so I’ll just summarize.

Approach	Pros	Cons	Assessment
Case-Insensitive Collation	Define once, applied automatically.	Pattern-matching operators are disallowed. No `LIKE` or `StartsWith()`.	Unusable
`citext` column type	Define once, applied automatically. Simplest to query & define indexes.	`citext` doesn’t accept a length specifier. Length limits can only be manually implemented eg. by check constraint.	Good
`ILIKE` operator		Application must code `ILIKE` and patterns. Index must be on `lower(column)` to support search, and assist prefix search only.	Awkward to use & index
Functional index on `lower(column)`		Need to define indexes manually. Every application query needs to express `ToLower()`. B-Tree index assist prefix search only.	Awkward to use & index
Derived property & physical column with normalized case	Fairly simple to query. Simple to define indexes. Portable across DBs.	Application queries should call `ToLower()` on the searched value. Slight increase in storage size.	Moderate

Note: In PostgreSQL, standard (B-Tree) indexes only benefit prefix queries such as StartsWith(); substring queries such as Contains() fall back to a table scan and will not benefit.

I’ll discuss the two better options below.

‘citext’ Column Type

citext is a case-insensitive data type available as an extension in PostgreSQL. While the extension has to be enabled before use, it is included in the standard Postgres distribution and available on platforms such as AWS RDS Postgres and Aurora.

This approach offers many benefits:

citext can simply be specified as the column type in a [Column] attribute.
Querying is completely transparent. (Note that it’s also always case-insensitive.)
Index declarations are transparent.
Npgsql automatically enables citext extension in your migration.
On the downside, check constraints must be explicitly coded if you wish to restrict column width; and citext is less portable to other databases.

Here’s how we can implement this using EF Core:

Specify citext as type in [Column] attributes.
If needed, define check constraints for column width in your DbContext.

public class Customer
{
    public int Id { get; set; }
    [Column(TypeName="citext")]
    public string Name { get; set; }
}

public class MyDbContext : DbContext
{
    // ...
    
    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Customer>()
            .HasCheckConstraint("CK_Customer_Name_Length", "length(Name) <= 64");
    }
}

Overall, the citext is most transparent for application querying. However it can be awkward where column widths need to be specified, needing SQL constraints to be manually coded.

Derived Property & Physical Column with Normalized Case

Rather than working in the database, this approach encapsulates case normalization within a derived property in the entity class. Queries & indexes can then be simply specified on the derived property.

This approach offers several benefits:

Normalization of the derived property is encapsulated in one property in the Entity class.
Indexes, column widths etc all definable as normal — EF Core usage is 100% vanilla.
Portable across databases.
On the downside, queries must convert search value to the same case (eg lowercase).

When considering this approach, we do have the question whether to normalize to lowercase (as commonly shown in PostgreSQL examples) or uppercase (as EF Core examples show).

In PostgreSQL, the lower() function is commonly used for case-insensitive comparisons. Lowercase conversion is preferred because it avoids most issues with inconsistencies in locale-specific uppercase conversions.
EF Core often shows ToUpper() since that aligns with historical SQL Server patterns.

Across computer science, normalization to lowercase seems most recommended for case-insensitive processing; for consistency & efficiency reasons. I have therefore tended to use lowercase.

Here’s how we can implement this using EF Core:

Define the derived property:
- Define a derived property.
- Implement a custom getter lowercasing the value.
- Implement a custom setter which is a no-op.
Use the derived property.
- When searching, ensure the value being searched for is also lowercase.

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }

    // Name Lower;  derived property for searchability
    public string NameLower
    {
        get => Name.ToLowerInvariant();
        set {}  // no-op
    }
}

The derived property can be used naturally & easily. Indexes can be defined, and queries coded against it. Querying is simple; we just write a Where() condition on the derived property.

var customers = context.Customers.Where(cand => cand.NameLower.StartsWith(search.ToLower()));

Since the derived property is transparent to EF Core, index definitions are also simple:

[Index(nameof(NameLower))]
public class Customer

Overall, this pattern is concise, portable & moderately encapsulated. Queries are fairly simple, but should call ToLower() on the search value.

Conclusion

Long term, I would encourage PostgreSQL team to take a more pragmatic approach to case-insensitivity. All available options could be improved, and severe design limitations around case-insensitive collations should be reconsidered.

However PostgreSQL does offer sufficient features to build a good solution today. When choosing which approach for case-insensitivity in application use, I believe the following goals are important:

Simple to query & use from the application
Easy to declare indexes
Able to be indexed, to retain performance as datasets grow

For many requirements, the citext column type extension can be ideal; though custom check constraints may be required. Otherwise, the Derived Property approach offers a well-encapsulated portable solution to case-insensitivity. I believe these offer generally the two best approaches for case-insensitivity in EF Core applications using Postgres.

Thanks for reading. Let us know your thoughts and favourite approaches on case-insensitivity!

Literate Java

Case-Insensitive Search using EF Core and PostgreSQL

Potential Approaches

‘citext’ Column Type

Derived Property & Physical Column with Normalized Case

Conclusion

References

Leave a Reply Cancel reply

Insights into Java coding, OO design & architecture