Designing a Data Access Layer — Part 2

Greg Pabian
The Startup
Published in
4 min readJan 26, 2021

--

In the first part of this article series, I pondered on the way product requirements, database schemas and technology stacks influence the shape of data access layers. Today, I am going to discuss how their APIs could be designed, gathering all the necessary requirements first. The first step is to find out the consumers of the aforementioned APIs.

Consumers

Commonly, a DAL is going to be used in two places within a codebase: the source code and tests. This dichotomy of usage might require accounting for different database engines in each case. If they are indeed disparate, the engine in the test environment will likely be less sophisticated than in the production environment. The underlying data types might behave in a way that clashes with initial expectations, some transaction types and isolation levels might not be supported, not to mention the execution speed of queries.

Nevertheless, this two-prong coding approach should result in a single DAL interface, not giving way to any of the particular engine’s behaviour — as SOLID rules state, implementation should be separate from abstraction. One could raise their voice regarding the exposure of functions that are likely to be used only in the test environment (for instance, functions that count the database rows or truncate tables) — I personally support keeping them accessible to the source code anyway to reduce complexity.

Regardless of the architectural decisions undertaken for an entire project, I would expect the DAL to be either a self-contained singleton or a dependency-injected component. Using multiple DALs should be warranted only if there are multiple databases in play, especially if their engines are incompatible with one another. As the DAL is a low-level component, it should be used with components on a similar level to achieve proper separation of concerns.

Components that mix some of the Data-Access-Layer functionality with a pinch of business logic are commonly called services, as they serve as a self-contained building block for more complex components of a system. We can imagine a UserService that exposes all CRUD operations taking into account the necessary privileges for granting access — as the DAL should not be aware of a concept of access restrictions for users of the system.

Repositories

I would consider a repository an object that abstracts away all relevant database operation from the point of view of a single database entity. A full-fledged project is expected to contain multiple repositories in its DAL. Whether the DAL exposes repository objects or functions that act as proxies to proper repository functions is a decision a software developer should make.

CRUD

For each database entity one can distinguish between 4 operations:

  • creation of one entity,
  • reading one and reading many entities,
  • updating one and updating many entities,
  • deleting one and deleting many entities.

It is necessary to recognise the difference between operations over a single and many entities. As an example, how many rows might be be returned here?

SELECT id FROM users WHERE country = "UK";

And how many rows might be affected here?

DELETE FROM users WHERE id = 1234;

Generally speaking, there are no overall guarantees that the returned (or affected) number of rows will be limited, unless:

  • we explicitly limit the number of returned results (engine-specific),
  • the limit itself comes from constraints (e.g. unique indices).

Since the DAL should be made explicitly aware of the aforementioned facts, I lean towards following architectural decisions:

  • the creation functions allow only for a single row creation,
  • the finding functions return a collection (e.g. array, list, set),
  • the updating functions explicitly state that multiple rows might be updated,
  • the deleting functions explicitly state that multiple rows might be deleted.

Compound operations

I wholeheartedly recommend to consider adding two common compound operations:

  • upserting (creating XOR updating),
  • truncating the whole table (deleting all table rows).

The former might be supported by the database engine itself, while the latter might prove useful in the test environment (e.g. when we want to clean the entire database after each test).

Transactions

Some SQL databases support different types of transactions — for SQLite, it is deferred, immediate and exclusive. Additionally, SQL databases traditionally support 4 levels of transaction isolation:

  • serialisable (the highest),
  • repeatable read,
  • read committed,
  • read uncommitted (the lowest).

Being aware of these aforementioned transaction properties should be reflected in decisions regarding the DAL interface. Personally, I believe that the DAL should recognise the concept of a transaction but it should not arbitrarily dictate the type and the isolation level of the transaction used in queries. As an example, a function that returns users from the database should allow a (perhaps optional?) parameter for a transaction reference instead of managing a read-uncommitted exclusive transaction on its own.

If the query functions do not control transactions, the DAL needs to expose an interface to create various transactions, depending on the use case. There are two ways of working with a transaction managed by the backend service:

  • explicit committing and rollbacks,
  • implicit committing on existing a block of instructions and implicit rollback if an exception happens in the aforementioned block.

I find both approaches valid but I have found most successes with the latter, as it allows me to focus more on the functionality instead of transactional intricacies.

Summary

As I have mentioned before, my ideal Data Access Layer should work with both the source code and tests and should have a uniform interface for its implementations. CRUD repository functions and transaction managers ought to be directly accessible from its level. I believe any consumer of the API should have full control over transactions in the system.

--

--