Edited cover image from Microsoft Research DNA Data Storage

Optimizing the Future of Data Storage

Batch Optimization of DNA-Based Archival Storage Systems

Okezue Bell
11 min read · Jul 11, 2021

This is a technical article. I'd suggest you know 2+2, plus some bio, applied math, and comp sci! But don't let that intimidate you; anyone can learn. Feel free to approach it with any level of knowledge and look up what you don't understand, or ask me in the chat for clarification. To see diagrams in more detail, click on them.

As humanity continues to scale up the amount of collective knowledge stored in the cloud, demand for more powerful data storage methods will grow rapidly. Unfortunately, we're facing a major Rubicon that needs to be crossed, as our traditional storage methods are becoming obsolete.

The most recent data suggest that an estimated 8.4 million exabytes of information currently exist. As the digital cloud continues to expand, as we fold more advanced technologies into our networks and implement more complex architectures for data storage (e.g., blockchain and edge computing paradigms), that volume of information will become intractable.

The quagmire with drives is that large physical storage systems scale poorly, and their capacity is bounded by their electronic limitations: typical hard disk drives and computational storage leverage ferromagnetic materials on opposite sides of a disk system (Figure 1).

Figure 1: A pictorial representation of the magnetic cross-section and frequency modulation of encoded binary data stored in a modern disk drive.

As shown, there is a sequential variation in the direction of the magnetization vector, and each direction represents a basic binary unit (a bit). Ultimately, there is a fundamental issue with using a 0 or 1 as the storage unit: with only two modes of information, more complex data structures become unnecessarily long and difficult to decode.
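As a back-of-the-envelope sketch of the length argument, the snippet below counts how many symbols are needed to represent the same amount of information with alphabets of different sizes. The function name `symbols_needed` is hypothetical and exists only for illustration; the four-symbol case is included because DNA has four bases (A, C, G, T).

```python
def symbols_needed(num_bits, alphabet_size):
    """Smallest number of symbols from an alphabet of the given size
    that can distinguish every possible value of `num_bits` bits."""
    if alphabet_size < 2:
        raise ValueError("alphabet must have at least two symbols")
    target = 1 << num_bits      # number of distinct values to represent
    capacity = 1                # values representable with `count` symbols
    count = 0
    while capacity < target:
        capacity *= alphabet_size
        count += 1
    return count


one_kilobyte = 8 * 1000  # bits
binary_length = symbols_needed(one_kilobyte, 2)      # two-state symbols
quaternary_length = symbols_needed(one_kilobyte, 4)  # four-state symbols
print(binary_length, quaternary_length)
```

Under this idealized count, which ignores error correction and any physical constraints, a four-symbol alphabet needs half as many symbols as a binary one for the same data.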

In HDDs, data is extracted by reading magnetization transitions, meaning it's relatively easy to lose data if the medium is damaged or the magnetization mechanism malfunctions. Typically, a parameterized encoding scheme is used to encode the data as magnetic transitions, RLL with differential Manchester Coding being…
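To make the idea of storing bits as transitions more concrete, here is a minimal sketch of differential Manchester coding in Python. It is an illustration, not a model of any particular drive. The function names are hypothetical, and the convention used (a transition at the start of a cell means 0, no transition means 1, with a mandatory transition mid-cell) is an assumption, since the opposite convention is also in use.

```python
def diff_manchester_encode(bits, start_level=0):
    """Encode bits as a list of half-cell signal levels (0 or 1).

    Assumed convention: a transition at the start of a bit cell encodes 0,
    no transition encodes 1, and every cell has a transition at its midpoint.
    `start_level` is the signal level just before the first cell.
    """
    level = start_level
    halves = []
    for bit in bits:
        if bit == 0:
            level ^= 1          # boundary transition signals a 0
        halves.append(level)    # first half of the cell
        level ^= 1              # mandatory mid-cell (clock) transition
        halves.append(level)    # second half of the cell
    return halves


def diff_manchester_decode(halves, start_level=0):
    """Recover bits by checking for a transition at each cell boundary."""
    if len(halves) % 2 != 0:
        raise ValueError("expected two half-cell levels per bit")
    prev = start_level
    bits = []
    for i in range(0, len(halves), 2):
        first, second = halves[i], halves[i + 1]
        if first == second:
            raise ValueError("missing mid-cell transition: signal is corrupted")
        bits.append(1 if first == prev else 0)
        prev = second
    return bits


message = [1, 0, 1, 1, 0]
signal = diff_manchester_encode(message)
assert diff_manchester_decode(signal) == message
```

Because the information lives in the presence or absence of transitions rather than in absolute levels, inverting the whole signal (and the starting level) decodes to the same bits. The decoder can also flag a damaged cell when the mid-cell transition is missing.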


