`
`How your data is stored on disk and memory? | by Guangyuan(Frank) Li | Towards Data Science
`
`Get started
`
`Open in app
`
`Follow
`
`591K Followers
`
`THOUGHTS AND THEORY
`
`How your data is stored on disk and memory?
`Delving a bit into what’s happening under the hood when you manipulate your Excel
`file or data frames.
`
`Guangyuan(Frank) Li Mar 1 · 6 min read
`
`Photo by Christian Wiediger on Unsplash
`https://towardsdatascience.com/how-your-data-is-stored-on-disk-and-memory-8842891da52
`
`1/7
`
`Patent Owner, Bot M8 LLC - Ex. 2058, p. 1
`
`
`
`5/10/2021
`How your data is stored on disk and memory? | by Guangyuan(Frank) Li | Towards Data Science
`I consider myself as an applied programmer, I use high-level programming languages
`like Python to manipulate the data frames, build simple pipelines to facilitate the data
`analysis of certain tasks. However, at the atomic level, the computer itself doesn’t
`understand what “pandas.read_csv()” means or what the “write.table()” function
`actually does. It makes me wonder about the intermediate steps involved in converting
`our commands to 0/1 binary bits and prompts me to write this article.
`
`In this article, I am hoping to give you a rough and intuitive image of what the disk
`actually looks like, how your data lays out on the disk, and what’s the information
`exchanges between your disk and memory. This is not meant to be a strictly technical
`blog but more like relaxing reading material. Being aware of the physical hardware
`structure can help us to get a better understanding of the tasks and problems we may
`encounter in our work:
`
`memory leakage? indexing file? memory overhead? I/O streaming?
`
`So brace up yourself and let’s get into that!
`
`What the disk looks like?
`When you are on your PC or Mac, you open the Computer or Finder, all the files that
`appear there are stored on the disk, sometime the disk is also referred to as storage.
`Nowadays, nearly most personal computers are using Solid State Disk (SSD), however,
`to convey some intuition, I am going to use Hard Disk (HD) to illustrate the physical
`structure of the disk.
`
`https://towardsdatascience.com/how-your-data-is-stored-on-disk-and-memory-8842891da52
`
`2/7
`
`Patent Owner, Bot M8 LLC - Ex. 2058, p. 2
`
`
`
`5/10/2021
`
`How your data is stored on disk and memory? | by Guangyuan(Frank) Li | Towards Data Science
`
`Photo by benjamin lehman on Unsplash
`
`The structure of a typical hard disk looks like a CD player, it has an arm that can rotate
`and changes to a selected track, the head of the arm is able to read, write or erase the
`data stored in each track of the surface.
`
`platter, track, sector, gap, block
`
`The surface is also referred to as a platter, and a platter can be split into a set of physical
`units called a sector. There are gaps between each sector and several sectors constitute a
`block, the block is the logical unit.
`
`If I tell you one track in a platter has a maximum capacity of 1024Kb (1Mb), then you
`know your Excel file, which let’s imagine is 2Mb, will occupy two platters to store it in
`the disk.
`
`If the rotation speed is known, then you can easily compute how fast the files/bits can be
`transferred from the disk to memory or vice versus. It also becomes clear that why a
`
`https://towardsdatascience.com/how-your-data-is-stored-on-disk-and-memory-8842891da52
`
`3/7
`
`Patent Owner, Bot M8 LLC - Ex. 2058, p. 3
`
`
`
`5/10/2021
`How your data is stored on disk and memory? | by Guangyuan(Frank) Li | Towards Data Science
`large file needs more time to read because it needs longer time to rotate the arms and
`arms also take longer to read since the storage area will be bigger.
`
`Solid State Disk (SSD) will have faster accessing time, and the architecture is quite
`different from the hard disk (HD) example that I showed you above. But hopefully, that
`can provide you a clear image that where your files lie on, and next I will show you how
`the data actually lies on the disk.
`
`How the data lays out on the disk?
`In a Database Management System (DBMS), each record (a record is basically a row in a
`data frame) lies on disk in some particular manner to speed up the search, insert, and
`deletion. Imagine if every record is scattered around the disk without any rules or
`organizations, there will be no way to perform the query, indexing, and any
`manipulations.
`
`A single record/data on disk
`
`Simply speaking, you can store all the data in a row-wise manner, meaning to say that
`similar records/rows will be gathered together. In doing so, row-wise indexing can be
`very efficient with the cost of adding column operation or column-wise indexing can be
`
`https://towardsdatascience.com/how-your-data-is-stored-on-disk-and-memory-8842891da52
`
`4/7
`
`Patent Owner, Bot M8 LLC - Ex. 2058, p. 4
`
`
`
`5/10/2021
`How your data is stored on disk and memory? | by Guangyuan(Frank) Li | Towards Data Science
`relatively slow. If you go along the column-wise manner, things would become exactly
`the opposite.
`
`Let’s imagine each row will be stored in a coalesced space on disk, as shown in the
`figure. The physical address or pointer is associated with this record. In addition to that,
`a search key is usually present to serve as a proxy in the index file. An index file is
`basically pairs of search key and pointer combinations. So having a search key aims to
`expedite the searching and avoid the overhead. With the presence of the search keys,
`scientists have developed a wide array of efficient schemas to represent and organize all
`the search keys. Prominent examples include:
`
`1. B+ tree: self-balanced tree such that the distance from the root to all leaf nodes is
`equal.
`
`2. B tree: Similar to B+ tree but it has record stored on non-leaf nodes
`
`3. Extensible hashing: a directory together with buckets, buckets store the hashing
`key computed by hash table or hash function.
`
`4. Linear hashing: Similar to extensible hashing without directory but multiple levels
`of hash functions.
`
`Since it is not a technical blog, I would like to refer you to some wonderful youtube
`videos (link above) that explain what each indexing schema is and how they enable
`quick and efficient insertion and deletion. But the take-home message from this section
`is, data is laid out on disk in a well-designed manner that enables us to efficiently
`fetch and manipulates them.
`
`How to read bytes/files from disk to memory?
`First, I am hoping to instill a little bit of non-intuitive fact that a file is actually a stream
`of bits/bytes. When you read a file into memory, you are dealing with a series of 0/1s
`and you are hoping to load a specific portion of files correctly.
`
`https://towardsdatascience.com/how-your-data-is-stored-on-disk-and-memory-8842891da52
`
`5/7
`
`Patent Owner, Bot M8 LLC - Ex. 2058, p. 5
`
`
`
`5/10/2021
`
`How your data is stored on disk and memory? | by Guangyuan(Frank) Li | Towards Data Science
`
`fseek and read function to read file streams to memory
`
`Now we have to use some low-level language like C:
`
`# declare file pointer
`File *fp;
`# initialize file pointer with a binary file stored on disk
`fp = fopen('./test_file.bin','r');
`# set the reading pointer from starting point (SEEK_SET) to 4 bits
`forward (offset = 4)
`fseek(fp,4,SEEK_SET);
`# declare a pointer to store the file in memory
`ph = (char) malloc(128);
`# read the file, read 128 bits to the pointer in memory
`fread(ph,sizeof(char),128,fp)
`# free the memory
`free(ph)
`
`As shown in the figure and the code snippet, fseek function has a built-in variable
`SEEK_SET which is the starting position of the file stream. We set offset=4 to instruct
`the function to read from the position that 4 bits ahead of the SEEK_SET . Then we
`declare a char pointer in memory, using fread function to specifically read into 128 bits
`of file stream into a specific memory space that we just defined. This is how reading
`from disk to memory actually works.
`
`Also, make sure to allocate the memory space first by using malloc function in C. It is
`also important to release the memory space whenever you manually allocate the space
`before, using free function. Neglecting this step can cause a memory leakage problem
`which is caused by not-in-use memory that hasn’t been correctly released properly.
`
`Conclusion
`Like I said in the beginning, in most cases, the low-level realization is not our main
`concern as an applied programmer. However, I always think it is not an excuse to stop us
`
`https://towardsdatascience.com/how-your-data-is-stored-on-disk-and-memory-8842891da52
`
`6/7
`
`Patent Owner, Bot M8 LLC - Ex. 2058, p. 6
`
`
`
`5/10/2021
`How your data is stored on disk and memory? | by Guangyuan(Frank) Li | Towards Data Science
`from learning a bit about the underpinnings of the computer system. I hope you find this
`article interesting and useful, thanks for reading! If you like this article, follow me on
`medium, thank you so much for your support. Connect me on my Twitter or LinkedIn,
`also please let me know if you have any questions or what kind of tutorials you would
`like to see In the future!
`
`Sign up for The Variable
`By Towards Data Science
`
`Every Thursday, the Variable delivers the very best of Towards Data Science: from hands-on tutorials
`and cutting-edge research to original features you don't want to miss. Take a look.
`
`Get this newsletter
`
`You'll need to sign in or create an account to receive this
`newsletter.
`
`Programming
`
`Data Science
`
`Python
`
`Computer Science
`
`Thoughts And Theory
`
`Get the Medium app
`
`About Help Legal
`
`https://towardsdatascience.com/how-your-data-is-stored-on-disk-and-memory-8842891da52
`
`7/7
`
`Patent Owner, Bot M8 LLC - Ex. 2058, p. 7
`
`