ext2 File System

It has been a while since I have been able to work on personal projects, mostly because of school. One of the major time sinks was of course ECE 391, where we developed an operating system. It was very fun and I learned a lot but it was very structured, and I wanted to explore and learn more about the tools that we used and go beyond the assignment (yes I know there was extra credit, I had exams to study for).

Anyway, I decided to start working on a new project from scratch which I am calling summerOS, since I am uncreative. The boot process is pretty much the same, but I am trying to make it my own after that. It runs in a bitmapped graphics mode rather than text mode, for example. The biggest thing I wanted to do was have a functioning hard drive.

In MP3, they give us a simple file system image which is kind of like ext1, but even more simplified with no directories other than the main directory and fixed length file names. Additionally, it was loaded as a multiboot module directly into memory so we did not touch the hard drive at all.

For my new project, this was one of the first things I wanted to do. The first step was to stop using grub-mkrescue to burn iso images to boot from and switch to creating full fat hard drive images. I chose MBR and ext2 since I figured those would be the easiest and are still used today. Since you can’t just run a command on a folder to generate an image like that, I needed to write a script. I tried for a bit but I ended up using a lot of the code from the serenity script, but I removed most of the stuff I wasn’t using. The gist of it is that it dd to make the empty file, losetup to mount that empty image, parted to turn it into an MBR image, mke2fs to create an ext2 file system on the first partition, then finally grub-install to install grub to the MBR. Once that is done then I can copy over everything that I want on the disk.

To read data from the drive, I am using ATA PIO Mode, since it is the easiest. The first thing to do is read the MBR, which is always the first sector on the drive. Sidenote, my function to read data from the disk always reads in 256 16 bit words, so any time I want to read into a struct or something I just have it in a union with a uint16_t data[256], I’m not sure if this is the right way to do it but I don’t see a problem with it at the moment. Anyway, the first sector has all the information about the partitions on it. I am only interested in the first partition, so I go to that spot and read the LBA of the first sector in the partition. I know this is where the ext2 system is so I call this ext2 base. Any addresses I read from the file system will be relative to this.

Now that I know where the file system is, I can read some of the information about it. The first bit is the superblock, which contains a bunch of useful information about the disk. Some of the important ones are the block and inode size, and the number of blocks and inodes per group. Right after the super block is group descriptor table. Each group has it’s own inode table and data blocks, its kind of like there are 32 of the simple mp3 file systems.

To read a file, we need to start at the root directory. This is always inode 2 (but inodes start at 1, so it’s actually index 1, why do they have to do this). Since we know how many inodes there are per group, we know which group a given inode will be in, although the root inode will obviously be in the first group. From there we can read the inode table block address of the first group, which will lead us to the list of inodes, where we can read the second one. One thing of note is that blocks are not the same size as sectors, in this case sectors are 512b and blocks are 1024b. Any block address we get has to be multiplied by 2 before being added to the base offset to get the right sector. I read that block into an array of inodes and read the second one, which should be the root directory.

In that inode there are data blocks, indirect data blocks, doubly indirect data blocks, and finally triply indirect data blocks. My root directory is pretty small so all of the entries can fit in the first data block. I read that block address and load it as before. Since the root directory is a directory, its data blocks are full of directory entries. Each directory entry contains the inode number, the size of the entry, the length of the filename, the type, and finally the name. It is important to have the size of the entry as well as the length of the filename since the size of entries is 4 byte aligned and also the directory entries must fill up the entire data block, so the last entry will have a much bigger size so that it all lines up.

If I know the name of the file I want, I can go one by one through each entry and see if the name matches. If it doesn’t, I add the size of that entry to the pointer and now it is pointing at the next entry. Part of the inode is the number of sectors used, so if I multiply that by 512 I know how many bytes the inode uses. If I ever try to read more than that, then I know I have read all of the data that that inode has to offer.

If I find the file that I want, I need to run through the same process again but with my new inode number. To recap,

Find the group the inode is in using the number of inodes per group
Find the inode table address of that group and load it
Get the index of that inode in the table, again using the number of inodes per group.
Read the inode and find where the data blocks are
Finally, load those data blocks.

For a file path, you have to do this for each part. I have been using “/Home/hello.txt” in my testing, so the first step is load the root directory and look for the “Home” entry, and since there is a proceeding slash we know it is a directory. Then look in the data blocks of “Home” to find hello.txt, then load the data blocks from that.

If you want to take a look, check out kernel/Devices/Storage in summerOS on my gitlab here.

also someone remind me to throw a picture or a diagram or something in here

here you go