An Elementary Look
Famously, in Unix and Linux, "everything is a file", a digital representation of the information we want to use.
For the Linux computer, it does not matter if a file is a program, a text article, an illustrated novel, a complicated list (database), a Hollywood movie or a beautifully recorded song. While the files are active in RAM, being used in the computer, we only need to worry that the power doesn't go out.
Since, in real life, most of us eat, sleep, work and play, we spend most of our hours away from the computer, and maybe we want to turn off the power while we are away. That means we need some sort of long-term storage. RAM is volatile, so its contents vanish when the power goes off. Stable, long-term storage needs some other sort of hardware. The best-known such hardware is the hard drive. It's typically a stack of rigid disks covered in tiny magnetic particles which can be controlled to make an exact sequence of "on" and "off" patches on the disk surface.
Over the years, computers have had lots of hardware systems designed to provide long-term storage for when the computer is off or when the particular file is not needed now. Dealing with efficient storage is the job of the software we call a file system.
Stable, reliable storage is critical to our ability to use all our files. Most computer users are unaware which file system is being used. It is usually chosen by the developers and packagers of an operating system. Most Linux users, including Ubuntu users, have been automatically provided with the most recent stable version of the EXT file system. The current recommended version is EXT4, and most of us happily let EXT4 work its "magic" without concern.
On disk drives, for example, files get stored as a set of data blocks so a file may be spread out in order to efficiently use the space available on the rapidly spinning disks. Blocks allow a file to grow over time, too. It could be very inefficient to store an expanding file in one contiguous series of blocks. The file system helps keep track of available block space and also where all the blocks of a single file have been put.
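To make the idea concrete, here is a minimal Python sketch of block allocation. The ToyDisk class, its method names and the file names are invented for illustration; a real file system is far more elaborate and tries much harder to keep a file's blocks close together.

```python
# Toy model of block allocation; not how any real file system is implemented.
class ToyDisk:
    def __init__(self, total_blocks):
        self.free = list(range(total_blocks))  # block numbers not yet in use
        self.files = {}                        # file name -> list of block numbers

    def append(self, name, n_blocks):
        """Grow a file by n_blocks, taking the lowest-numbered free blocks."""
        if n_blocks > len(self.free):
            raise OSError("disk full")
        taken, self.free = self.free[:n_blocks], self.free[n_blocks:]
        self.files.setdefault(name, []).extend(taken)

    def delete(self, name):
        """Return a file's blocks to the free list."""
        self.free = sorted(self.free + self.files.pop(name))


disk = ToyDisk(total_blocks=12)
disk.append("novel.txt", 3)   # takes blocks 0, 1, 2
disk.append("song.ogg", 3)    # takes blocks 3, 4, 5
disk.append("novel.txt", 2)   # the novel grows into blocks 6, 7
disk.delete("song.ogg")       # blocks 3, 4, 5 become free again
disk.append("movie.mkv", 5)   # fills the gap, then continues past the novel

print(disk.files["novel.txt"])  # [0, 1, 2, 6, 7]
print(disk.files["movie.mkv"])  # [3, 4, 5, 8, 9]
```

Neither file ends up in one contiguous run, yet the block lists let the toy "file system" find every piece in the right order.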
[Technical Insert: Linux file systems use inodes ("index nodes") to record each file's metadata and the location of its data blocks.]
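For the curious, Python's standard os.stat call exposes some of what an inode records. This is a small sketch, assuming a Linux or other Unix-like system whose temporary directory supports hard links; the temporary file is created only for the demonstration.

```python
import os
import tempfile

# Create a small throwaway file to inspect.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello, file system\n")
    path = f.name

info = os.stat(path)
print("inode number:", info.st_ino)
print("size in bytes:", info.st_size)
print("names (hard links) pointing at this inode:", info.st_nlink)

# A hard link is a second name for the same inode, not a copy of the data.
link_path = path + ".link"
os.link(path, link_path)
same_inode = os.stat(link_path).st_ino == info.st_ino
print("second name shares the inode:", same_inode)

# Clean up both names.
os.remove(link_path)
os.remove(path)
```

Notice that the file's name does not appear in the stat result: names live in directories, which simply point at inode numbers.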
Earlier file systems had severe size limits for files and even for the length of a file name. In MS-DOS, made famous by Microsoft, a file name could only have eight characters plus a three-character extension to indicate the type of content in the file. Newer file systems have increased name length along with maximum individual file size and an increasingly useful set of file description information called "metadata". Extensions like .txt or .odt are not strictly necessary in Linux, where programs can usually work out a file's type from its contents. Still, most of us continue to add the extensions for our own convenience.
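Many Linux tools, such as the file command, recognize a file's type from the first few bytes of its content rather than from its name. Here is a minimal Python sketch of that idea. The guess_type function and its short signature table are illustrative only; real tools know thousands of signatures.

```python
# A few well-known file signatures ("magic numbers") found at the start of a file.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive (the container used by .odt and .docx)",
    b"\x7fELF": "ELF executable",
}


def guess_type(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring its name entirely."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"


print(guess_type(b"%PDF-1.7 rest of the file"))  # PDF document
print(guess_type(b"just some plain text"))       # unknown
```

Plain text has no signature at all, which is one reason a .txt extension remains a handy hint for people.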
Solid-state (flash) storage, which has become popular for replacing spinning hard drives in laptops, and removable drives like USB sticks need their own file system methods. Each storage "space" on a solid-state device can only be written and overwritten a limited number of times, so the number of writes is critical to the life of the media.
In addition to longer file names, file systems have adjusted to handle newer and larger forms of storage. RAID, for example, creates data duplication using stacks of swappable hard drives attached locally. There are now things designated as pools, and containers which exist in the "cloud". Factor into this discussion that many of these newer storage methods are not really necessary for a personal computer, but suit "Enterprise" needs.
Good file system ideas emerge from many sources. Engineer-programmers who understand the mechanics and physics of storage devices work to make the best, highly efficient software possible. Some work for Canonical, Red Hat, BSD and other free software organizations. Others work for IBM, Oracle, Microsoft, Apple, etc. As a result, the developers of Linux-like systems must try to maintain compatibility with all of them. Whew!
That brings us to Canonical's announcement of an option for Ubuntu 19.10, to activate ZFS as the file system for use by the Linux kernel and the operating system.
ZFS development began at Sun Microsystems in 2001, and it was first released in 2005, with the ability to address storage of zettabyte size. It is common to find a hard drive inside a laptop which is 1 terabyte (1 TB = 1000 GB). Unless you are storing lots of video or huge databases, a terabyte handles the needs of individual personal computer users in 2019.
Multi-computer shared backup systems, enterprise-level organizations and data needs for all sorts of artificial intelligence uses go way beyond a terabyte, of course. The following "helpful" chart shows how far beyond individual needs ZFS can go: a zettabyte is three unit steps above a terabyte (peta, exa, zetta), which works out to a billion terabytes, or nine "orders of magnitude" as it is sometimes described.
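The arithmetic behind that comparison can be checked with a short Python sketch. It uses the decimal (SI) units, where each step is 1000 times the previous one; the UNITS list and bytes_in function are names made up for this example.

```python
# Decimal (SI) storage units, each 1000 times the previous one.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte"]


def bytes_in(unit: str) -> int:
    """Number of bytes in one of the named units."""
    return 1000 ** (UNITS.index(unit) + 1)


# Print a small table of the units as powers of ten.
for unit in UNITS:
    exponent = 3 * (UNITS.index(unit) + 1)
    print(f"1 {unit:<9} = 10^{exponent} bytes")

terabytes_per_zettabyte = bytes_in("zettabyte") // bytes_in("terabyte")
print(terabytes_per_zettabyte)  # 1000000000
```

In other words, it would take a billion of those 1 TB laptop drives to hold a single zettabyte.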
ZFS appears to be designed to handle situations requiring automatic mirroring (duplication of data like RAID) and easy on-the-fly "snapshots" for another sort of backup. Built-in error detection, combined with the spare copies held on mirrors, can help repair bitrot. Those features sound nice, but current backup options using EXT4 offer easy answers for individual computer users. If your needs go beyond one computer at a time, it might be time to explore ZFS, say if you run a backup system for General Motors or IBM. Is that you?
Quoting from the online article by Stephen Foskett: (referenced below)
- ZFS achieves the kind of scalability every modern filesystem should have, with few limits in terms of data or metadata count and volume or file size.
- ZFS includes checksumming of all data and metadata to detect corruption, an absolutely essential feature for long-term large-scale storage. When ZFS detects an error, it can automatically reconstruct data from mirrors, parity, or alternate locations.
- Mirroring and multiple-parity “RAID Z” are built in, combining multiple physical media devices seamlessly into a logical volume.
- ZFS includes robust snapshot and mirror capabilities, including the ability to update the data on other volumes incrementally.
- Data can be compressed on the fly and deduplication is supported as well.
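The checksum-and-mirror idea in the second point can be sketched in a few lines of Python. This is only a toy illustration, not ZFS's actual code or on-disk format: the read_with_repair function is invented for this example, and SHA-256 is used simply because it is in Python's standard library.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Fingerprint of a block of data."""
    return hashlib.sha256(data).hexdigest()


def read_with_repair(copies, expected):
    """Return a copy that matches the stored checksum and fix any bad copies.

    `copies` is a list of mirrored blocks; it is modified in place.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("every copy is corrupt; nothing to repair from")
    repaired = 0
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # overwrite the damaged copy with the good one
            repaired += 1
    return good, repaired


block = b"important data"
expected = checksum(block)               # stored when the block was written
mirror = [b"important dat\x00", block]   # the first copy has suffered bitrot

data, repaired = read_with_repair(mirror, expected)
print(data)      # b'important data'
print(repaired)  # 1
```

Without the stored checksum, the system would have no way of telling which of the two mirrored copies is the damaged one.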
Some EXT4 features:
- 1 exbibyte (EiB, 2^60 bytes) maximum volume size, often loosely called 1 exabyte
- Data blocks grouped into extents to provide more contiguous storage of data, reducing fragmentation
- Pre-allocation of filespace to reduce block fragmentation
- Backward compatibility with EXT2 and EXT3
- Journaling - a log of pending changes, written before the changes themselves, which lets the file system recover to a consistent state after a crash or power loss
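Journaling is easier to picture with a toy example. The Python sketch below is a loose illustration of the general write-ahead idea, not EXT4's real journal; the ToyJournal class and its method names are invented here.

```python
class ToyJournal:
    """Toy write-ahead journal: log changes first, apply them later."""

    def __init__(self):
        self.disk = {}  # the "real" stored data
        self.log = []   # journal entries: ("write", key, value) or ("commit",)

    def log_write(self, key, value):
        self.log.append(("write", key, value))

    def log_commit(self):
        self.log.append(("commit",))

    def replay(self):
        """Apply committed changes; discard writes that never got a commit."""
        pending = []
        for entry in self.log:
            if entry[0] == "write":
                pending.append(entry[1:])
            else:  # a commit marker makes the pending writes official
                for key, value in pending:
                    self.disk[key] = value
                pending = []
        self.log.clear()


j = ToyJournal()
j.log_write("a.txt", "finished letter")
j.log_commit()
j.log_write("b.txt", "half-written")  # power fails before this is committed

j.replay()  # what a recovery pass might do at the next boot
print(j.disk)  # {'a.txt': 'finished letter'}
```

The half-written change is thrown away rather than left on disk in a broken state, which is the whole point: after a crash, the stored data is either the old version or the new one, never a scrambled mix.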
Companies like Canonical are trying to serve the needs of a very diverse set of Linux users. By offering the ability to use ZFS, they may stand to gain enterprise users while continuing to serve the home user with EXT4. Of course, these choices are a moving target. It is possible that Canonical and other providers of Linux systems will ensure that ZFS can seamlessly handle our current hard drives and flash drives by incorporating backward compatibility with EXT4, just the way EXT4 provides backward compatibility with EXT3, etc.
While cache is really a RAM issue, not a filesystem issue, you might find this article interesting: https://www.linuxatemyram.com/