This document is a partial comparison of Linux kernels 2.2.18 and 2.4.0 focusing on changes in filesystem code. Kernel version references are found in endnotes. Please send any thoughts regarding errors or improvements to Jay Miller.
Date | Version | Author |
---|---|---|
2001-02-19 | v0.3 | Jay Miller (jnmiller, cryptofreak dot org) |
Changes: | Conversion to HTML. | |
2001-01-23 | v0.2 | Jay Miller (jnmiller, cryptofreak dot org) |
Changes: | Added kernel version endnotes and a few VFS ops additions. | |
2001-01-19 | v0.1 | Jay Miller (jnmiller, cryptofreak dot org) |
Changes: | Initial release. | |
Module initialization is now handled differently.[1] The old method (for a fake fs called 'myfs') is shown here:
    static struct file_system_type myfs_fs_type = {
        "myfs",
        FS_REQUIRES_DEV,
        myfs_read_super,
        NULL
    };

    __initfunc(int init_myfs_fs(void))
    {
        return register_filesystem(&myfs_fs_type);
    }

    #ifdef MODULE
    EXPORT_NO_SYMBOLS;

    int init_module(void)
    {
        return init_myfs_fs();
    }

    void cleanup_module(void)
    {
        unregister_filesystem(&myfs_fs_type);
    }
    #endif
In addition, a MOD_INC_USE_COUNT; call is required in the FSD's read_super() function and corresponding decrement calls (MOD_DEC_USE_COUNT;) are required in put_super().
The new method is shown below:
    static DECLARE_FSTYPE_DEV(myfs_fs_type, "myfs", myfs_read_super);

    static int __init init_myfs_fs(void)
    {
        return register_filesystem(&myfs_fs_type);
    }

    static void __exit exit_myfs_fs(void)
    {
        unregister_filesystem(&myfs_fs_type);
    }

    EXPORT_NO_SYMBOLS;

    module_init(init_myfs_fs);
    module_exit(exit_myfs_fs);
The MOD_XXX_USE_COUNT bookkeeping is now handled by the VFS during filesystem registration (i.e. don't use these macros in FSDs any more).
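The register/unregister pairing used in both versions can be tried outside the kernel. The following is only a userspace model, not kernel code: every ex_ name is hypothetical, and the return values (-EBUSY for a name that is already registered, -EINVAL for a type that is not in the list) are assumptions made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down filesystem type: just a name and a list link. */
struct ex_fs_type {
    const char *name;
    struct ex_fs_type *next;
};

/* Head of the singly linked list of registered types. */
static struct ex_fs_type *ex_file_systems;

/* Add fs to the list unless a type with the same name is already present. */
static int ex_register_filesystem(struct ex_fs_type *fs)
{
    struct ex_fs_type **p;

    assert(fs != NULL && fs->name != NULL);
    if (fs->next)
        return -EBUSY;              /* already linked into the list */
    for (p = &ex_file_systems; *p; p = &(*p)->next)
        if (strcmp((*p)->name, fs->name) == 0)
            return -EBUSY;          /* duplicate name */
    *p = fs;                        /* append at the tail */
    return 0;
}

/* Remove fs from the list; fails if it was never registered. */
static int ex_unregister_filesystem(struct ex_fs_type *fs)
{
    struct ex_fs_type **p;

    for (p = &ex_file_systems; *p; p = &(*p)->next) {
        if (*p == fs) {
            *p = fs->next;
            fs->next = NULL;
            return 0;
        }
    }
    return -EINVAL;
}
```

An init function would return the result of ex_register_filesystem() directly, just as init_myfs_fs() returns the result of register_filesystem() above.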
LFS (Large File Support)
The VFS now supports 64-bit files (x86 and Sparc only).[2]
loff_t
rlimit(2) system call yet
getrlimit64(2) and setrlimit64(2) but wraps too large values to RLIMIT_INFINITY.

For more complete information on LFS support (including the source of the above info), head to Andreas Jaeger's LFS page.
New error handling
People are trying to move to better error handling now that the functions ERR_PTR(), PTR_ERR() and IS_ERR() have been changed a bit:
Old: | if (!dir || !dir->i_nlink) { |
New: | if (!dir || !dir->i_nlink) |
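The idea behind these helpers is to encode a negative errno value in the returned pointer itself, so a single return value carries either a valid pointer or an error code. Below is a minimal userspace sketch of that idea. The names err_ptr(), ptr_err(), is_err(), EX_MAX_ERRNO, struct ex_dir and ex_check_dir() are stand-ins invented for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bound: the top EX_MAX_ERRNO addresses are treated as errors. */
#define EX_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
    assert(error < 0 && error >= -EX_MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* Decode the errno value back out of an error pointer. */
static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if ptr lies in the address range reserved for encoded errors. */
static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-EX_MAX_ERRNO;
}

/* Hypothetical directory object with only a link count. */
struct ex_dir {
    int nlink;
};

/* Return dir itself, or an encoded -EPERM if it is missing or unlinked. */
static struct ex_dir *ex_check_dir(struct ex_dir *dir)
{
    if (!dir || !dir->nlink)
        return err_ptr(-EPERM);
    return dir;
}
```

A caller tests the result with is_err() and, on failure, recovers the code with ptr_err(), so no separate error out-parameter is needed.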
The dynamically tunable fs parameters nr_files, nr_free_files, and max_files are now part of a new structure:[3]

    struct files_stat_struct {
        int nr_files;
        int nr_free_files;
        int max_files;
    };
New inode flags: S_SYNC, S_NOATIME (to replace the need to use mount flags MS_SYNCHRONOUS and MS_NOATIME in inodes) and S_DEAD[4] for a removed but still open directory (and IS_DEADDIR() to check).
And some new global filesystem flags:[5]
FS_SINGLE: Filesystem that can have only one superblock
FS_NOMOUNT: Never mount from userland
FS_LITTER: Keeps the tree in dcache

The file_operations pointer has been moved from the inode_operations structure to the actual inode structure.[6]
New: | struct file_operations *i_fop; |
Referenced with something like:
inode->i_fop;
Also, the count on the inode is now of type atomic_t:[7]
Old: | int i_count; |
New: | atomic_t i_count; |
This type's definition is architecture dependent, so there is a special way to access these variables. To read the variable atom or set it equal to value, respectively, use these functions (atom is a pointer to the atomic_t):

    atomic_read(atom);
    atomic_set(atom, value);
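The same accessor pattern can be tried in userspace with C11 atomics. This is only an analogy to the kernel type, not its implementation: ex_atomic_t, struct ex_inode and the ex_atomic_*() helpers are hypothetical names, with the increment and decrement-and-test helpers modelled loosely on the kernel's atomic_inc() and atomic_dec_and_test().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's atomic_t. */
typedef struct {
    atomic_int counter;
} ex_atomic_t;

/* Read the current value; note that the accessor takes a pointer. */
static inline int ex_atomic_read(ex_atomic_t *v)
{
    assert(v != NULL);
    return atomic_load(&v->counter);
}

/* Set the counter to value. */
static inline void ex_atomic_set(ex_atomic_t *v, int value)
{
    atomic_store(&v->counter, value);
}

/* Take one more reference. */
static inline void ex_atomic_inc(ex_atomic_t *v)
{
    atomic_fetch_add(&v->counter, 1);
}

/* Drop one reference; returns nonzero when the count reaches zero. */
static inline int ex_atomic_dec_and_test(ex_atomic_t *v)
{
    return atomic_fetch_sub(&v->counter, 1) == 1;
}

/* Hypothetical object carrying a reference count, like i_count above. */
struct ex_inode {
    ex_atomic_t i_count;
};
```

The point of going through accessors is that code never touches the counter field directly, so the representation can differ per architecture.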
See also the sections on the file structure and the dentry structure.
The count on the file is now of type atomic_t:[8]
Old: | int f_count; |
New: | atomic_t f_count; |
See the section on the inode structure for instructions on modifying this variable.
The count on the dentry is now of type atomic_t:[9]
Old: | int d_count; |
New: | atomic_t d_count; |
See the section on the inode structure for instructions on modifying this variable.
The d_delete function now returns an int; a nonzero value tells the VFS to drop the dentry right away instead of keeping it cached:[10]
Old: | void (*d_delete)(struct dentry *); |
New: | int (*d_delete)(struct dentry *); |
And d_alloc_root has changed in the following way:[11]
Old: | struct dentry *d_alloc_root(struct inode *, struct dentry *); |
New: | struct dentry *d_alloc_root(struct inode *); |
The filldir helper function takes a new unsigned argument:[12]
Old: | typedef int (*filldir_t)(void *, const char *, int, off_t, ino_t); |
New: | typedef int (*filldir_t)(void *, const char *, int, off_t, ino_t, unsigned); |
The new argument is meant to be one of the following file type constants:
DT_UNKNOWN | DT_FIFO |
DT_CHR | DT_DIR |
DT_BLK | DT_REG |
DT_LNK | DT_SOCK |
DT_WHT |
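To show how a callback can use the extra argument, here is a small userspace sketch. It is not kernel code: ex_filldir_t copies the shape of the new typedef, the EX_DT_* values mirror the numeric values userspace <dirent.h> commonly assigns to DT_UNKNOWN, DT_DIR and DT_REG, and ex_readdir() with its fixed entry list is entirely hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* File type codes, redefined with an EX_ prefix to keep the sketch standalone. */
enum { EX_DT_UNKNOWN = 0, EX_DT_DIR = 4, EX_DT_REG = 8 };

/* Same shape as the new filldir_t: the last argument is the file type. */
typedef int (*ex_filldir_t)(void *, const char *, int, off_t, ino_t, unsigned);

/* Hypothetical per-call state handed through the opaque first argument. */
struct ex_dir_count {
    int dirs;
    int files;
    int other;
};

/* Callback: classify each entry by its type code without any stat call. */
static int ex_count_filldir(void *buf, const char *name, int namlen,
                            off_t offset, ino_t ino, unsigned d_type)
{
    struct ex_dir_count *c = buf;

    (void)name; (void)namlen; (void)offset; (void)ino;
    if (d_type == EX_DT_DIR)
        c->dirs++;
    else if (d_type == EX_DT_REG)
        c->files++;
    else
        c->other++;
    return 0;
}

/* Hypothetical readdir(): feeds a fixed set of entries to the callback. */
static int ex_readdir(void *dirent, ex_filldir_t filldir)
{
    static const struct { const char *name; unsigned type; } entries[] = {
        { ".", EX_DT_DIR }, { "..", EX_DT_DIR },
        { "README", EX_DT_REG }, { "mystery", EX_DT_UNKNOWN },
    };
    size_t i;

    assert(filldir != NULL);
    for (i = 0; i < sizeof(entries) / sizeof(entries[0]); i++) {
        const char *name = entries[i].name;

        /* A negative return from the callback stops the walk early. */
        if (filldir(dirent, name, (int)strlen(name), (off_t)i,
                    (ino_t)(i + 1), entries[i].type) < 0)
            break;
    }
    return 0;
}
```

A filesystem that cannot cheaply determine the type of an entry would pass the "unknown" code, which is why the sketch counts such entries separately.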
All operations structures are specified differently now.[13] The old structure form (again, for our fake filesystem) might have looked like:
    struct file_operations myfs_file_operations = {
        myfs_file_lseek,      /* llseek */
        generic_file_read,    /* read */
        generic_file_write,   /* write */
        NULL,                 /* readdir */
        NULL,                 /* poll */
        myfs_ioctl,           /* ioctl */
        NULL,                 /* mmap */
    };
Now you can use:
    struct file_operations myfs_file_operations = {
        llseek: myfs_file_lseek,
        read:   generic_file_read,
        write:  generic_file_write,
        ioctl:  myfs_ioctl,
    };
This is using a GNU C language extension that is actually made obsolete by the ISO C99 standard. C99 designated initializers look something like this:

    struct foo {
        int foo;
        long bar;
    };

    struct foo x = { .bar = 3, .foo = 4 };

The GNU C extension we use in the fs code is called 'labeled initializer elements'. gcc supports both the extension (duh) and the C99 compatible .member syntax.[*]
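The practical benefit for operations tables is that members left unnamed are zero-initialized, so unimplemented operations become NULL without having to count positions. A small standalone sketch in the C99 syntax follows; struct ex_file_operations and every ex_ function are hypothetical and much simpler than the real structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down operations table. */
struct ex_file_operations {
    long (*llseek)(long pos, long offset);
    int  (*read)(char *buf, int count);
    int  (*write)(const char *buf, int count);
    int  (*ioctl)(unsigned cmd, unsigned long arg);
};

static long ex_llseek(long pos, long offset)
{
    return pos + offset;
}

static int ex_ioctl(unsigned cmd, unsigned long arg)
{
    (void)arg;
    return (int)cmd;
}

/* Only two members are named; read and write are implicitly NULL. */
static const struct ex_file_operations ex_fops = {
    .llseek = ex_llseek,
    .ioctl  = ex_ioctl,
};

/* Callers check for NULL before dispatching; -1 means "not implemented". */
static int ex_do_read(const struct ex_file_operations *fops, char *buf, int count)
{
    assert(fops != NULL);
    if (!fops->read)
        return -1;
    return fops->read(buf, count);
}
```

Adding a member to the structure later does not disturb existing initializers written this way, which is not true of the positional form.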
A new argument has been added to fsync(): if set, don't bother flushing timestamps (see the Miscellaneous section for more on this function).[14]
Old: | int (*fsync) (struct file *, struct dentry *); |
New: | int (*fsync) (struct file *, struct dentry *, int); |
Two operations have been removed from this structure:[15]
Old: | int (*check_media_change) (kdev_t dev); |
Old: | int (*revalidate) (kdev_t dev); |
These functions were moved to a new structure:
    struct block_device_operations {
        int (*open) (struct inode *, struct file *);
        int (*release) (struct inode *, struct file *);
        int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long);
        int (*check_media_change) (kdev_t);
        int (*revalidate) (kdev_t);
    };
This structure is referenced through a new member now found in the inode structure:
New: | struct block_device *i_bdev; |
This in turn has a pointer to the operations structure, as shown here:

    struct block_device {
        struct list_head bd_hash;
        atomic_t bd_count;
        dev_t bd_dev;
        atomic_t bd_openers;
        const struct block_device_operations *bd_op;
        struct semaphore bd_sem;
    };
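Reaching an operation therefore takes two hops: from the inode to its block device, then to the operations table. The sketch below models that chain in userspace. All ex_ types are cut-down stand-ins, ex_kdev_t is a hypothetical replacement for kdev_t, and the rule that device number 2 reports a media change is invented purely so there is something to observe.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short ex_kdev_t;   /* hypothetical stand-in for kdev_t */

struct ex_block_device_operations {
    int (*check_media_change)(ex_kdev_t);
    int (*revalidate)(ex_kdev_t);
};

struct ex_block_device {
    const struct ex_block_device_operations *bd_op;
};

struct ex_inode {
    ex_kdev_t i_rdev;                /* device number */
    struct ex_block_device *i_bdev;  /* NULL for non-block-device inodes */
};

/* Invented driver hook: pretend only device 2 ever swaps media. */
static int ex_media_changed(ex_kdev_t dev)
{
    return dev == 2;
}

/* Driver table; revalidate is left NULL on purpose. */
static const struct ex_block_device_operations ex_floppy_ops = {
    .check_media_change = ex_media_changed,
};

/* Follow inode -> i_bdev -> bd_op, treating every missing link as "no change". */
static int ex_check_media_change(const struct ex_inode *inode)
{
    const struct ex_block_device_operations *ops;

    assert(inode != NULL);
    if (!inode->i_bdev || !inode->i_bdev->bd_op)
        return 0;
    ops = inode->i_bdev->bd_op;
    if (!ops->check_media_change)
        return 0;
    return ops->check_media_change(inode->i_rdev);
}
```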
Two further operations have been added.[16] They can be called without the big kernel lock held in all filesystems. They implement the readv(2) and writev(2) system calls.
New: | ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *); |
New: | ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *); |
The file_operations pointer has been moved from the inode_operations structure to the actual inode structure.

follow_link is changed:[17]
Old: | struct dentry * (*follow_link) (struct dentry *, struct dentry *, unsigned int); |
New: | int (*follow_link) (struct dentry *, struct nameidata *); |
The first argument remains the same, while a new structure contains the previous final two arguments. This new structure looks like:
    struct nameidata {
        struct dentry *dentry;
        struct vfsmount *mnt;
        struct qstr last;
        unsigned int flags;
        int last_type;
    };
TODO: describe this structure?
This is all part of a rewrite of the symbolic link handling. These are the rules (and the order in which they are applied):
Two new functions now appear:[18]
New: | int (*setattr) (struct dentry *, struct iattr *); |
New: | int (*getattr) (struct dentry *, struct iattr *); |
These functions (really just setattr()) replace the old superblock operation notify_change(). Their use is just as it used to be. In addition, the following five functions have all disappeared, but see the section on caches, because they've really only moved:[19]
Old: | int (*readpage) (struct file *, struct page *); |
The write_inode function has added an extra parameter:[20]
Old: | void (*write_inode) (struct inode *); |
New: | void (*write_inode) (struct inode *, int); |
The added parameter is a boolean flag used to decide whether to sync the inode to disk. See also write_inode_now(), in the Miscellaneous section.

On the other hand, statfs() lost a parameter because the size of the statfs structure is not needed.[21]
Old: | int (*statfs) (struct super_block *, struct statfs *, int); |
New: | int (*statfs) (struct super_block *, struct statfs *); |
Finally, one function has been removed:[22]
Old: | int (*notify_change) (struct dentry *, struct iattr *); |
This functionality now appears in the inode_operations structure as getattr() and setattr(). See also the section on inode operations.
alloc_block(), alloc_inode() and transfer() all no longer require the uid as an argument:[23]
Old: | int (*alloc_block) (const struct inode *, unsigned long, uid_t, char); |
New: | int (*alloc_block) (const struct inode *, unsigned long, char); |
Old: | int (*alloc_inode) (const struct inode *, unsigned long, uid_t); |
New: | int (*alloc_inode) (const struct inode *, unsigned long); |
Old: | int (*transfer) (struct dentry *, struct iattr *, uid_t); |
New: | int (*transfer) (struct dentry *, struct iattr *); |
Old: | int do_truncate(struct dentry *, unsigned long); |
New: | int do_truncate(struct dentry *, loff_t);[24] |
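The reason for the wider type is easiest to see with a concrete length. The sketch below assumes a platform where unsigned long is 32 bits wide (as on x86) and uses uint32_t to model it, with int64_t standing in for the 64-bit loff_t; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What a length becomes when squeezed through a 32-bit unsigned long. */
static uint32_t ex_length_as_ulong32(int64_t length)
{
    assert(length >= 0);
    return (uint32_t)length;        /* silently keeps only the low 32 bits */
}

/* Nonzero if the length survives a round trip through the 32-bit type. */
static int ex_fits_in_ulong32(int64_t length)
{
    return (int64_t)ex_length_as_ulong32(length) == length;
}
```

For example, a length of 5 GiB does not fit: only the low 32 bits survive, which leaves 1 GiB, so the old prototype could not express that truncation point at all.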
file_fsync() has gained the same new argument as the fsync() file operation: if set, don't bother flushing timestamps (see the section on file operations).[25]
Old: | int file_fsync(struct file *, struct dentry *); |
New: | int file_fsync(struct file *, struct dentry *, int); |
iget() now takes the place of the old iget_in_use() function:[26]
Old: | struct inode *iget_in_use(struct super_block *, unsigned long); |
New: | static inline void __iget(struct inode *); |
write_inode_now() now requires a 'sync' flag (a la write_inode()).[27]
Old: | void write_inode_now(struct inode *); |
New: | void write_inode_now(struct inode *, int); |
The old buffer cache is still used for metadata, but use has changed a bit:[28]
Old: | void mark_buffer_dirty(struct buffer_head *, int); |
New: | void mark_buffer_dirty(struct buffer_head *); |
The page cache now handles file-content data by replacing two inode members with a third:[29]
Old: | unsigned long i_nrpages; |
Old: | struct list_head i_pages; |
New: | struct address_space i_data; |
It is a generic page cache, and each group of pages belonging to an object is described by an address_space structure:

    struct address_space {
        struct list_head clean_pages;
        struct list_head dirty_pages;
        struct list_head locked_pages;
        unsigned long nrpages;
        struct address_space_operations *a_ops;
        struct inode *host;
        struct vm_area_struct *i_mmap;
        struct vm_area_struct *i_mmap_shared;
        spinlock_t i_shared_lock;
    };
host is a pointer to the object that is the owner of these pages, like an inode or a block device. i_mmap and i_mmap_shared are pointers to private and shared mappings, respectively. i_shared_lock is a spinlock protecting the address space. a_ops is a pointer to a new list of function pointers, the address_space_operations:[30]

    struct address_space_operations {
        int (*writepage)(struct page *);
        int (*readpage)(struct file *, struct page *);
        int (*sync_page)(struct page *);
        int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
        int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
        int (*bmap)(struct address_space *, long);
    };
These functions used to reside in the inode_operations structure. (See the inode_operations section)
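The relationship between an inode, its embedded address space and the operations table can be modelled in a few lines of userspace C. This is a sketch under heavy simplification, not the kernel's design: every ex_ name is hypothetical, the page size is a toy 16 bytes, the "disk" is a string, and ex_readpage() takes the mapping rather than a struct file so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_PAGE_SIZE 16              /* toy page size, for illustration only */

struct ex_page {
    unsigned long index;             /* page number within the object */
    char data[EX_PAGE_SIZE];
    int uptodate;
};

struct ex_address_space;

struct ex_address_space_operations {
    int (*readpage)(struct ex_address_space *, struct ex_page *);
};

struct ex_inode;

struct ex_address_space {
    const struct ex_address_space_operations *a_ops;
    struct ex_inode *host;           /* owner of the pages */
};

struct ex_inode {
    const char *backing;             /* pretend on-disk contents */
    size_t size;
    struct ex_address_space i_data;  /* embedded, like the new i_data member */
};

/* Fill one page from the owner's backing store, zero-padding the tail. */
static int ex_readpage(struct ex_address_space *mapping, struct ex_page *page)
{
    struct ex_inode *inode = mapping->host;
    size_t start = page->index * EX_PAGE_SIZE;

    memset(page->data, 0, EX_PAGE_SIZE);
    if (start < inode->size) {
        size_t n = inode->size - start;

        if (n > EX_PAGE_SIZE)
            n = EX_PAGE_SIZE;
        memcpy(page->data, inode->backing + start, n);
    }
    page->uptodate = 1;
    return 0;
}

static const struct ex_address_space_operations ex_aops = {
    .readpage = ex_readpage,
};

/* Wire the embedded address space back to its owning inode. */
static void ex_init_inode(struct ex_inode *inode, const char *backing)
{
    assert(inode != NULL && backing != NULL);
    inode->backing = backing;
    inode->size = strlen(backing);
    inode->i_data.a_ops = &ex_aops;
    inode->i_data.host = inode;
}
```

Generic code only needs the mapping: it calls through a_ops and lets the operation find the owner via host, which is what allows the same cache code to serve different kinds of objects.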
For more information on the 2.4 kernel, you might try any of the following links.