Maintenance on /var

Sooner or later you may need to check a filesystem while the system is live. Recently I wanted to check /var (an LVM volume) because a service would not come up.

You could also reboot with a rescue disk or image to perform disk checks, and there are probably other ways to do it. First, we want to get more info on the volume. Here the logical volume is part of the volume group vg0:

file -s /dev/vg0/var
/dev/vg0/var: symbolic link to ../dm-4

We can use blkid to identify the filesystem type:

blkid /dev/vg0/var
/dev/vg0/var: LABEL="var" UUID="042ee636-8574-46d4-9d69-6c521bb7d6b6" TYPE="ext4"

Now we know /var is an ext4 volume. Using mount would also have revealed this info:

mount
...
/dev/mapper/vg0-var on /var type ext4 (rw,relatime,data=ordered)

On a live system, go to runlevel 1. Runlevel 1 is a single-user mode for administrative tasks. You can check the current runlevel first before going into runlevel 1:

runlevel
N 5

The first item (N) is the previous runlevel and the last is the current one. N means the runlevel has not changed since boot. 5 is the current runlevel: normal multi-user mode with networking plus the display manager. We need to remember the current runlevel so we can return the system to its normal operation afterwards. For maintenance, we bring the system to runlevel 1:

init 1

Check that no services are still accessing the volume. If /var is still mounted, use lsof to find out which processes are preventing an umount:

lsof /var

Kill the processes still accessing the volume, but pay attention to what is running. For instance, if a database program is running, stop it first and then recheck /var with lsof.

Next, unmount the volume:

umount /var

Perform a filesystem check. In case of ext4:

e2fsck /dev/vg0/var
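
Running e2fsck on a mounted filesystem can damage it, so it may be worth guarding the check. A minimal sketch, assuming mountpoint (from util-linux) is available; the function name fsck_if_unmounted is made up for illustration, and -f simply forces a full check even if the filesystem is marked clean:

```shell
# Run e2fsck only when the given mount point is no longer mounted.
# Usage: fsck_if_unmounted <mountpoint> <device>
fsck_if_unmounted() {
    mnt="$1"
    dev="$2"
    if mountpoint -q "$mnt"; then
        echo "$mnt is still mounted; not running e2fsck" >&2
        return 1
    fi
    # -f forces a check even when the filesystem looks clean
    e2fsck -f "$dev"
}
```

In this walkthrough it would be called as: fsck_if_unmounted /var /dev/vg0/var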

After the check, bring the system back to its normal runlevel, 5 in this case:

init 5

The filesystem check did not solve the problem. According to df -h there was still plenty of free space, so I tried creating a test file in /var:

touch /var/test

The resulting error is in the language of this system (Dutch), so to get the error in English:

export LANGUAGE=en_US.UTF-8; touch /var/test

The resulting error:

touch: cannot touch ‘/var/test’: No space left on device

Time to check the inodes:

export LANGUAGE=en_US.UTF-8; df -i /var

Filesystem          Inodes  IUsed IFree IUse% Mounted on
/dev/mapper/vg0-var 247008 247008     0  100% /var

And there you have it! No more free inodes.
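
A check like this can also be scripted, for instance to get a warning before the inodes run out. A minimal sketch, assuming a df that accepts -P and -i together (GNU df does); the function name inode_usage_percent is hypothetical:

```shell
# Print the inode usage of the filesystem holding the given path,
# as a bare number without the percent sign.
inode_usage_percent() {
    # -P keeps each filesystem on a single line; IUse% is the fifth field
    df -Pi "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For example, inode_usage_percent /var would have printed 100 in the situation above. Some filesystems do not report inode counts, in which case df shows a dash instead of a number.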

We need to find out what is causing this massive number of files:

du /var | sort -k1 -n

...
244     /var/tmp/ntopng/1/top_talkers/2016/10/18/16
244     /var/tmp/ntopng/1/top_talkers/2016/10/18/17
244     /var/tmp/ntopng/1/top_talkers/2016/10/18/18
244     /var/tmp/ntopng/1/top_talkers/2016/10/18/19
244     /var/tmp/ntopng/1/top_talkers/2016/10/18/20
...
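
Note that du reports disk usage, not the number of files, so directories full of tiny or empty files can go unnoticed. Counting entries per directory is a more direct way to find inode consumers. A rough sketch, assuming GNU find (for -printf); the function name count_entries_per_dir is made up, and file names containing newlines would skew the counts:

```shell
# Count the entries directly inside each directory below a root,
# sorted so the directories with the most entries come last.
count_entries_per_dir() {
    root="${1:-/var}"
    # -xdev stays on one filesystem; %h prints each entry's parent directory
    find "$root" -xdev -printf '%h\n' | sort | uniq -c | sort -n
}
```

Running count_entries_per_dir /var | tail shows the busiest directories. Recent versions of GNU du also have an --inodes option that reports inode counts instead of block usage.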

It looks like ntopng is causing a lot of entries. Deleting these freed up a lot of inodes:

df -i /var

Filesystem          Inodes IUsed  IFree IUse% Mounted on
/dev/mapper/vg0-var 247008 10636 236372    5% /var

Now /var is back to normal.
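
To keep a directory like this from filling up again, old files could be pruned periodically. The following is only a sketch under assumptions: it needs GNU find, the function name prune_old_files and the 30-day retention in the usage line are made up for illustration, and whether ntopng tolerates having these files removed should be verified before use:

```shell
# Delete regular files older than <days> days below <dir>,
# then remove any directories that were left empty.
# Usage: prune_old_files <dir> <days>
prune_old_files() {
    dir="$1"
    days="$2"
    # -mtime +N matches files last modified more than N days ago
    find "$dir" -xdev -type f -mtime +"$days" -delete
    # -mindepth 1 keeps the top-level directory itself
    find "$dir" -xdev -mindepth 1 -type d -empty -delete
}
```

A call such as prune_old_files /var/tmp/ntopng 30 could then be run from cron.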