i started developing my tool trsh, an xdg trash implementation, after an important document of mine got accidentally deleted. it’s written in c, so it uses some system functions. one of the functions to remove a file is unlink. yes, a weird name for a function that removes a file.
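to make the unlink call concrete, here is a minimal sketch of removing a file with it. this is not the code from trsh; the helper name remove_file is made up for illustration. the name makes more sense once you know that unlink only removes a name (a link) from a directory, and the file system frees the data once no links and no open descriptors remain.

```c
#include <stdio.h>
#include <unistd.h>

/* hypothetical helper, not taken from trsh:
 * remove the directory entry `path`.
 * returns 0 on success, -1 on failure (errno is set by unlink). */
int remove_file(const char *path)
{
    if (unlink(path) == -1) {
        perror("unlink");   /* e.g. "unlink: No such file or directory" */
        return -1;
    }
    return 0;
}
```

calling remove_file("notes.txt") twice in a row should succeed the first time and fail the second time, because the name is already gone.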
i did some digging to understand what actually happens when a file is removed. it turns out the data isn’t erased immediately. the file system just removes the pointer to the file and releases its metadata, but the actual data remains on disk until it’s overwritten by new data.
this was interesting to me, so i ran an experiment. i created a virtual disk and formatted it using mkfs with the ext4 file system. now i can inspect what happens on that virtual disk.
.img disk
to create a raw image i just use dd:
$ dd if=/dev/zero of=disk.img bs=1k count=512
what that command does is use /dev/zero as input and write it to disk.img with a block size of 1K and a count of 512 blocks, so it produces a disk.img of 512K. i can also produce a 1M disk.img:
$ dd if=/dev/zero of=disk.img bs=1M count=1
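the size arithmetic is just block size times count: 1024 * 512 = 524288 bytes (512K), and 1048576 * 1 = 1048576 bytes (1M). as a rough sketch of what dd does in this case, the c function below writes count blocks of bs zero bytes to a file. the name write_zero_image is hypothetical, and real dd does much more than this.

```c
#include <stdio.h>
#include <stdlib.h>

/* hypothetical helper: write `count` blocks of `bs` zero bytes to `path`,
 * roughly like `dd if=/dev/zero of=path bs=... count=...`.
 * returns the total number of bytes written, or -1 on error. */
long write_zero_image(const char *path, size_t bs, size_t count)
{
    FILE *out = fopen(path, "wb");
    if (out == NULL)
        return -1;

    char *block = calloc(1, bs);    /* one zero-filled block */
    if (block == NULL) {
        fclose(out);
        return -1;
    }

    long total = 0;
    for (size_t i = 0; i < count; i++) {
        if (fwrite(block, 1, bs, out) != bs) {
            total = -1;             /* short write */
            break;
        }
        total += (long)bs;
    }

    free(block);
    if (fclose(out) != 0)
        total = -1;
    return total;
}
```

with this sketch, write_zero_image("disk.img", 1024, 512) corresponds to the first dd command and write_zero_image("disk.img", 1024 * 1024, 1) to the second.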
we can confirm that the contents of disk.img are all zeros using hexdump. the dump below is from the 512K image, which ends at offset 00080000.
$ hexdump -C disk.img
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00080000
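the same check can be done from c instead of reading the hexdump by eye. this is only a sketch, and the function name image_is_all_zero is made up for this example.

```c
#include <stdio.h>

/* hypothetical helper: scan `path` and report whether every byte is zero.
 * returns 1 if all bytes are zero, 0 if a non-zero byte was found,
 * and -1 if the file could not be opened or read. */
int image_is_all_zero(const char *path)
{
    FILE *in = fopen(path, "rb");
    if (in == NULL)
        return -1;

    unsigned char buf[4096];
    size_t n;
    int result = 1;

    while (result == 1 && (n = fread(buf, 1, sizeof buf, in)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (buf[i] != 0) {
                result = 0;         /* found data, stop scanning */
                break;
            }
        }
    }

    if (ferror(in))
        result = -1;
    fclose(in);
    return result;
}
```

on a freshly created image this should return 1, and after mkfs.ext4 has written its structures it should return 0.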
for this experiment i create a bigger disk.img. the dumps further down end at offset 00a00000, so the image used there is 10M.
once we have the raw image disk.img, we can format it with mkfs.ext4.
$ mkfs.ext4 disk.img
you’ll see some interesting results when inspecting the image with hexdump:
$ hexdump -C disk.img
disk.img
to mount it under linux, use the mount command with the loop option:
$ mkdir disk
$ su -c "mount -o loop disk.img disk"
create a file in there, then unmount:
$ su -c "echo This is Not Secure > disk/secureNot.txt"
$ sync
$ su -c "umount disk"
now, let’s check the hexdump:
$ hexdump -C disk.img
*
000ab800 02 00 00 00 0c 00 01 02 2e 00 00 00 02 00 00 00 |................|
000ab810 0c 00 02 02 2e 2e 00 00 0b 00 00 00 14 00 0a 02 |................|
000ab820 6c 6f 73 74 2b 66 6f 75 6e 64 00 00 0c 00 00 00 |lost+found......|
000ab830 c8 03 0d 01 73 65 63 75 72 65 4e 6f 74 2e 74 78 |....secureNot.tx|
000ab840 74 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |t...............|
000ab850 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000b0800 02 00 00 00 0c 00 01 02 2e 00 00 00 02 00 00 00 |................|
000b0810 0c 00 02 02 2e 2e 00 00 0b 00 00 00 14 00 0a 02 |................|
000b0820 6c 6f 73 74 2b 66 6f 75 6e 64 00 00 0c 00 00 00 |lost+found......|
000b0830 c8 03 0d 01 73 65 63 75 72 65 4e 6f 74 2e 74 78 |....secureNot.tx|
000b0840 74 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |t...............|
000b0850 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00900400 54 68 69 73 20 69 73 20 4e 6f 74 20 53 65 63 75 |This is Not Secu|
00900410 72 65 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 |re..............|
00900420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00a00000
even though the disk is not mounted, we can still see the content. This is why I encrypt all my disks using LUKS.
secureNot.txt
now I’ll try to remove the file:
$ su -c "mount -o loop disk.img disk"
$ su -c "rm ./disk/secureNot.txt"
$ sync
$ su -c "umount disk"
$ hexdump -C disk.img
*
000ab800 02 00 00 00 0c 00 01 02 2e 00 00 00 02 00 00 00 |................|
000ab810 0c 00 02 02 2e 2e 00 00 0b 00 00 00 dc 03 0a 02 |................|
000ab820 6c 6f 73 74 2b 66 6f 75 6e 64 00 00 00 00 00 00 |lost+found......|
000ab830 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000b0800 00 00 00 00 00 00 00 00 47 d3 81 68 47 d3 81 68 |........G..hG..h|
000b0810 47 d3 81 68 00 00 00 00 00 00 00 00 00 00 00 00 |G..h............|
000b0820 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00900400 54 68 69 73 20 69 73 20 4e 6f 74 20 53 65 63 75 |This is Not Secu|
00900410 72 65 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 |re..............|
00900420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00a00000
as you can see, the filename is gone from the hexdump, but the file content is still there. this applies to any file. i could even write a c program to find a specific magic header and recover the data.
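here is a minimal sketch of that idea: search a raw image for a byte pattern and report the offset where it starts. the function name find_bytes is hypothetical, and it reads the whole image into memory, which is fine for a small test image but not for a real disk. a real recovery tool would also need to work out where the file ends, which this sketch does not attempt.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical helper: return the offset of the first occurrence of
 * `pat` (of `pat_len` bytes) in the file `path`, searching from offset
 * `start`. returns -1 if the pattern is not found or on any error. */
long find_bytes(const char *path, const void *pat, size_t pat_len, long start)
{
    if (pat_len == 0 || start < 0)
        return -1;

    FILE *in = fopen(path, "rb");
    if (in == NULL)
        return -1;

    /* determine the image size, then load it in one go */
    long size = -1;
    if (fseek(in, 0, SEEK_END) == 0)
        size = ftell(in);
    if (size <= 0 || fseek(in, 0, SEEK_SET) != 0) {
        fclose(in);
        return -1;
    }

    unsigned char *data = malloc((size_t)size);
    if (data == NULL || fread(data, 1, (size_t)size, in) != (size_t)size) {
        free(data);
        fclose(in);
        return -1;
    }
    fclose(in);

    /* naive scan: compare the pattern at every candidate offset */
    long found = -1;
    for (long off = start; off + (long)pat_len <= size; off++) {
        if (memcmp(data + off, pat, pat_len) == 0) {
            found = off;
            break;
        }
    }

    free(data);
    return found;
}
```

going by the hexdump above, searching disk.img for the 18 bytes of "This is Not Secure" should land on offset 00900400, even after the file has been removed. for real files, the pattern would be a magic header such as the first bytes of a PNG or PDF.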