How do recovery apps do that?


i started developing my tool trsh, an xdg trash implementation, after an important document of mine got accidentally deleted. it’s written in c, so it uses some system functions. one of the functions that removes a file is unlink. yes, a weird name for a function that removes a file, but it fits: it removes one name (a link) that points at the file, not the data itself.

i did some digging to understand what actually happens when a file is removed. it turns out the data isn’t wiped right away. the file system just removes the directory entry and marks the file’s metadata and blocks as free, but the actual data remains on disk until it’s overwritten by new data.
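
to see the "only the name goes away" behaviour from c, here is a minimal sketch, assuming a POSIX system. the helper name read_after_unlink and the file name used with it are made up for illustration and are not taken from trsh. it writes a file, unlinks it, and then reads the data back through the descriptor that is still open.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, write `msg` into it, unlink the
 * name, then read the data back through the still-open descriptor.
 * Returns the number of bytes read, or -1 on error.
 * `links_after` receives the link count reported after unlink(). */
static ssize_t read_after_unlink(const char *path, const char *msg,
                                 char *out, size_t out_len,
                                 long *links_after)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len || unlink(path) != 0) {
        close(fd);
        return -1;
    }

    /* The name is gone now, but the inode lives on while fd is open. */
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    *links_after = (long)st.st_nlink;

    ssize_t n = pread(fd, out, out_len, 0);
    close(fd);
    return n;
}
```

calling it as read_after_unlink("demo.tmp", "This is Not Secure\n", buf, sizeof buf, &links) should hand the text back even though demo.tmp no longer shows up in the directory. on the local linux file systems i know of, the link count reported at that point is 0.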

this was interesting to me, so i ran an experiment. i created a virtual disk and formatted it using mkfs with the ext4 file system. now i can inspect what happens on that virtual disk.

Create a .img disk

To create a raw image, i just use dd:

$ dd if=/dev/zero of=disk.img bs=1k count=512

what that command does: it reads from /dev/zero, writes to disk.img, and copies 512 blocks of 1K each, so it produces a 512K disk.img. i can also produce a 1M disk.img:
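
for comparison, the same "count blocks of bs zero bytes" idea can be sketched in c. this is only an illustration of what dd is doing here, not a replacement for it. the function name make_zero_image is made up.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write `count` blocks of `bs` zero bytes to `path`,
 * roughly like: dd if=/dev/zero of=path bs=<bs> count=<count>
 * Returns the total number of bytes written, or -1 on error. */
static long make_zero_image(const char *path, size_t bs, size_t count)
{
    unsigned char *block = calloc(1, bs);   /* one zero-filled block */
    if (block == NULL)
        return -1;

    FILE *f = fopen(path, "wb");
    if (f == NULL) {
        free(block);
        return -1;
    }

    long total = 0;
    for (size_t i = 0; i < count; i++) {
        if (fwrite(block, 1, bs, f) != bs) {
            fclose(f);
            free(block);
            return -1;
        }
        total += (long)bs;
    }

    int rc = fclose(f);
    free(block);
    return rc == 0 ? total : -1;
}
```

make_zero_image("disk.img", 1024, 512) writes 524288 bytes, which is 0x80000 in hex.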

$ dd if=/dev/zero of=disk.img bs=1M count=1

we can confirm that disk.img contains only zeros using hexdump (the output below is from the 512K image, so it ends at offset 00080000).

$ hexdump -C disk.img 
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00080000

for this experiment i created a 10M disk.img (the dumps further down end at offset 00a00000, which is 10 MiB).

ext4 File system format

once we have the raw image disk.img, we can format it with mkfs.ext4.

$ mkfs.ext4 disk.img

you’ll see some interesting results when inspecting the image with hexdump:

$ hexdump -C disk.img
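
one of the things that shows up after formatting is the superblock. on ext2/3/4 it starts at byte 1024 of the image, and its 16-bit magic number 0xEF53 sits at offset 0x38 inside it, stored little-endian (so hexdump shows 53 ef at 00000438). a small sketch that checks for it follows. the function name has_ext_magic is made up for this example.

```c
#include <stdio.h>

#define SUPERBLOCK_OFFSET 1024L   /* superblock starts 1 KiB into the image */
#define MAGIC_OFFSET      0x38L   /* magic field offset inside the superblock */
#define EXT_MAGIC         0xEF53u /* shared by ext2, ext3 and ext4 */

/* Hypothetical helper: returns 1 if `path` carries the ext2/3/4 magic,
 * 0 if it does not, and -1 if the file cannot be read that far. */
static int has_ext_magic(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    unsigned char b[2];
    int ok = fseek(f, SUPERBLOCK_OFFSET + MAGIC_OFFSET, SEEK_SET) == 0
             && fread(b, 1, sizeof b, f) == sizeof b;
    fclose(f);
    if (!ok)
        return -1;

    /* The field is little-endian on disk: low byte first. */
    unsigned magic = (unsigned)b[0] | ((unsigned)b[1] << 8);
    return magic == EXT_MAGIC;
}
```

on the all-zero image from the dd step this returns 0. after mkfs.ext4 it should return 1.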

mounting disk.img

to mount it under linux, use the mount command with the loop option:

$ mkdir disk
$ su -c "mount -o loop disk.img disk"

the experiment begins

create a file in there, then unmount:

$ su -c "echo This is Not Secure > disk/secureNot.txt"
$ sync
$ su -c "umount disk"

now, let’s check the hexdump:

$ hexdump -C disk.img
*
000ab800  02 00 00 00 0c 00 01 02  2e 00 00 00 02 00 00 00  |................|
000ab810  0c 00 02 02 2e 2e 00 00  0b 00 00 00 14 00 0a 02  |................|
000ab820  6c 6f 73 74 2b 66 6f 75  6e 64 00 00 0c 00 00 00  |lost+found......|
000ab830  c8 03 0d 01 73 65 63 75  72 65 4e 6f 74 2e 74 78  |....secureNot.tx|
000ab840  74 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |t...............|
000ab850  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000b0800  02 00 00 00 0c 00 01 02  2e 00 00 00 02 00 00 00  |................|
000b0810  0c 00 02 02 2e 2e 00 00  0b 00 00 00 14 00 0a 02  |................|
000b0820  6c 6f 73 74 2b 66 6f 75  6e 64 00 00 0c 00 00 00  |lost+found......|
000b0830  c8 03 0d 01 73 65 63 75  72 65 4e 6f 74 2e 74 78  |....secureNot.tx|
000b0840  74 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |t...............|
000b0850  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00900400  54 68 69 73 20 69 73 20  4e 6f 74 20 53 65 63 75  |This is Not Secu|
00900410  72 65 0a 00 00 00 00 00  00 00 00 00 00 00 00 00  |re..............|
00900420  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00a00000

even though the disk is not mounted, we can still see the content. This is why I encrypt all my disks using LUKS.
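
the bytes around 000ab800 in the dump above are ext4 directory entries. each one is a 32-bit inode number, a 16-bit record length, an 8-bit name length, an 8-bit file type, and then the name, all little-endian (the same layout as the kernel’s struct ext4_dir_entry_2). here is a sketch that decodes one entry. the struct and function names are made up for illustration, and it ignores the special record-length encoding used with very large block sizes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of one linear ext4 directory entry. */
struct dirent_info {
    uint32_t inode;      /* inode number */
    uint16_t rec_len;    /* distance to the next entry, in bytes */
    uint8_t  name_len;   /* length of the name, no terminator on disk */
    uint8_t  file_type;  /* 1 = regular file, 2 = directory */
    char     name[256];  /* NUL-terminated copy of the name */
};

/* Decode one entry from `buf`. Returns 0 on success, -1 if `len`
 * bytes are not enough to hold the header plus the name. */
static int decode_dirent(const unsigned char *buf, size_t len,
                         struct dirent_info *out)
{
    if (len < 8)
        return -1;

    out->inode = (uint32_t)buf[0]
               | (uint32_t)buf[1] << 8
               | (uint32_t)buf[2] << 16
               | (uint32_t)buf[3] << 24;
    out->rec_len   = (uint16_t)(buf[4] | buf[5] << 8);
    out->name_len  = buf[6];
    out->file_type = buf[7];

    if (len < (size_t)8 + out->name_len)
        return -1;

    memcpy(out->name, buf + 8, out->name_len);
    out->name[out->name_len] = '\0';
    return 0;
}
```

feeding it the bytes starting at 000ab82c (0c 00 00 00 c8 03 0d 01 followed by the name) gives inode 12, record length 968, name length 13, file type 1 and the name secureNot.txt.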

remove secureNot.txt

now I’ll try to remove the file:

$ su -c "mount -o loop disk.img disk"
$ su -c "rm -rf ./disk/secureNot.txt"
$ sync
$ su -c "umount disk"
$ hexdump -C disk.img
*
000ab800  02 00 00 00 0c 00 01 02  2e 00 00 00 02 00 00 00  |................|
000ab810  0c 00 02 02 2e 2e 00 00  0b 00 00 00 dc 03 0a 02  |................|
000ab820  6c 6f 73 74 2b 66 6f 75  6e 64 00 00 00 00 00 00  |lost+found......|
000ab830  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000b0800  00 00 00 00 00 00 00 00  47 d3 81 68 47 d3 81 68  |........G..hG..h|
000b0810  47 d3 81 68 00 00 00 00  00 00 00 00 00 00 00 00  |G..h............|
000b0820  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00900400  54 68 69 73 20 69 73 20  4e 6f 74 20 53 65 63 75  |This is Not Secu|
00900410  72 65 0a 00 00 00 00 00  00 00 00 00 00 00 00 00  |re..............|
00900420  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00a00000

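as you can see, the filename is gone from the directory block, but the file content is still sitting at the same offset. the same goes for any deleted file whose blocks haven’t been reused yet. i could even write a c program that scans the image for a specific magic header and recovers the data, which is roughly how recovery apps do it.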
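
a minimal sketch of that scanning idea, assuming the image is small enough to load into memory in one go (fine for this disk.img, not for a real disk). the function name find_pattern is made up, and a real recovery tool would also need to work out where the file ends.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the byte offset of the first occurrence
 * of `pat` in the file at `path`, or -1 if it is absent or on error. */
static long find_pattern(const char *path,
                         const unsigned char *pat, size_t pat_len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    long size = ftell(f);
    if (size < 0 || pat_len == 0 || (size_t)size < pat_len) {
        fclose(f);
        return -1;
    }
    rewind(f);

    /* Assumption: the whole image fits in memory. */
    unsigned char *data = malloc((size_t)size);
    if (data == NULL) {
        fclose(f);
        return -1;
    }

    long found = -1;
    if (fread(data, 1, (size_t)size, f) == (size_t)size) {
        /* Plain linear scan: compare the pattern at every offset. */
        for (size_t i = 0; i + pat_len <= (size_t)size; i++) {
            if (memcmp(data + i, pat, pat_len) == 0) {
                found = (long)i;
                break;
            }
        }
    }

    free(data);
    fclose(f);
    return found;
}
```

for real file types the pattern would be a format signature, for example the bytes 89 50 4e 47 that every PNG file starts with. for this experiment, searching disk.img for the text "This is Not Secure" is enough to locate the block that the dump above shows at 00900400.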