I'm working on a database engine for Linux and I have a question about consistency when writing many blocks with one system call to the kernel. I open the device with O_DIRECT.
The device writes data in blocks; depending on the hardware these may be 512, 2048, or 4096 bytes. Let's say I write 2 blocks of 512 bytes in a single system call. What happens if the power is cut exactly after the disk has written 1 block? During normal operation the write() syscall returns the number of bytes written, so I could compare and raise an error when the two values (requested versus returned) mismatch, but with a power failure it gets complicated. It's difficult because the kernel might submit write requests to the device in a different order than you issued them, so the tail of the request could be written before the head, and then you have a power-off.
Consider that a database engine writes a transaction log. Let's say a transaction is about 4096 bytes, so the engine would have to write 8 blocks of 512 bytes. Suddenly the power fails and only half of the request has been written. How do databases handle this? I suppose to get around it you'd first have to write the number of blocks you intend to write to another location on the disk. Once you received the correct return value, you could write your data. Then, after receiving confirmation, you'd have to send another write to disk recording that the blocks you wanted to write were actually written successfully. So this would require 3 write operations, and if the kernel interleaves writes to the disk from other processes, it will most likely result in 3 seeks. Too inefficient.
I'm looking for a way to achieve a consistent write of multiple blocks with just one write operation to disk (one write() syscall). Is that possible?
Modulo some speed hacks, the write-twice behavior you described is what databases do. It's called write-ahead logging, and it involves a single log that operations write in sequential order, plus periodic flushing of memory buffers to disk with a corresponding flush entry written to the log. Then, when the database system starts up, it checks the log for records that may not have been flushed to disk, and flushes those values to disk (since the records are in the log).
This can actually be more performant than writing the data immediately. The log is a sequential file, so appending to it requires no seek, only rotational latency. Furthermore, you don't have to write the data to the actual data files immediately, since you can always recover it from the log. Then, when no requests are coming in, you flush the data to disk and write a flush entry to the log. That way, the only times the DBMS seeks are 1) when the system is otherwise quiet and 2) when the DBMS runs out of memory for holding modified data. As long as you have enough memory in your machine, the disk does hardly any seeks at all, and the seeks that do happen occur when the DBMS isn't busy anyway.
You have to shut down your database engine before shutting down the machine!.. When you shut down the engine, it will first finish all pending writes, sync (flush the buffers), and bring itself to a normal halt. You can also include a command in your system shutdown script to shut the engine down first, before shutting down the machine.