Avoid Race Condition in CSV exports


Posted: 2017-06-23T03:15:00+05:30 | Source | Tags: Uncategorized




One of the biggest problems that appear while writing any file in the server side is a race condition. A race condition when two or more users make request to write to the same file in the server side and since neither of the operations are completed, so there is a certain problem in the integrity of the data written and may also cause the operation to be completely blocked. So how to stop this and still maintain that a large number of files or memory is not used? Let’s see.


What is a Race Condition?


A race condition is a situation when two or more processes perform a write operation on the same file at the same time. This causes an undesirable situation since the process should be in a sequential manner. So the situation causes something like a ‘race’ between the two server requests to change or modify the same file. So in such a scenario, there is no guarantee that the data written to the file is correct or will be meaningful at the end. Maximum probability is it won’t be.


Avoiding Race Condition


The general way of avoiding a race condition is adding a file lock. We lock a file while it is being written by some user or process and prevent other users from being able to write to the file. They have to wait for the ongoing writing process to complete to start another write process. But making that might increase the wait time quite considerably for users when we are looking to export CSV data about an event.







So to keep the process fast and use the advantage of distributed system, we write the CSV for different request in different files. We create the filenames by appending a random hexadecimal string to the original filename.


Avoid Wastage of Memory


However as we can understand by the proposed method above, there will be a huge wastage of memory since a large amount of files will be created. So to avoid wastage of memory, we run a cron job. The cron job deletes all the csv files after a certain amount of time. For this, we use apscheduler.







Thus, we avoid the race condition while also maintaining that a lot of memory isn’t wasted.