Commit e210d01

From Batch to Real-Time: Rethinking File Processing with Linux fanotify
1 parent 1e3680e commit e210d01

1 file changed

+98
-0
lines changed

@@ -0,0 +1,98 @@
1+
---
2+
title: "From Batch to Real-Time: Rethinking File Processing with Linux fanotify"
3+
date: 2025-09-18T20:10:00+08:00
4+
categories:
5+
- tech
6+
tags:
7+
- batch
8+
- real-time
9+
- fanotify
10+
---
11+
12+
Traditionally, batch processing of files has been the default approach in many on-premise solutions.
13+
Files are dropped into a directory, collected on a schedule, and processed in groups.
14+
This model works fine for systems where latency is not critical, but once you start asking “How do we make this real-time?” the story becomes more interesting.
15+
16+
At first, my answer was polling. After all, if we want near real-time, we can just keep checking the directory at short intervals.
17+
But this has obvious inefficiencies—extra CPU cycles wasted, unnecessary I/O, and still not truly “real-time.”
18+
19+
That question—"Is there a better way?"—stuck in my mind. If Dropbox, OneDrive, and other software can sync files
20+
immediately when changes happen, there must be a way to achieve this on the server side.
21+
22+
## File Change Notifications in Linux
23+
24+
On Linux, we have two main interfaces for detecting file changes:
25+
26+
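. inotify: Provides file system event notifications per watched file or directory. It is widely used, but watches are not recursive and are capped by `fs.inotify.max_user_watches`, so very large directory trees are costly to cover.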
27+
28+
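. fanotify: Can mark an entire mount or filesystem with a single call and reports events such as `FAN_CLOSE_WRITE`, which makes it well suited to monitoring file access and modifications at scale. It generally requires elevated privileges (`CAP_SYS_ADMIN`).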
29+
30+
For real-time file processing across large directory trees, fanotify is usually the more efficient choice.
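
To make the idea concrete, here is a minimal sketch of a fanotify listener in Python. Python's standard library has no fanotify binding, so the sketch calls the libc wrappers `fanotify_init` and `fanotify_mark` through `ctypes`, with constants copied from `linux/fanotify.h`. The helper names (`open_watch`, `parse_events`, `watch`) and the example path are placeholders of mine, not an existing API. It assumes a Linux host and a process that holds `CAP_SYS_ADMIN`, and it listens for `FAN_CLOSE_WRITE` so that a file is reported only after a writer has closed it.

```python
import ctypes
import os
import struct

# Constants copied from <linux/fanotify.h> and <fcntl.h>.
FAN_CLOEXEC = 0x00000001
FAN_CLASS_NOTIF = 0x00000000
FAN_MARK_ADD = 0x00000001
FAN_CLOSE_WRITE = 0x00000008
FAN_EVENT_ON_CHILD = 0x08000000
FAN_NOFD = -1
AT_FDCWD = -100

# struct fanotify_event_metadata: event_len, vers, reserved,
# metadata_len, mask, fd, pid (24 bytes in total).
EVENT_FMT = "=IBBHQii"
EVENT_SIZE = struct.calcsize(EVENT_FMT)


def open_watch(directory: str) -> int:
    """Create a fanotify group reporting files closed after writing
    directly inside `directory`. Usually needs CAP_SYS_ADMIN."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.fanotify_init.argtypes = [ctypes.c_uint, ctypes.c_uint]
    libc.fanotify_mark.argtypes = [
        ctypes.c_int, ctypes.c_uint, ctypes.c_uint64,
        ctypes.c_int, ctypes.c_char_p,
    ]
    fan_fd = libc.fanotify_init(FAN_CLOEXEC | FAN_CLASS_NOTIF, os.O_RDONLY)
    if fan_fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    rc = libc.fanotify_mark(
        fan_fd, FAN_MARK_ADD, FAN_CLOSE_WRITE | FAN_EVENT_ON_CHILD,
        AT_FDCWD, os.fsencode(directory),
    )
    if rc < 0:
        err = ctypes.get_errno()
        os.close(fan_fd)
        raise OSError(err, os.strerror(err))
    return fan_fd


def parse_events(buf: bytes):
    """Split a buffer read from the fanotify fd into (mask, fd, pid) tuples."""
    events, offset = [], 0
    while offset + EVENT_SIZE <= len(buf):
        event_len, _vers, _res, _meta_len, mask, fd, pid = struct.unpack_from(
            EVENT_FMT, buf, offset)
        if event_len < EVENT_SIZE:
            break  # malformed record; stop rather than loop forever
        events.append((mask, fd, pid))
        offset += event_len
    return events


def watch(directory: str, handler) -> None:
    """Block forever, calling handler(path) for each completed write."""
    fan_fd = open_watch(directory)
    while True:
        for mask, fd, _pid in parse_events(os.read(fan_fd, 4096)):
            if fd == FAN_NOFD:
                continue  # e.g. queue overflow: no file attached
            try:
                if mask & FAN_CLOSE_WRITE:
                    handler(os.readlink(f"/proc/self/fd/{fd}"))
            finally:
                os.close(fd)  # each event carries an open fd we must close
```

A call such as `watch("/srv/inbox", print)` (the path is only an example) would print the path of every file closed after writing in that directory. Marking with `FAN_EVENT_ON_CHILD` covers direct children only; covering a whole mount would use `FAN_MARK_MOUNT` instead.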
31+
32+
## Two Options for Real-Time File Processing
33+
34+
By leveraging fanotify, we can design systems where file changes immediately trigger processing workflows. Below is a simplified view of two options:
35+
36+
1. Message Queue Integration
37+
38+
* File change events trigger a message sent to a queue.
39+
40+
* A consumer reads the message, processes the file, and publishes a response.
41+
42+
* The response is correlated with the original request and sent back to the main system.
43+
44+
2. Direct Method Invocation
45+
46+
* The file change event handler calls a service method directly, passing the file content.
47+
48+
* The service processes it and returns the response immediately.
49+
50+
Here’s a conceptual diagram:
51+
52+
[plantuml, format="svg",opts="inline"]
53+
----
54+
@startuml
55+
56+
participant mainFrm
57+
participant NFS
58+
59+
mainFrm -> NFS : put file
60+
NFS -> NFS: fanotify
61+
alt mq config
62+
NFS -> MQ
63+
MQ -> Consumer
64+
Consumer ->MQ
65+
MQ -> NFS
66+
NFS-> mainFrm: resp
67+
else method call
68+
NFS -> Service: resp:= service.method1(content)
69+
NFS -> mainFrm: resp
70+
end
71+
@enduml
72+
----
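
The two branches of the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production design: an in-process `queue.Queue` stands in for a real message broker, `process` is a hypothetical stand-in for `service.method1(content)`, and the correlation ID is simply a random UUID carried alongside the file path.

```python
import queue
import threading
import uuid
from pathlib import Path


def process(content: bytes) -> bytes:
    """Hypothetical stand-in for service.method1(content)."""
    return content.upper()


def handle_direct(path: str) -> bytes:
    """Option 2: call the service in-process and return its response."""
    return process(Path(path).read_bytes())


def consumer(requests: queue.Queue, responses: queue.Queue) -> None:
    """Option 1, consumer side: read a message, process the file, reply."""
    while True:
        message = requests.get()
        if message is None:  # sentinel: shut down
            break
        correlation_id, path = message
        responses.put((correlation_id, process(Path(path).read_bytes())))


def handle_via_queue(path, requests, responses, timeout=5.0) -> bytes:
    """Option 1, producer side: publish the event and wait for the reply."""
    correlation_id = uuid.uuid4().hex
    requests.put((correlation_id, str(path)))
    reply_id, result = responses.get(timeout=timeout)
    if reply_id != correlation_id:
        raise RuntimeError("response does not match the request")
    return result
```

The direct call is simpler and has lower latency, while the queue decouples the watcher from the consumer at the cost of having to correlate replies. The sketch only handles one in-flight request at a time; with concurrent requests, replies would need to be routed by correlation ID rather than read in order.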
73+
74+
## Additional Considerations
75+
76+
When moving from batch to real-time file processing, a few practical challenges must be addressed:
77+
78+
1. Handling Partial Files
79+
80+
* A file may not be fully written when the notification is triggered.
81+
82+
* Common approaches:
83+
84+
** Use checksums to verify integrity.
85+
86+
** Use marker files (e.g., write file.done after the main file is complete).
87+
88+
2. Communication Protocols via Files
89+
90+
* Establish clear naming conventions.
91+
92+
* Define how to correlate request and response files to avoid mismatches.
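
Both considerations can be sketched together. The example below assumes a convention of my own choosing, not one prescribed above: the writer creates `<name>.done` after the data file and stores the file's SHA-256 in it, and requests named `<id>.req` are answered by `<id>.resp`. All function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large files do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def publish(path: Path, content: bytes) -> None:
    """Writer side: write the data first, then the marker with its checksum."""
    path.write_bytes(content)
    marker = path.with_name(path.name + ".done")
    marker.write_text(sha256_of(path))


def is_complete(path: Path) -> bool:
    """Reader side: accept a file only if its marker exists and matches."""
    marker = path.with_name(path.name + ".done")
    if not (path.is_file() and marker.is_file()):
        return False
    return marker.read_text().strip() == sha256_of(path)


def response_name(request_name: str) -> Optional[str]:
    """Map '<id>.req' to '<id>.resp'; return None for any other name."""
    stem, dot, suffix = request_name.rpartition(".")
    if not dot or suffix != "req" or not stem:
        return None
    return f"{stem}.resp"
```

With this convention a watcher would ignore every event until `is_complete` returns true, and `response_name("order-42.req")` yields `order-42.resp`, so the main system can match each response to its request by the shared ID.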
93+
94+
## Final Thoughts
95+
96+
Shifting from batch to real-time file processing isn’t just a performance optimisation—it can fundamentally change how applications interact. By leveraging Linux’s fanotify, we can eliminate polling and move closer to truly event-driven file workflows.
97+
98+
The key is to not only handle notifications efficiently but also to design protocols for safe and predictable file exchange. With careful planning around partial file handling and file naming conventions, we can build robust, real-time server-side file processing solutions.
