| 1 | +--- |
| 2 | +title: Reason for Birth |
| 3 | +date: 2025-02-24 17:08:33 |
| 4 | +author: loongs-zhang |
| 5 | +--- |
| 6 | + |
| 7 | +# Reason for Birth |
| 8 | + |
| 9 | +English | [中文](../cn/background.md) |
| 10 | + |
| 11 | +## The thread pool needs to be optimized |
| 12 | + |
| 13 | +In the early days, developers often used multiprocessing to serve multiple concurrent users, creating one service
| 14 | +process for each TCP connection. Around 2000, it was popular to write web services with CGI, and the most commonly
| 15 | +used web server at that time was the Apache 1.3.x series, which was built on the multiprocessing model. Because
| 16 | +processes consume more system resources than threads, people began using multithreading (usually with thread pools)
| 17 | +to develop web service applications. This increased the number of concurrent users a single server could support, but
| 18 | +resources were still wasted.
| 19 | + |
| 20 | +In 2020, I joined company V. Because the thread pools in our internal systems were occasionally exhausted, and
| 21 | +because our team leader had
| 22 | +read [《Java线程池实现原理及其在美团业务中的实践》](https://tech.meituan.com/2020/04/02/java-pooling-pratice-in-meituan.html)
| 23 | +(an article on how Java thread pools work and how Meituan uses them), we decided to build our own dynamic thread pool. In practice, the results were good:
| 24 | + |
| 25 | +<div style="text-align: center;"> |
| 26 | + <img src="/docs/img/begin.jpg" width="50%"> |
| 27 | +</div> |
| 28 | + |
| 29 | +But this does not fundamentally solve the problem. Thread context switching has a cost, and the more threads there
| 30 | +are, the greater that cost becomes. For CPU-intensive tasks, keeping the number of threads equal to the number of CPU
| 31 | +cores and binding each thread to a specific core (hereinafter referred to as `thread-per-core`) gives the best
| 32 | +performance. For IO-intensive tasks, the tasks almost always block their threads, so the cost of thread context
| 33 | +switching is generally lower than the cost of blocking. However, when there are too many threads, the cost of context
| 34 | +switching exceeds the cost of blocking.
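
As a rough illustration of the sizing half of `thread-per-core`, the sketch below creates a Java pool with one thread per available core and runs CPU-bound work on it. The class name `CoreSizedPool` and the `sumTo` workload are hypothetical, and binding threads to specific cores is not shown because the Java standard library has no API for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CoreSizedPool {
    // Sums 1..n; stands in for a CPU-intensive task (hypothetical workload).
    static long sumTo(long n) {
        long total = 0;
        for (long i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Runs `tasks` copies of sumTo(n) on a pool with one thread per core
    // and returns the sum of all results.
    static long runAll(int tasks, long n) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            Callable<Long> job = () -> sumTo(n);
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(job));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // waits for each task to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With more tasks than cores, the extra tasks wait in the pool's queue instead of adding threads, so the thread count never exceeds the core count.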
| 35 | + |
| 36 | +The essence of a dynamic thread pool is to adjust the number of threads so that the cost of thread context switching
| 37 | +stays low relative to the cost of blocking. Because the adjustment is made manually, this outcome cannot be guaranteed.
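
The kind of adjustment described above can be sketched with Java's standard `ThreadPoolExecutor`, whose core and maximum sizes can be changed at runtime. This is a minimal sketch and not the dynamic thread pool built at the company; `ResizablePool`, `create`, and `resize` are hypothetical names.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ResizablePool {
    // Builds a pool with `size` threads and an unbounded task queue.
    static ThreadPoolExecutor create(int size) {
        return new ThreadPoolExecutor(
                size, size, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }

    // Applies a new thread count (>= 1) at runtime. The order of the two
    // setters matters: the core size must never exceed the maximum size,
    // otherwise the setters may throw IllegalArgumentException.
    static void resize(ThreadPoolExecutor pool, int newSize) {
        if (newSize > pool.getMaximumPoolSize()) {
            // Growing: raise the ceiling first, then the core size.
            pool.setMaximumPoolSize(newSize);
            pool.setCorePoolSize(newSize);
        } else {
            // Shrinking: lower the core size first, then the ceiling.
            pool.setCorePoolSize(newSize);
            pool.setMaximumPoolSize(newSize);
        }
    }
}
```

Someone, or some monitoring rule, still has to choose `newSize`, which is why this kind of tuning cannot guarantee the result.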
| 38 | + |
| 39 | +<div style="text-align: center;"> |
| 40 | + <img src="/docs/img/run.jpg" width="50%"> |
| 41 | +</div> |
| 42 | + |
| 43 | +## The pain of using NIO |
| 44 | + |
| 45 | +Is there a technology that can run IO-intensive tasks with performance comparable to multithreading while keeping
| 46 | +thread-per-core? The answer is `NIO`, but it still has some limitations and unfriendly aspects:
| 47 | + |
| 48 | +1. The NIO API is more complex to use than the BIO API;
| 49 | +2. System calls such as sleep still block threads. Achieving optimal performance effectively means banning all
| 50 | +   blocking calls, which is unfriendly to developers;
| 51 | +3. In thread pool mode, a single thread can only start the next task after the current task has completed, so tasks
| 52 | +   cannot be scheduled fairly.
| 53 | + |
| 54 | +Note: Assuming a single thread with 1 second of CPU time and 100 tasks, fair scheduling means that each task gets an
| 55 | +equal 10ms time slice.
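
To make the third limitation and the note concrete, here is a small Java sketch. It submits two tasks to a single worker thread and records the order of events, and it also computes the fair slice from the note. The names `RunToCompletion`, `fairSliceMillis`, and `eventOrder` are hypothetical, and the 50 ms sleep merely stands in for blocking work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RunToCompletion {
    // Fair share of one scheduling window across `tasks` tasks, in milliseconds.
    static long fairSliceMillis(long windowMillis, int tasks) {
        return windowMillis / tasks;
    }

    // Submits tasks "A" and "B" to a single worker thread and returns the
    // recorded event order.
    static List<String> eventOrder() {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        ExecutorService worker = Executors.newSingleThreadExecutor();
        for (String name : new String[] {"A", "B"}) {
            worker.execute(() -> {
                events.add(name + " start");
                try {
                    Thread.sleep(50); // stands in for blocking work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                events.add(name + " end");
            });
        }
        worker.shutdown(); // already-submitted tasks still run
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(events);
    }
}
```

`eventOrder()` returns `[A start, A end, B start, B end]`: task B cannot start until task A has run to completion, even while A is only sleeping. `fairSliceMillis(1000, 100)` returns `10`, the 10ms slice from the note.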
| 56 | + |
| 57 | +The first point can be overcome, but the second and third are real weaknesses. In fact, if the fair scheduling missing
| 58 | +in the third point could be achieved, RPC frameworks would not need many threads, just thread-per-core.
| 59 | + |
| 60 | +How can we keep things easy for developers while ensuring that IO-intensive tasks perform no worse than with
| 61 | +multithreading and that thread-per-core is preserved? `Coroutine` technology gradually came into view.
| 62 | + |
| 63 | +## Goroutine still has shortcomings |
| 64 | + |
| 65 | +When I first started experimenting with coroutines, I chose `kotlin` to keep the learning cost low. However, when I
| 66 | +realized that kotlin's coroutines require changing APIs (such as replacing Thread.sleep with kotlinx.coroutines.delay)
| 67 | +to avoid blocking threads, I decisively switched to `golang`. About 2 weeks later:
| 68 | + |
| 69 | +<div style="text-align: center;"> |
| 70 | + <img src="/docs/img/good.jpeg" width="50%"> |
| 71 | +</div> |
| 72 | + |
| 73 | +Which technology is strongest at coroutines? Among programming languages, look to Golang. However, as I studied it
| 74 | +more deeply, I discovered several shortcomings of goroutines:
| 75 | + |
| 76 | +1. `Not thread-per-core`. The goroutine runtime is also backed by a thread pool. Its default thread limit is 10,000 (see
| 77 | +   `runtime/debug.SetMaxThreads`), which is far larger than the number of threads in the thread-per-core model, and the
| 78 | +   scheduling threads are not bound to CPUs;
| 79 | +2. `Preemptive scheduling will interrupt running system calls`. If a system call takes a long time to complete, it
| 80 | +   will be interrupted multiple times, which reduces overall performance;
| 81 | +3. `There is a significant gap between goroutines and the best-performing alternatives`. A C/C++ coroutine library
| 82 | +   can reach as much as 1.5 times the performance of goroutines.
| 83 | + |
| 84 | +With regret, I went on to study the C/C++ coroutine libraries and found that they either only implemented `hook` (in
| 85 | +simple terms, hook technology proxies system calls: without a hook, calling sleep invokes the operating system's sleep
| 86 | +function, and with a hook, the call is redirected to our own code. For detailed steps, please refer to Chapters 41
| 87 | +and 42 of The Linux Programming Interface), or only implemented `work-stealing`.
| 88 | +Some libraries only provided the most basic `coroutine abstraction`, and most disappointing of all, none of
| 89 | +them implemented `preemptive scheduling`.
| 90 | + |
| 91 | +There is no other way; it seems we can only build it ourselves.
| 92 | + |
| 93 | +<div style="text-align: center;"> |
| 94 | + <img src="/docs/img/just_do_it.jpg" width="100%"> |
| 95 | +</div> |