Single-Hart O(1) Enhancement #23
Conversation
Validate with proper statistics.
include/sys/task.h (Outdated)

```c
/* Scheduler attribution */
typedef struct sched {
    volatile uint32_t ready_bitmap; /* 8-bit priority bitmap */
```
Why is this variable set to volatile?
An ISR might modify this value, so I added volatile to ensure the bitmap is always read directly from memory.
I'm a bit doubtful whether this is the right approach, since it sounds like you're using volatile to handle a synchronization problem. Even with the volatile keyword, accesses to the variable are not guaranteed to be atomic, so a race condition can still occur. Should we reconsider how synchronization is handled here? Maybe something like CRITICAL_ENTER/LEAVE() would be more appropriate.
Reference: https://docs.kernel.org/process/volatile-considered-harmful.html
I appreciate your feedback and agree with your point of view. The CRITICAL_ENTER()/LEAVE() approach is better for protecting the bitmap's correctness. I'll make sure the bitmap is manipulated inside a critical section.
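For illustration, a minimal sketch of what that could look like, assuming the sched_t from the diff above and a hypothetical helper name (sched_set_ready() is not part of the patch):

```c
/* Hypothetical helper: perform the bitmap read-modify-write inside a
 * critical section so an ISR cannot interleave with the update. */
static void sched_set_ready(sched_t *sched, uint8_t prio)
{
    CRITICAL_ENTER();                    /* mask interrupts */
    sched->ready_bitmap |= (1u << prio); /* no volatile needed now */
    CRITICAL_LEAVE();                    /* restore interrupt state */
}
```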
```c
    CRITICAL_LEAVE();
    return ERR_TASK_CANT_SUSPEND;
}
```
Please avoid adding or removing blank lines unnecessarily.
Please rebase your branch to keep the git history clean.
include/sys/task.h (Outdated)

```c
/* Scheduler attribution */
typedef struct sched {
    uint32_t ready_bitmap; /* 8-bit priority bitmap */
```
Hmm, the comment says this is an 8-bit bitmap, but it's declared as uint32_t?
Will more priority levels be added in the future? If not, I'll modify this declaration.
I think if it's necessary in the future, we can easily come back and modify this.
Regarding your question in the PR description about the use of sched_t and the changes to kcb_t: I'm not sure why you split kcb_t and sched_t into two separate structures, and I don't see an explanation for this in the commit message. From my perspective, both seem to store data needed by the scheduler, though perhaps you had other reasons or considerations in mind?

Hi @visitorckw, the reason I split the data structure …
Force-pushed a2ebd5e to e43cc5a (compare)
I don't have a strong opinion on the implementation, but I have a question about the design. It appears there might be some redundancy. sched_t stores a per-hart ready_queue, while kcb_t retains the tasks list and task_current. When I looked for an explanation for this separation, I couldn't find one. Could we clarify the design? It seems we should clearly define what data is per-hart and belongs in sched_t, versus what data is shared among all harts and should be in kcb_t.

This design also raises a question about task affinity. If we add a task to a specific hart's queue, is the intention for it to be bound to that hart, or is there a mechanism for tasks to be rebalanced across different hart queues later?
Hi @visitorckw, thanks for your feedback, and let me clarify my original design.

Taking one hart as an example, the diagram below illustrates my design more clearly:

[diagram: per-hart scheduler design]
But since we expect to modify current_task in kcb_t into an array where each hart has its own entry, would it be more like per-hart information that belongs in sched_t? I do see that you have indeed added a current_task field in sched_t, but what are the respective roles of it and task_current in kcb_t?
Yes, it does raise a performance issue. I plan to bind tasks to each hart, and I will add … For the rebalancing problem across different harts, I think I will use an IPI mechanism to rearrange task ownership, but the implementation details still need to be discussed. Currently, I'm focusing on the experiment design to prove the new design is viable and better than the current O(n) RR scheduler. If you think my idea is achievable, I'll keep working on validation and will revise the code based on what we have discussed previously.
You are correct: I tried to make the context switch per-hart, which means I won't access kcb in a per-hart context switch. But I found that in #6 your design is more backward compatible and more efficient. I'll modify …
I'm still not sure if it's necessary to maintain separate task lists in both kcb_t and sched_t. Consider an API like find_task_by_id(), which needs to iterate over every task across all harts. To do this safely, we must prevent any hart from modifying its ready queue during the iteration to avoid race conditions. This means we would need a mechanism to lock all queues anyway. If that's the case, we can simply iterate through each per-hart list from sched_t without needing the global list in kcb_t, like this:

```c
for (i = 0; i < hart_nums; i++) {
    list_for_each(...) {
        if (node->id == id) {
            return node;
        }
    }
}
```

This approach seems to work. Am I missing a more complex use case that justifies keeping the global task list in kcb_t?
I guess you mean the list in kcb_t:

```c
kcb_t->task_list  /* Redundant? Split into local_hart_task_list. */

sched_t {
    /* Originally maintained by kcb->task_list, now split per hart */
    list local_hart_task_list;
    /* Ready queue holding tasks in the TASK_RUNNING and TASK_READY states */
    list local_hart_ready_queue;
    ...
}
```
I'm just thinking that each task should only appear in one list at a time, and never be duplicated across two different lists. For running and ready tasks, using a per-hart data structure seems appropriate, as it would be efficient for the scheduler on each hart. I'm less certain about tasks in other states. It's unclear to me whether placing them in a single global list or in per-hart lists would be more suitable. This decision likely depends on how we intend to handle task affinity.
Yes, but I think there still needs to be another list to handle the remaining tasks in other states, and the ready queue needs external nodes linked to tasks whose state is READY or RUNNING. I would suggest … In order to solve task affinity, I think maybe we can add …
I thought a running task would not appear in the ready queue, but instead each hart would keep track of its own currently running task node. Sorry if I misunderstood your description — could you clarify or correct my misunderstanding?
I am not necessarily against this, but I don't see why we need to keep the task id in sorted order or what benefit we gain from it. Do we have any feature that depends on maintaining this ordering?
No, I didn't mean adding a feature that lets the user specify which hart a task must run on — that should be relatively easy. What I meant is that the scheduler decides when a task will be placed onto a particular hart's ready queue, which determines which hart it will run on.
Once the RUNNING task needs to go back to READY, an "enqueue" operation must be performed to ensure it goes back to the tail of the ready queue. This is a redundant and cycle-costly process because it will use … To solve this issue, there is a pointer tracing the current RUNNING task, and this pointer will advance to the next node using the API …
When a task needs to be deleted, I think each hart can just delete the task from its local ready queue without further operations, if the task list is held globally.
I don't think this is the reason why we must keep the running task in the ready queue. If we are really concerned about the efficiency of list_push_back, we can simply add a pointer to keep track of both ends of the list to make the insertion operation have an O(1) time complexity.
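For concreteness, a minimal sketch of that two-ends idea, with type names (list_t, list_node_t) that are assumptions rather than the project's actual list API:

```c
#include <stddef.h>

typedef struct list_node {
    struct list_node *next;
    void *data;
} list_node_t;

typedef struct {
    list_node_t *head;
    list_node_t *tail; /* tracking the tail makes push-back O(1) */
} list_t;

static void list_push_back(list_t *list, list_node_t *node)
{
    node->next = NULL;
    if (list->tail)
        list->tail->next = node; /* append after the current tail */
    else
        list->head = node;       /* list was empty */
    list->tail = node;           /* O(1): no traversal required */
}
```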
Still confused. I still don't see the reason why the list needs to be kept sorted. At the same time, I'm also not quite clear on why this approach allows us to delete a task by only removing it from each hart's queue.
Thanks for your suggestion, I'll follow this idea.
Yes, currently no feature depends on whether the task list is sorted, but I'm just thinking of keeping the structure as close to the current version as possible and focusing on the single-hart scheduler case. Maybe we can run experiments later on holding tasks per-hart versus globally to figure out the performance difference.
Sorry, I somehow missed the notification for the new comment. IMHO, keeping the same structure as the current version isn't that important, especially since you are already making significant changes to the existing code. I think if we only want to focus on the current single-hart scenario for now, then we don't need to specifically separate out the per-hart data structures. Doing so might cause others to become confused when reading the code. If we want to separate out the per-hart data structures to make future SMP support smoother, then we need to plan out as much as possible beforehand what should be global and what should be per-hart local information in a multi-hart scenario.
Hi @visitorckw, got it, and I'll focus on the single-hart case for now. Also, here are two different sched_select_next_task() designs for the task state transition, and I'm wondering which one you prefer.

**List-node-based scheduler**

As we have discussed before, TASK_RUNNING and TASK_READY are separated, which were previously kept at … But I found that it might be time-costly to perform the pop and push operations, even though both are O(1).

**List-node-pointer-based scheduler**

I think we can use rr_cursors[prio] to record which task is executing, and update the new task in sched_select_next_task() like:

```c
kcb->task_current = rr_cursors[prio]->next;
rr_cursors[prio] = kcb->task_current;
```

However, this approach will take more time to handle mo_task_suspend() and mo_task_cancel(), because the pointer rr_cursors[prio] needs to be moved to the previous task to ensure that sched_select_next_task() selects the proper next task if the current task suspends or cancels itself. But the advantage is that it is more stable and takes less time than the list-node transition. I'm wondering if you have any suggestions, or which one you think is more suitable for this project?
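To make the suspend/cancel cost concrete, the fix-up could look like the sketch below; list_cprev() is an assumed circular-predecessor helper, not an existing API:

```c
/* Before unlinking `node`, retreat the cursor if it points at the node
 * being removed, so the next sched_select_next_task() still advances
 * to the correct successor. list_cprev() is hypothetical. */
if (rr_cursors[prio] == node)
    rr_cursors[prio] = list_cprev(ready_queues[prio], node);
```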
Just out of curiosity, how exactly did you come to the conclusion that push and pop operations are still quite time-consuming, even though they are O(1) operations? My guess is that you either observed the code generated by the compiler or you ran some micro-benchmarks?
I don't have a clear answer in mind, because I'm not sure exactly how time-consuming push and pop are, or how much we specifically care about the performance of the task_suspend and task_cancel operations. Rather than just speculating, perhaps what we really need is to establish a clear benchmark, as @jserv previously requested, and let the data speak for itself?
Previously, the scheduler performed a linear search through the global task list (kcb->tasks) to find the next TASK_READY task. This approach limited scalability as the search iterations increased with the number of tasks, resulting in higher scheduling latency. To support an O(1) scheduler and improve extensibility, a sched_t structure is introduced and integrated into kcb. The new structure contains:

- ready_queues: Holds all runnable tasks, including TASK_RUNNING and TASK_READY. The scheduler selects tasks directly from these queues.
- ready_bitmap: Records the state of each ready queue. Using the bitmap, the scheduler can locate the highest-priority runnable task in O(1) time complexity.
- rr_cursors: Round-robin cursors that track the next task node in each ready queue. Each priority level maintains its own RR cursor. The top-priority cursor is assigned to kcb->task_current, which is advanced circularly after each scheduling cycle.
- hart_id: Identifies the scheduler instance per hart (0 for single-hart configurations).
- task_idle: The system idle task, executed when no runnable tasks exist.

In the current design, kcb binds only one sched_t instance (hart0) for single-hart systems, but this structure can be extended for multi-hart scheduling in the future.
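Pieced together from this description, the structure might look roughly like the sketch below; type and constant names such as list_t, list_node_t, and TASK_PRIO_LEVELS are assumptions, not the patch's actual declarations:

```c
typedef struct sched {
    uint32_t ready_bitmap;                     /* bit N set => prio-N queue non-empty */
    list_t *ready_queues[TASK_PRIO_LEVELS];    /* per-priority runnable queues */
    list_node_t *rr_cursors[TASK_PRIO_LEVELS]; /* round-robin cursor per queue */
    uint32_t hart_id;                          /* 0 on single-hart builds */
    list_node_t *task_idle;                    /* idle task sentinel */
} sched_t;
```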
Previously, the list operation for removal was limited to list_remove(), which immediately freed the list node during the function call. When removing a running task (TASK_RUNNING), the list node in the ready queue must not be freed because kcb->task_current shares the same node. This change introduces list_unlink(), which detaches the node from the list without freeing it. The unlinked node is returned to the caller, allowing safe reuse and improving flexibility in dequeue operations. This API will be applied in sched_dequeue_task() for safely removing tasks from ready queues.
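For illustration, a minimal sketch of the list_unlink() semantics described here, assuming a singly linked list with a head sentinel (the patch's actual implementation may differ):

```c
/* Detach `target` without freeing it, so callers that still hold the
 * node (e.g. kcb->task_current) remain valid. */
static list_node_t *list_unlink(list_t *list, list_node_t *target)
{
    for (list_node_t *prev = list->head; prev && prev->next;
         prev = prev->next) {
        if (prev->next == target) {
            prev->next = target->next; /* bypass the node */
            target->next = NULL;
            return target; /* returned to the caller for reuse */
        }
    }
    return NULL; /* not found in this list */
}
```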
When a task is enqueued into or dequeued from the ready queue, the bitmap that indicates the ready queue state should be updated. These three macros can be used in the mo_task_dequeue() and mo_task_enqueue() APIs to improve readability and maintain consistency.
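The commit doesn't name the three macros here, so the following shapes are guesses, assuming one bitmap bit per priority level:

```c
/* Hypothetical forms of the three ready-bitmap helpers. */
#define BITMAP_SET(bitmap, prio)   ((bitmap) |= (1u << (prio)))
#define BITMAP_CLEAR(bitmap, prio) ((bitmap) &= ~(1u << (prio)))
#define BITMAP_TEST(bitmap, prio)  (((bitmap) >> (prio)) & 1u)
```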
Previously, sched_enqueue_task() only changed the task state without inserting the task into the ready queue. As a result, the scheduler could not select the enqueued task for execution. This change pushes the task into the appropriate ready queue using list_pushback(), and initializes related attributes such as the ready bitmap and RR cursor. The ready queue for the corresponding task priority is initialized on this enqueue path and never released afterward. With this updated API, tasks can be enqueued into the ready queue and selected by the cursor-based O(1) scheduler.
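A condensed sketch of that enqueue path as described; helper names (list_create(), list_last()) and the field layout are assumptions:

```c
static int32_t sched_enqueue_task(sched_t *sched, tcb_t *task)
{
    uint8_t prio = task->prio_level;

    /* The queue is created lazily on first use and never released. */
    if (!sched->ready_queues[prio])
        sched->ready_queues[prio] = list_create();

    list_pushback(sched->ready_queues[prio], task);
    sched->ready_bitmap |= (1u << prio); /* mark queue non-empty */

    /* The first task at this priority seeds the RR cursor. */
    if (!sched->rr_cursors[prio])
        sched->rr_cursors[prio] = list_last(sched->ready_queues[prio]);
    return 0;
}
```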
Previously, mo_task_dequeue() was only a stub and returned immediately without performing any operation. As a result, tasks remained in the ready queue after being dequeued, leading to potential scheduler inconsistencies. This change implements the full dequeue process:

- Searches for the task node in the ready queue by task ID.
- Maintains RR cursor consistency: the RR cursor should always point to a valid task node in the ready queue. When removing a task node, the cursor is advanced circularly to the next node.
- Unlinks the task node using list_unlink(), which removes the node from the ready queue without freeing it. list_unlink() is used instead of list_remove() to avoid accidentally freeing kcb->task_current when the currently running task is dequeued.
- Updates and checks queue_counts: if the ready queue becomes empty, the RR cursor is set to NULL and the bitmap is cleared until a new task is enqueued.
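The cursor and bitmap maintenance described above might look roughly like this, with list_cnext() again assumed as a circular-next helper:

```c
/* Keep the RR cursor valid while removing `node` from its queue. */
if (sched->rr_cursors[prio] == node)
    sched->rr_cursors[prio] = list_cnext(queue, node); /* advance circularly */

list_unlink(queue, node); /* detach without freeing */

if (--sched->queue_counts[prio] == 0) {
    sched->rr_cursors[prio] = NULL;       /* queue is now empty */
    sched->ready_bitmap &= ~(1u << prio); /* clear the bitmap bit */
}
```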
Previously, mo_task_spawn() only created a task and appended it to the global task list (kcb->tasks), assigning the first task directly from the global list node. This change adds a call to sched_enqueue_task() within the critical section to enqueue the task into the ready queue and safely initialize its scheduling attributes. The first task assignment is now aligned with the RR cursor mechanism to ensure consistency with the O(1) scheduler.
Previously, the scheduler iterated through the global task list (kcb->tasks) to find the next TASK_READY task, resulting in O(N) selection time. This approach limited scalability and caused inconsistent task rotation under heavy load. The new scheduling process:

1. Check the ready bitmap and find the highest priority level.
2. Select the RR cursor node from the corresponding ready queue.
3. Advance the selected cursor node circularly.

Why an RR cursor instead of pop/enqueue rotation:

- Fewer operations on the ready queue: compared to the pop/enqueue approach, which requires two function calls per switch, the RR cursor method only advances one pointer per scheduling cycle.
- Cache friendly: always accesses the same cursor node, improving cache locality on hot paths.
- Cycle deterministic: the RR cursor design allows deterministic task rotation and enables potential future extensions such as cycle accounting or fairness-based algorithms.

This change introduces a fully O(1) scheduler design based on per-priority ready queues and round-robin (RR) cursors. Each ready queue maintains its own cursor, allowing the scheduler to select the next runnable task in constant time.
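Putting the three steps together, a sketch of the selection path (assuming bit 0 encodes the highest priority and list_cnext() exists as a circular-next helper):

```c
static tcb_t *sched_select_next_task(sched_t *sched)
{
    /* Step 1: lowest set bit = highest-priority non-empty queue. */
    uint32_t prio = __builtin_ctz(sched->ready_bitmap);

    /* Step 2: the RR cursor already points at the task to run next. */
    list_node_t *node = sched->rr_cursors[prio];

    /* Step 3: advance the cursor circularly for the next cycle. */
    sched->rr_cursors[prio] = list_cnext(sched->ready_queues[prio], node);

    return node->data; /* the selected task's TCB */
}
```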
Previously, mo_task_suspend() only changed the task state to TASK_SUSPENDED without removing the task from the ready queue. As a result, suspended tasks could still be selected by the scheduler, leading to incorrect task switching and inconsistent queue states. This change adds a dequeue operation to remove the corresponding task node from its ready queue before marking it as suspended. Additionally, the condition to detect the currently running task has been updated: the scheduler now compares the TCB pointer (kcb->task_current->data == task) instead of the list node (kcb->task_current == node), since kcb->task_current now stores a ready queue node rather than a global task list node. If the suspended task is currently running, the CPU will yield after the task is suspended to allow the scheduler to select the next runnable task. This ensures that suspended tasks are no longer visible to the scheduler until they are resumed.
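In sketch form, the updated check compares the TCB a node carries rather than the node itself:

```c
/* Old check compared list nodes: kcb->task_current == node.
 * New check compares the TCB pointer stored in the node: */
bool is_current = kcb->task_current && kcb->task_current->data == task;
```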
Previously, mo_task_cancel() only removed the task node from the global task list (kcb->tasks) but did not remove it from the ready queue. As a result, the scheduler could still select a canceled task that remained in the ready queue. Additionally, freeing the node twice could occur because the same node was already freed after list_remove(), leading to a double-free issue. This change adds a call to sched_dequeue_task() to remove the task from the ready queue, ensuring that once a task is canceled, it will no longer appear in the scheduler’s selection path. This also prevents memory corruption caused by double-freeing list nodes.
Previously, mo_task_resume() only changed the resumed task's state to TASK_READY but didn't enqueue it into the ready queue. As a result, the scheduler could not select the resumed task for execution. This change adds sched_enqueue_task() to insert the resumed task into the appropriate ready queue and update the ready bitmap, ensuring the resumed task becomes schedulable again.
Previously, mo_task_wakeup() only changed the task state to TASK_READY without enqueuing the task back into the ready queue. As a result, a woken-up task could remain invisible to the scheduler and never be selected for execution. This change adds a call to sched_enqueue_task() to insert the task into the appropriate ready queue based on its priority level. The ready bitmap, task counts of each ready queue, and RR cursor are updated accordingly to maintain scheduler consistency. With this update, tasks transitioned from a blocked or suspended state can be properly scheduled for execution once they are woken up.
This commit introduces a new API, sched_migrate_task(), which enables migration of a task between ready queues of different priority levels. The function safely removes the task from its current ready queue and enqueues it into the target queue, updating the corresponding RR cursor and ready bitmap to maintain scheduler consistency. This helper will be used in mo_task_priority() and other task management routines that adjust task priority dynamically. Future improvement: the current enqueue path allocates a new list node for each task insertion based on its TCB pointer. In the future, this can be optimized by directly transferring or reusing the existing list node between ready queues, eliminating the need for additional malloc() and free() operations during priority migrations.
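A sketch of the helper's flow as described, reusing the dequeue/enqueue helpers sketched earlier (their exact signatures are assumptions):

```c
/* Move `task` to the ready queue matching `new_prio`; each step fixes
 * the RR cursor and bitmap of the queue it touches. */
static int32_t sched_migrate_task(sched_t *sched, tcb_t *task, uint8_t new_prio)
{
    sched_dequeue_task(sched, task); /* unlink from the old queue */
    task->prio_level = new_prio;
    /* Currently allocates a fresh node; per the note above, this could
     * later reuse the unlinked node to avoid malloc()/free(). */
    return sched_enqueue_task(sched, task);
}
```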
This change refactors the priority update process in mo_task_priority() to include early-return checks and proper task migration handling.

- Early-return conditions:
  * Prevent modification of the idle task.
  * Disallow assigning TASK_PRIO_IDLE to non-idle tasks. The idle task is created by idle_task_init() during system startup and must retain its fixed priority.
- Task migration: If the priority-changed task resides in a ready queue (TASK_READY or TASK_RUNNING), sched_migrate_task() is called to move it to the queue corresponding to the new priority.
- Running task behavior: When the current running task changes its own priority, it yields the CPU so the scheduler can dispatch the next highest-priority task.
This commit introduces the system idle task and its initialization API (idle_task_init()). The idle task serves as the default execution context when no other runnable tasks exist in the system. The sched_idle() function supports both preemptive and cooperative modes. In sched_t, a list node named task_idle is added to record the idle task sentinel. The idle task never enters any ready queue and its priority level cannot be changed. When idle_task_init() is called, the idle task is initialized as the first execution context. This eliminates the need for additional APIs in main() to set up the initial high-priority task during system launch. This design allows task priorities to be adjusted safely during app_main(), while keeping the scheduler’s entry point consistent.
When all ready queues are empty, the scheduler should switch to idle mode and wait for incoming interrupts. This commit introduces a dedicated helper to handle that transition, centralizing the logic and improving readability of the scheduler's path to idle.
Previously, when all ready queues were empty, the scheduler would trigger a kernel panic. This condition should instead transition into the idle task rather than panic. The new sched_switch_to_idle() helper centralizes this logic, making the path to idle clearer and more readable.
The idle task is now initialized in main() during system startup. This ensures that the scheduler always has a valid execution context before any user or application tasks are created. Initializing the idle task early guarantees a safe fallback path when no runnable tasks exist and keeps the scheduler entry point consistent.
This change sets up the scheduler state during system startup by assigning kcb->task_current to kcb->harts->task_idle and dispatching to the idle task as the first execution context. This commit also keeps the scheduling entry path consistent between startup and runtime.
Previously, both mo_task_spawn() and idle_task_init() implicitly bound their created tasks to kcb->task_current as the first execution context. This behavior caused ambiguity with the scheduler, which is now responsible for determining the active task during system startup. This change removes the initial binding logic from both functions, allowing the startup process (main()) to explicitly assign kcb->task_current (typically to the idle task) during launch. This ensures a single, centralized initialization flow and improves the separation between task creation and scheduling control.

I've implemented O(1) task selection using a priority bitmap and per-priority queues. Key design points:

- mo_task_spawn() enqueues new tasks, except the very first one, which is assigned to task_current with state TASK_RUNNING.
- sched_select_next_task() reinserts the running task into its ready queue, then scans ready_bitmap to select the highest-priority ready queue in O(1).
- Confirm the sched_t data structure, and whether sched_t or kcb would be preferred to better support the future SMP design.