Toggle navigation
首页
问答
文章
积分商城
专家
专区
更多专区...
文档中心
返回主站
搜索
提问
会员
中心
登录
注册
RT-Thread一般讨论
[ZT]The lightest lightweight threads, Protothreads
发布于 2008-06-24 23:23:10 浏览:4511
订阅该版
The lightest lightweight threads, Protothreads Last week, I used the Lwt cooperative lightweight thread library to implement a benchmark that measures context switch performance, determined that it was GC bound and timed it against comparable programs (i.e., the fastest implementations in the computer language benchmark games, which are all based on lightweight threads) and a C version that uses POSIX threads, obtaining these results: implementation memory usage time (s) Haskell GHC 6.8.2 2680KB 1.22 OCaml ocamlopt 1024Kword minor heap 5178KB 1.85 Haskell GHC 6.8.2 -threaded, -N1 2760KB 1.9 Erlang R12B-1 HiPE 5996KB 3.96 Mozart/Oz 3788KB 4.10 OCaml ocamlopt 32Kword minor heap 970KB 4.24 Haskell GHC 6.8.2 -threaded, -N2 3300KB 15.27 GCC C (POSIX threads) 4520KB 28.7 Here are the figures I get for the C version I made with Protothreads: Addendum This post wouldn't exist hadn't rektide asked for these results. GCC C (Protothreads, optimum scheduling) 220KB 0.076s GCC C (Protothreads, pessimum scheduling) 220KB 18.6s (Read below for an explanation of the difference between optimum and pessimum scheduling policies.) It is nearly 400 times faster than the C version with POSIX threads, and represents a one order of magnitude improvement over the other lightweight thread implementations. It also needs less memory. The performance is almost unbelievable. Where's the catch, ugly code maybe? Judge for yourself; here are the Protothreads and the POSIX threads C implementations head to head*1: ```/* Protothreads, 0.076s*/ #include
#include
#include "pt.h" #include "pt-sem.h" #define NUM_THREADS 503 static struct pt thread_context[NUM_THREADS]; static int data[NUM_THREADS]; static struct pt_sem mutex[NUM_THREADS]; static PT_THREAD(thread(struct pt *pt, int id)) { static int token; static int r; PT_BEGIN(pt); while(1) { PT_SEM_WAIT(pt, &mutex[id]); token = data[id]; r = (id + 1) % NUM_THREADS; if(token) { data[r] = token - 1; PT_SEM_SIGNAL(pt, &mutex[r]); } else { printf("%d ", id + 1); exit(0); } } PT_END(pt); } int main(int argc, char *argv[]) { int i; if(argc != 2) exit(255); data[0] = atoi(argv[1]); for(i = 0; i < NUM_THREADS; i++) { PT_SEM_INIT(&mutex*, 0); PT_INIT(&thread_context*); } PT_SEM_INIT(&mutex[0], 1); while(1) { for(i = 0; i < NUM_THREADS; i++) thread(&thread_context*, i); } } /* POSIX threads, 28.7s */ #include
#include
#include
#include
#define NUM_THREADS 503 struct stack { char x[PTHREAD_STACK_MIN]; }; static pthread_mutex_t mutex[THREADS]; static int data[THREADS]; static struct stack stacks[THREADS]; static void* thread(void *num) { int l = (int)num; int r = (l+1) % THREADS; int token; while(1) { pthread_mutex_lock(mutex + l); token = data[l]; if (token) { data[r] = token - 1; pthread_mutex_unlock(mutex + r); } else { printf("%i ", l+1); exit(0); } } } int main(int argc, char **argv) { int i; pthread_t cthread; pthread_attr_t stack_attr; if (argc != 2) exit(255); data[0] = atoi(argv[1]); pthread_attr_init(&stack_attr); for (i = 0; i < THREADS; i++) { pthread_mutex_init(mutex + i, NULL); pthread_mutex_lock(mutex + i); pthread_attr_setstack(&stack_attr, &stacks*, sizeof(struct stack)); pthread_create(&cthread, &stack_attr, thread, (void*)i); } pthread_mutex_unlock(mutex + 0); pthread_join(cthread, NULL); } ``` Even though the Protothreads code is similar to the one with pthreads, there are two important differences: there is no thread-local data in the Protothreads version. r is recomputed on each iteration the proto threads are scheduled manually by running all the corresponding functions in sequence These dissimilarities give some insight into the nature of Protothreads. They are little more than small state machines whose state is stored in an opaque pt structure. Protothreads' implementation consists of a few preprocessor macros in headers, so the best way to see how they work is to examine the output of the preprocessor (gcc -E, slightly abridged and reformatted here): ``` typedef unsigned short lc_t; struct pt { lc_t lc; }; struct pt_sem { unsigned int count; }; static struct pt thread_context[503]; static int data[503]; static struct pt_sem mutex[503]; static char thread(struct pt *pt, int id) { static int token; static int r; { char PT_YIELD_FLAG = 1; switch((pt)->lc) { case 0: while(1) { do { do { (pt)->lc = 20; case 20:; if(!((&mutex[id])->count > 0)) { return 0; } } while(0); --(&mutex[id])->count; } while(0); token = data[id]; r = (id + 1) % 503; if(token) { data[r] = token - 1; ++(&mutex[r])->count; } else { printf("%d ", id + 1); exit(0); } } }; PT_YIELD_FLAG = 0; (pt)->lc = 0;; return 3; }; } int main(int argc, char *argv[]) { int i; if(argc != 2) exit(255); data[0] = atoi(argv[1]); for(i = 0; i < 503; i++) { (&mutex*)->count = 0; (&thread_context*)->lc = 0;; } (&mutex[0])->count = 1; while(1) { for(i = 0; i < 503; i++) thread(&thread_context*, i); } } ``` Protothreads use a little-known trick: switch statements do not have to be structured, in the sense that they can jump directly into the body of internal statements. For instance, you can place a case label in the body of an if statement. Essentially, switch is being used to perform arbitrary gotos determined by the value of the pt thread state. This sounds very much like computed gotos, and, indeed, there's another implementation of Protothreads's LC (as in "Local Continuations") core based on that GCC extension. (I have to confess that I have never used switch in such an "unstructured" way; I've employed computed gotos, though.) Context switch points are marked using macros like PT_SEM_WAIT, which expand into code that: sets the thread state to a value representing the current instruction pointer adds a label for that value The generated label allows the proto thread to jump to the saved IP. The thread state is encoded in an unsigned short int and takes only 2 bytes; there is no thread-local state by default, since local variables aren't restored across runs. Scheduling Threads are scheduled by calling the thread function with the appropriate thread state. Proto-threads blocked in PT_SEM_WAIT will quickly return without doing anything. The reason why Protothreads is so good at the threadring benchmark is that threads are being scheduled in an optimal way with no overhead. In threadring, the Nth thread unblocks the (N+1)th one, which matches the order in which the threads are executed in the above code. This means that the PT_SEM_WAIT will never actually "block" (i.e., return without doing anything) and the program progresses steadily. Things looks very different if proto-threads are scheduled in reverse order, by doing ``` while(1) { for(i = NUM_THREADS - 1 ; i >= 0; i--) thread(&thread_context*, i); } ``` This is the worst-case scenario for threaring, as 501 threads in the ring will be visited before the only runnable one is found: GCC C (Protothreads) 220KB 18.6s The pessimum scenario is 250 slower than the optimal one. We don't get a ratio of 500 here because the benchmark involves some minimal computation in addition to context switching. This overhead is only observable in the optimum Protothreads version (remember that the fastest general-purpose lightweight thread implementation, GHC's, took no less than 1.2s to perform 10 million context switches). Generalization Protothreads are exceptionally fast in the threadring benchmark because it does little more than context switching, and the optimum scheduling policy is found trivially. In practice, when you use (lightweight) threads, you will more often than not want at least: thread-local state automatic scheduling These features can be built on top of Protothreads's core, but they will make the resulting code both slower and more memory-hungry than the optimum case shown above. Scheduling, in particular, will require at least that a queue of runnable threads be maintained. This will be fast (a straightforward implementation will be logarithmic in the number of threads), but still considerably slower than the above code. The author of Protothreads recommends them for applications like "memory constrained systems, event-driven protocol stacks, small embedded systems, (or) sensor network nodes" and, indeed, nothing comes close in terms of lightness for these kinds of tasks.
查看更多
1
个回答
默认排序
按发布时间排序
撰写答案
登录
注册新账号
关注者
0
被浏览
4.5k
关于作者
shaolin
这家伙很懒,什么也没写!
提问
115
回答
444
被采纳
0
关注TA
发私信
相关问题
1
有关动态模块加载的一篇论文
2
最近的调程序总结
3
晕掉了,这么久都不见layer2的踪影啊
4
继续K9ii的历程
5
[GUI相关] FreeType 2
6
[GUI相关]嵌入式系统中文输入法的设计
7
20081101 RT-Thread开发者聚会总结
8
嵌入式系统基础
9
linux2.4.19在at91rm9200 上的寄存器设置
10
[转]基于嵌入式Linux的通用触摸屏校准程序
推荐文章
1
RT-Thread应用项目汇总
2
玩转RT-Thread系列教程
3
国产MCU移植系列教程汇总,欢迎查看!
4
机器人操作系统 (ROS2) 和 RT-Thread 通信
5
五分钟玩转RT-Thread新社区
6
【技术三千问】之《玩转ART-Pi》,看这篇就够了!干货汇总
7
关于STM32H7开发板上使用SDIO接口驱动SD卡挂载文件系统的问题总结
8
STM32的“GPU”——DMA2D实例详解
9
RT-Thread隐藏的宝藏之completion
10
【ART-PI】RT-Thread 开启RTC 与 Alarm组件
最新文章
1
进行i2c驱动移植的经验总结
2
在VSCode中使用clang-format
3
我该如何使用这个微雪的WIFI400 WIFI-LPB-100在rtt里或者我该怎样为它开发驱动
4
在GD32F470Z的RTC适配笔记
5
RT Thread 源码笔记 :内存池
热门标签
RT-Thread Studio
串口
Env
LWIP
SPI
AT
Bootloader
Hardfault
CAN总线
FinSH
ART-Pi
USB
DMA
文件系统
RT-Thread
SCons
RT-Thread Nano
线程
MQTT
STM32
RTC
FAL
rt-smart
ESP8266
I2C_IIC
UART
WIZnet_W5500
ota在线升级
freemodbus
PWM
flash
cubemx
packages_软件包
BSP
潘多拉开发板_Pandora
定时器
ADC
GD32
flashDB
socket
中断
编译报错
Debug
rt_mq_消息队列_msg_queue
SFUD
msh
keil_MDK
ulog
C++_cpp
MicroPython
本月问答贡献
出出啊
1517
个答案
342
次被采纳
小小李sunny
1444
个答案
289
次被采纳
张世争
808
个答案
174
次被采纳
crystal266
547
个答案
161
次被采纳
whj467467222
1222
个答案
148
次被采纳
本月文章贡献
catcatbing
3
篇文章
4
次点赞
qq1078249029
2
篇文章
2
次点赞
xnosky
2
篇文章
1
次点赞
Woshizhapuren
1
篇文章
5
次点赞
YZRD
1
篇文章
2
次点赞
回到
顶部
发布
问题
分享
好友
手机
浏览
扫码手机浏览
投诉
建议
回到
底部