One day I started a few tens of thousands of processes in Erlang, each of them doing something like the following:
```erlang
%% Id is bound by the surrounding worker process;
%% process_record/1 is the per-record update.
F = fun() ->
        case mnesia:read(record, Id) of
            [Record] ->
                NRecord = process_record(Record),
                mnesia:write(record, NRecord, sticky_write);
            [] ->
                mnesia:abort(no_rec)
        end
    end,
mnesia:transaction(F).
```
Mnesia hung. A quick investigation showed a large number of whole-table read locks.
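For anyone who wants to check for the same symptom, here is a minimal sketch that counts the read locks currently held on a table, grouped by key. The module name lock_probe and both function names are made up for illustration. mnesia:system_info(held_locks) is a real call, but the sketch assumes each entry has the shape {{Table, Key}, LockKind, Tid}, which may differ between mnesia versions; entries of any other shape are simply skipped.

```erlang
-module(lock_probe).
-export([read_lock_counts/1, count_read_locks/2]).

%% Count the read locks currently held on Tab, grouped by key.
%% A whole-table lock shows up under mnesia's internal "all keys" marker.
read_lock_counts(Tab) ->
    count_read_locks(Tab, mnesia:system_info(held_locks)).

%% ASSUMPTION: each held-lock entry is {{Table, Key}, LockKind, Tid}.
%% Anything that does not match that shape is ignored.
count_read_locks(Tab, HeldLocks) ->
    lists:foldl(
      fun({{T, Key}, read, _Tid}, Acc) when T =:= Tab ->
              orddict:update_counter(Key, 1, Acc);
         (_Other, Acc) ->
              Acc
      end,
      orddict:new(),
      HeldLocks).
```

Calling lock_probe:read_lock_counts(record) while the system is stuck should return one very large counter for the whole-table key if the same problem is present.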
Reading through the mnesia source turned up the culprit in mnesia_locker.erl:
```erlang
%% mnesia_locker.erl, lines 734-764
do_sticky_lock(Tid, Store, {Tab, Key} = Oid, Lock) ->
    {WNodes, Majority} = w_nodes(Tab),
    sticky_check_majority(Lock, Tab, Majority, WNodes),
    ?MODULE ! {self(), {test_set_sticky, Tid, Oid, Lock}},
    N = node(),
    receive
        {?MODULE, N, granted} ->
            ?ets_insert(Store, {{locks, Tab, Key}, write}),
            [?ets_insert(Store, {nodes, Node}) || Node <- WNodes],
            granted;
        {?MODULE, N, {granted, Val}} -> %% for rwlocks
            case opt_lookup_in_client(Val, Oid, write) of
                C = #cyclic{} ->
                    exit({aborted, C});
                Val2 ->
                    ?ets_insert(Store, {{locks, Tab, Key}, write}),
                    [?ets_insert(Store, {nodes, Node}) || Node <- WNodes],
                    Val2
            end;
        {?MODULE, N, {not_granted, Reason}} ->
            exit({aborted, Reason});
        {?MODULE, N, not_stuck} ->
            not_stuck(Tid, Store, Tab, Key, Oid, Lock, N),
            dirty_sticky_lock(Tab, Key, [N], Lock);
        {mnesia_down, Node} ->
            EMsg = {aborted, {node_not_running, Node}},
            flush_remaining([N], Node, EMsg);
        {?MODULE, N, {stuck_elsewhere, _N2}} ->
            stuck_elsewhere(Tid, Store, Tab, Key, Oid, Lock),
            dirty_sticky_lock(Tab, Key, [N], Lock)
    end.
```
```erlang
%% mnesia_locker.erl, lines 776-781
not_stuck(Tid, Store, Tab, _Key, Oid, _Lock, N) ->
    rlock(Tid, Store, {Tab, ?ALL}),   %% needed?
    wlock(Tid, Store, Oid),           %% perfect sync
    wlock(Tid, Store, {Tab, ?STICK}), %% max one sticker/table
    Ns = val({Tab, where_to_write}),
    rpc:abcast(Ns, ?MODULE, {stick, Oid, N}).
```
When a sticky_write lock is requested, the caller first sends a test_set_sticky message to the mnesia_locker process and then waits for the reply.
If the reply is granted, the sticky lock already exists and the operation simply continues.
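If the reply is not_stuck, the sticky lock has to be acquired first. That goes as follows:
take a read lock on the whole table, then a write lock on the single record, then the sticky lock on the table, and finally notify the other nodes.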
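If this happens concurrently and the table has no sticky lock beforehand, every test_set_sticky request issued before the sticky lock is established gets not_stuck back. That is where the flood of whole-table read locks comes from.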
But what is that "needed?" comment on line 777, next to the rlock on {Tab, ?ALL}, supposed to mean?
Not sure whether it is needed, so just hand out a whole-table jackpot anyway? What do you think you are, a slot machine?!
The fix is simple enough, of course: make sure the sticky lock already exists before starting the concurrent work:
```erlang
%% Take the table-level sticky lock once, up front.
F = fun() -> mnesia:lock({table, record}, sticky_write) end,
mnesia:transaction(F).
```
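Putting the two pieces together, the whole pattern might look roughly like the sketch below: take the table-level sticky lock once, and only then fan out the workers. The module name sticky_prelock, the function run/3 and the Update callback are hypothetical names used for illustration, and collecting the results through monitors is just one convenient way to wait for the workers. Since a sticky lock belongs to a node, this assumes the pre-lock runs on the same node as the workers.

```erlang
-module(sticky_prelock).
-export([run/3]).

%% Take the sticky lock on Tab first, then update every key in Ids
%% concurrently. Returns one transaction result per Id, in order.
run(Tab, Ids, Update) ->
    PreLock = fun() -> mnesia:lock({table, Tab}, sticky_write) end,
    {atomic, _} = mnesia:transaction(PreLock),
    %% One worker per key; each exits with its transaction result.
    Refs = [begin
                {_Pid, Ref} = spawn_monitor(
                                fun() -> exit(update_one(Tab, Id, Update)) end),
                Ref
            end || Id <- Ids],
    %% Collect the exit reasons, which carry the results.
    [receive {'DOWN', Ref, process, _, Result} -> Result end || Ref <- Refs].

%% Same shape as the transaction at the top of the post.
update_one(Tab, Id, Update) ->
    mnesia:transaction(
      fun() ->
              case mnesia:read(Tab, Id) of
                  [Record] -> mnesia:write(Tab, Update(Record), sticky_write);
                  []       -> mnesia:abort(no_rec)
              end
      end).
```

With the sticky lock already in place, the workers should get granted rather than not_stuck, so none of them needs to take the whole-table read lock.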
Versions: mnesia 4.4.19, Erlang R14A.