FastThreadLocal解析

前言

之前《TreadLocal解析》说过Threadlocal的结构:

ThreadLocal结构

但netty却重新搞了一个fastthreadlocal,从各方面对比一下两者的区别。也不得不说一下netty真不愧是款优秀框架,里面中有很多优秀类和方法值得细品

VS ThreadLocal

1、性能

第一点,从性能开始,为什么要重造轮子,可能就是之前的轮子达不到性能要求

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
public class FastThreadLocalTest {


public static void main(String[] args) {
testFast(100);
testSlow(100);
}

private static void testFast(int threadLocalCount) {
final FastThreadLocal<String>[] caches = new FastThreadLocal[threadLocalCount];
final Thread mainThread = Thread.currentThread();
for (int i = 0; i < threadLocalCount; i++) {
caches[i] = new FastThreadLocal();
}
Thread t = new FastThreadLocalThread(new Runnable() {
@Override
public void run() {
for (int i = 0; i < threadLocalCount; i++) {
caches[i].set("float.lu");
}
long start = System.nanoTime();
for (int i = 0; i < threadLocalCount; i++) {
for (int j = 0; j < 1000000; j++) {
caches[i].get();
}
}
long end = System.nanoTime();
System.out.println("take[" + TimeUnit.NANOSECONDS.toMillis(end - start) +
"]ms");
LockSupport.unpark(mainThread);
}

});
t.start();
LockSupport.park(mainThread);
}

private static void testSlow(int threadLocalCount) {
final ThreadLocal<String>[] caches = new ThreadLocal[threadLocalCount];
final Thread mainThread = Thread.currentThread();
for (int i=0;i<threadLocalCount;i++) {
caches[i] = new ThreadLocal();
}
Thread t = new Thread(new Runnable() {
@Override
public void run() {
for (int i=0;i<threadLocalCount;i++) {
caches[i].set("float.lu");
}
long start = System.nanoTime();
for (int i=0;i<threadLocalCount;i++) {
for (int j=0;j<1000000;j++) {
caches[i].get();
}
}
long end = System.nanoTime();
System.out.println("take[" + TimeUnit.NANOSECONDS.toMillis(end - start) +
"]ms");
LockSupport.unpark(mainThread);
}

});
t.start();
LockSupport.park(mainThread);
}
}

//输出
fast[15]ms
slow[302]ms

从输出可见性能提升很大

2、数据结构

两者的数据结构大体相似,都是thread带上map属性,threadlocal实例为key;但在细节算法处理时,不一样

get()

整体思路:通过thread取到map,再从map中取value

ThreadLocal.get()

1
2
3
4
5
6
7
8
9
10
11
12
13
public T get() {
Thread t = Thread.currentThread();
ThreadLocalMap map = getMap(t);
if (map != null) {
ThreadLocalMap.Entry e = map.getEntry(this);
if (e != null) {
@SuppressWarnings("unchecked")
T result = (T)e.value;
return result;
}
}
return setInitialValue();
}

从map中取值:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
private Entry getEntry(ThreadLocal<?> key) {
int i = key.threadLocalHashCode & (table.length - 1);
Entry e = table[i];
if (e != null && e.get() == key)
return e;
else
return getEntryAfterMiss(key, i, e);
}

private Entry getEntryAfterMiss(ThreadLocal<?> key, int i, Entry e) {
Entry[] tab = table;
int len = tab.length;

while (e != null) {
ThreadLocal<?> k = e.get();
if (k == key)
return e;
if (k == null)
expungeStaleEntry(i);
else
i = nextIndex(i, len);
e = tab[i];
}
return null;
}

如果key值相等,直接返回value

如果key不相等,使用循环线性探测,一直找到最后一个元素

FastThreadLocal.get()

1
2
3
4
5
6
7
8
9
10
11
12
13
public final V get(InternalThreadLocalMap threadLocalMap) {
Object v = threadLocalMap.indexedVariable(index);
if (v != InternalThreadLocalMap.UNSET) {
return (V) v;
}

return initialize(threadLocalMap);
}

public Object indexedVariable(int index) {
Object[] lookup = indexedVariables;
return index < lookup.length? lookup[index] : UNSET;
}

这个明显就快些,有index,直接数组拿值,不需要再去处理循环

set()

主要在于向map中放值

ThreadLocal.set()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
public void set(T value) {
Thread t = Thread.currentThread();
ThreadLocalMap map = getMap(t);
if (map != null)
map.set(this, value);
else
createMap(t, value);
}

private void set(ThreadLocal<?> key, Object value) {

// We don't use a fast path as with get() because it is at
// least as common to use set() to create new entries as
// it is to replace existing ones, in which case, a fast
// path would fail more often than not.

Entry[] tab = table;
int len = tab.length;
int i = key.threadLocalHashCode & (len-1);

for (Entry e = tab[i];
e != null;
e = tab[i = nextIndex(i, len)]) {
ThreadLocal<?> k = e.get();

if (k == key) {
e.value = value;
return;
}

if (k == null) {
replaceStaleEntry(key, value, i);
return;
}
}

tab[i] = new Entry(key, value);
int sz = ++size;
if (!cleanSomeSlots(i, sz) && sz >= threshold)
rehash();
}
  1. 通过取模,得到index
  2. key相等,直接赋值value
  3. key不相等,那就线性探测存放

FastThreadLocal.set()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
public final void set(V value) {
if (value != InternalThreadLocalMap.UNSET) {
InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
if (setKnownNotUnset(threadLocalMap, value)) {
registerCleaner(threadLocalMap);
}
} else {
remove();
}
}

public boolean setIndexedVariable(int index, Object value) {
Object[] lookup = indexedVariables;
if (index < lookup.length) {
Object oldValue = lookup[index];
lookup[index] = value;
return oldValue == UNSET;
} else {
expandIndexedVariableTableAndSet(index, value);
return true;
}
}

这类似就是放入到数组中

总结

到此可以看出二者的区别

区别 ThreadLocal FastThreadLocal
map ThreadLocalMap InternalThreadLocalMap extends UnpaddedInternalThreadLocalMap
线程 Thread FastThreadLocalThread extends Thread

主要还是在内部map的处理逻辑上,两者都没有使用hashmap,但是自定义了map结构与行为,在《hashmap源码解析》中指出map结构的两种处理方式:拉链法线性探测法;在hasmap中使用的是拉链法,而threadlocal中使用的是线性探测法

线性探查(Linear Probing)方式虽然简单,但是有一些问题,它会导致同类哈希的聚集。在存入的时候存在冲突,在查找的时候冲突依然存在

冲突也就造成了性能损耗,而FastTreadLocal就更简单,直接使用数组

1
2
3
public FastThreadLocal() {
index = InternalThreadLocalMap.nextVariableIndex();
}

UnpaddedInternalThreadLocalMap

1
2
3
4
5
6
7
8
9
10
11
Object[] indexedVariables;


public static int nextVariableIndex() {
int index = nextIndex.getAndIncrement();
if (index < 0) {
nextIndex.decrementAndGet();
throw new IllegalStateException("too many thread-local indexed variables");
}
return index;
}

整个map就是一个数组结构,在每个thread中,每一个FastThreadLocal在创建时就指定了index,value就是数组元素

公众号:码农戏码
欢迎关注微信公众号『码农戏码』