# 3.5 Attach常见的坑
# 不同版本JDK在Attach成功后返回结果差异性
- 现象
当使用JDK11去attach JDK8应用时,会抛异常com.sun.tools.attach.AgentLoadException: 0 , 但实际上已经attach成功了。异常结果如下:
Start arthas failed, exception stack trace: com.sun.tools.attach.AgentLoadException: 0
at jdk.attach/sun.tools.attach.HotSpotVirtualMachine.loadAgentLibrary(HotSpotVirtualMachine.java:108)
at jdk.attach/sun.tools.attach.HotSpotVirtualMachine.loadAgentLibrary(HotSpotVirtualMachine.java:119)
at jdk.attach/sun.tools.attach.HotSpotVirtualMachine.loadAgent(HotSpotVirtualMachine.java:147)
2
3
4
- 原因
在不同的JDK中HotSpotVirtualMachine#loadAgentLibrary方法的返回值不一样 ,在JDK8中返回0表示attach成功。
代码位置:src/share/classes/sun/tools/attach/HotSpotVirtualMachine.java
private void loadAgentLibrary(String agentLibrary, boolean isAbsolute, String options)
throws AgentLoadException, AgentInitializationException, IOException
{
InputStream in = execute("load",
agentLibrary,
isAbsolute ? "true" : "false",
options);
try {
// 返回0表示attach成功
int result = readInt(in);
if (result != 0) {
throw new AgentInitializationException("Agent_OnAttach failed", result);
}
} finally {
in.close();
}
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
JDK11返回的是"return code: 0"表示attach成功。
// 代码位置:src/jdk.attach/share/classes/sun/tools/attach/HotSpotVirtualMachine.java
private void loadAgentLibrary(String agentLibrary, boolean isAbsolute, String options)
throws AgentLoadException, AgentInitializationException, IOException
{
// 返回结果
String msgPrefix = "return code: ";
InputStream in = execute("load",
agentLibrary,
isAbsolute ? "true" : "false",
options);
try (BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
String result = reader.readLine();
if (result == null) {
throw new AgentLoadException("Target VM did not respond");
} else if (result.startsWith(msgPrefix)) {
int retCode = Integer.parseInt(result.substring(msgPrefix.length()));
// "return code: 0" 表示attach成功
if (retCode != 0) {
throw new AgentInitializationException("Agent_OnAttach failed", retCode);
}
} else {
throw new AgentLoadException(result);
}
}
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
- 方案
发起Attach的进程需要兼容不同版本JDK返回结果。下面是arthas诊断工具对这个问题的兼容性处理方案:
代码位置:arthas/core/src/main/java/com/taobao/arthas/core/Arthas.java
try {
virtualMachine.loadAgent(arthasAgentPath,
configure.getArthasCore() + ";" + configure.toString());
} catch (IOException e) {
// 处理返回值为 "return code: 0"
if (e.getMessage() != null && e.getMessage().contains("Non-numeric value found")) {
AnsiLog.warn(e);
AnsiLog.warn("It seems to use the lower version of JDK to attach the higher version of JDK.");
AnsiLog.warn(
"This error message can be ignored, the attach may have been successful, and it will still try to connect.");
} else {
throw e;
}
} catch (com.sun.tools.attach.AgentLoadException ex) {
// 处理返回值为 "0"
if ("0".equals(ex.getMessage())) {
// https://stackoverflow.com/a/54454418
AnsiLog.warn(ex);
AnsiLog.warn("It seems to use the higher version of JDK to attach the lower version of JDK.");
AnsiLog.warn(
"This error message can be ignored, the attach may have been successful, and it will still try to connect.");
} else {
throw ex;
}
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
上面的代码可以看出,在Attach抛出异常后,对异常进行分类处理,当抛出IOException并且异常的message中有"Non-numeric value found",表示该异常是由于低版本Attach API attach 到高版本JDK上; 当抛出的异常是AgentLoadException并且message的值为"0"时,表示该异常是由于高版本Attach API attach 到低版本JDK导致。
# .java_pid文件被删除
- 现象
当执行attach命令如jstack时,出现报错Unable to open socket file: target process not responding or HotSpot VM not loaded。错误如下所示:
MacBook-Pro admin$ jstack 33000
33000: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
2
3
并且/tmp目录下没有attach通讯的.java_pid
MacBook-Pro admin$ ls .java_pid3000
ls: .java_pid3000: No such file or directory
2
然而,重启Java进程之后又可以使用jstack等attach工具了
- 原因
很不幸,这是一个JDK的bug,原因是JVM在首次被attach时会创建.java_pid
方案
对于JDK8来说,只能重启进程;社区的讨论以及官方修复;
官方修复的pr给Attach Listener增加了INITIALIZING、NOT_INITIALIZED、INITIALIZED多种状态,并且在INITIALIZED状态下通过AttachListener::check_socket_file进行自检,如果发现文件不存在,会清理之前的listener,并重新建立。
修复代码如下,在代码行号为17处,对.attach_pid
// Attempt to transit state to AL_INITIALIZING.
AttachListenerState cur_state = AttachListener::transit_state(AL_INITIALIZING, AL_NOT_INITIALIZED);
if (cur_state == AL_INITIALIZING) {
// Attach Listener has been started to initialize. Ignore this signal.
continue;
} else if (cur_state == AL_NOT_INITIALIZED) {
// Start to initialize.
if (AttachListener::is_init_trigger()) {
// Attach Listener has been initialized.
// Accept subsequent request.
continue;
} else {
// Attach Listener could not be started.
// So we need to transit the state to AL_NOT_INITIALIZED.
AttachListener::set_state(AL_NOT_INITIALIZED);
}
} else if (AttachListener::check_socket_file()) {
// .attach_pid<pid>文件进行检测
// Attach Listener has been started, but unix domain socket file
// does not exist. So restart Attach Listener.
continue;
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
需要说明的是,该修复仅限JDK11以上版本。
# attach进程的权限问题
- 现象
如果在root用户下执行jstack,而目标JVM进程不是root权限启动,执行报错如下:
Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
2
- 原因
下面是在JDK8上LinuxAttachListener线程接受命令的过程。在代码26行处会严格校验发起attach进程的uid和gid是否与目标JVM 一致。
代码位置:jdk8/src/hotspot/os/linux/vm/attachListener_linux.cpp
LinuxAttachOperation* LinuxAttachListener::dequeue() {
for (;;) {
int s;
// wait for client to connect
struct sockaddr addr;
socklen_t len = sizeof(addr);
RESTARTABLE(::accept(listener(), &addr, &len), s);
if (s == -1) {
return NULL; // log a warning?
}
// get the credentials of the peer and check the effective uid/guid
// - check with jeff on this.
struct ucred cred_info;
socklen_t optlen = sizeof(cred_info);
if (::getsockopt(s, SOL_SOCKET, SO_PEERCRED, (void*)&cred_info, &optlen) == -1) {
::close(s);
continue;
}
uid_t euid = geteuid();
gid_t egid = getegid();
// 严格校验 uid、gid
if (cred_info.uid != euid || cred_info.gid != egid) {
::close(s);
continue;
}
// peer credential look okay so we read the request
LinuxAttachOperation* op = read_request(s);
if (op == NULL) {
::close(s);
continue;
} else {
return op;
}
}
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
原则是上root权限不应该受到限制,因此JDK11对这个"不太合理"的限制做了解除,可以使用root权限attach任意用户启动的Java进程。
代码位置:jdk11/src/hotspot/os/linux/attachListener_linux.cpp
LinuxAttachOperation* LinuxAttachListener::dequeue() {
for (;;) {
int s;
// wait for client to connect
struct sockaddr addr;
socklen_t len = sizeof(addr);
RESTARTABLE(::accept(listener(), &addr, &len), s);
if (s == -1) {
return NULL; // log a warning?
}
// get the credentials of the peer and check the effective uid/guid
struct ucred cred_info;
socklen_t optlen = sizeof(cred_info);
if (::getsockopt(s, SOL_SOCKET, SO_PEERCRED, (void*)&cred_info, &optlen) == -1) {
log_debug(attach)("Failed to get socket option SO_PEERCRED");
::close(s);
continue;
}
// 允许root权限attach
if (!os::Posix::matches_effective_uid_and_gid_or_root(cred_info.uid, cred_info.gid)) {
log_debug(attach)("euid/egid check failed (%d/%d vs %d/%d)",
cred_info.uid, cred_info.gid, geteuid(), getegid());
::close(s);
continue;
}
// peer credential look okay so we read the request
LinuxAttachOperation* op = read_request(s);
if (op == NULL) {
::close(s);
continue;
} else {
return op;
}
}
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
matches_effective_uid_and_gid_or_root 的实现如下:
代码位置:jdk11/src/hotspot/os/linux/attachListener_linux.cpp
bool os::Posix::matches_effective_uid_and_gid_or_root(uid_t uid, gid_t gid) {
return is_root(uid) || (geteuid() == uid && getegid() == gid);
}
2
3
- 解决方案
切换到与用户相同权限执行然后再执行Attach。 在介绍jattach工具时已经对这部分代码做了详细分析,这里不在赘述。
# com.sun.tools.attach.AttachNotSupportedException: no providers installed
- 原因以及解决方案 是因为引用的tools.jar包有问题,应该这样引用tools.jar
<dependency>
<groupId>com.sun</groupId>
<artifactId>tools</artifactId>
<version>1.5.0</version>
<scope>system</scope>
<systemPath>/path/to/your/jdk/lib/tools.jar</systemPath>
</dependency>
2
3
4
5
6
7
systemPath标签用来指定本地的tools.jar位置,可以把tools.jar的绝对路径配置成相对路径:
<dependency>
<groupId>com.sun</groupId>
<artifactId>tools</artifactId>
<version>1.5.0</version>
<scope>system</scope>
<systemPath>${env.JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>
2
3
4
5
6
7