Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Linux] Process.cpu_percent is incorrect when instantiated with a tid (thread ID) #1989

Open
benhsmith opened this issue Sep 18, 2021 · 7 comments

Comments

@benhsmith
Copy link

benhsmith commented Sep 18, 2021

Summary

  • OS: { Linux }
  • Architecture: { 64bit, 32bit, ARM, PowerPC, s390 }
  • Psutil version: { 5.8.0 }
  • Python version: { 3.6.8 }
  • Type: { core }

Description

Assume I have a process with pid 100 and 2 threads with tids 101 and 102.

p = Process(101) 
p.cpu_percent(1)

I would expect cpu_percent to return the CPU usage of thread 101 but, instead, it returns thread usage for the entire process (100). Process.threads() does return user and system time, and I can get cpu percent from those but then I have to calculate cpu percent usage in my own code, which is what cpu_percent is for.

Ideally, when the process is a thread, _pslinux._pslinux._parse_stat_file would figure out that its pid is really a tid and then read the proc/PID/task/TID/stat file (which contains per-thread CPU times) instead of /proc/TID/stat.

For some reason tgid, which is what would tell you the PID of the thread, is not in /proc/PID/stat, it's only in /proc/PID/status. So it looks like figuring out that a PID is actually a TID would require also reading /proc/PID/status to get the tgid value.

The downsides to this approach would be reading an additional file (/proc/PID/status) and a slight change to behavior. If anyone was counting on getting CPU percentage for the entire process when they create a Process from a tid, this would break that. However, being able to easily get per thread CPU usage seems much more useful.

I'd be happy to put together a PR but I wanted to check first in case this has already been considered and rejected. I looked in the issues and didn't find anything.

@benhsmith
Copy link
Author

benhsmith commented Sep 20, 2021

I was thinking about this again and it occurred to me another way to provide pre-thread CPU usage would be to just make the objects returned by Process.threads() full Process objects. Then you could do something like:

for thread in Process(101).threads():
    print(f'{thread.name()}: {thread.cpu_percent()}')

that makes a bit more sense since it corresponds to the actual layout of the /proc directories.

@giampaolo
Copy link
Owner

giampaolo commented Oct 4, 2021

Interesting and complicated subject. Some considerations:

  • Linux is the only platform allowing to instantiate a thread via Process class as in thread = Process(tid), and query the thread as if it was a process.
  • As you pointed out, there currently is no distinction in the code whether Process points to a PID or TID, meaning we read thread info from /proc/{pid}/* instead of /proc/{pid}/task/{tid}/* (should we?). Note that this applies to all process info, not only CPU timings.

I would expect cpu_percent to return the CPU usage of thread 101 but, instead, it returns thread usage for the entire process (100).

I confirm:

import psutil, threading, time, os

def worker():
    while 1:
        time.sleep(0.00001)

t = threading.Thread(target=worker)
t.start()
pid = os.getpid()
tid = psutil.Process().threads()[1].id
assert pid != tid

while 1:
    print(psutil.Process(pid).cpu_times())
    print(psutil.Process(tid).cpu_times())
    print()
    time.sleep(1)

...prints:

pcputimes(user=0.06, system=0.01, children_user=0.0, children_system=0.0, iowait=0.0)
pcputimes(user=0.06, system=0.01, children_user=0.0, children_system=0.0, iowait=0.0)

pcputimes(user=0.14, system=0.06, children_user=0.0, children_system=0.0, iowait=0.0)
pcputimes(user=0.14, system=0.06, children_user=0.0, children_system=0.0, iowait=0.0)

pcputimes(user=0.2, system=0.14, children_user=0.0, children_system=0.0, iowait=0.0)
pcputimes(user=0.2, system=0.14, children_user=0.0, children_system=0.0, iowait=0.0)

pcputimes(user=0.29, system=0.2, children_user=0.0, children_system=0.0, iowait=0.0)
pcputimes(user=0.29, system=0.2, children_user=0.0, children_system=0.0, iowait=0.0)

pcputimes(user=0.36, system=0.27, children_user=0.0, children_system=0.0, iowait=0.0)
pcputimes(user=0.36, system=0.27, children_user=0.0, children_system=0.0, iowait=0.0)

@giampaolo
Copy link
Owner

giampaolo commented Oct 4, 2021

The diff below is a patch for Linux which distinguishes between a PID and a TID. When we're dealing with a TID, we read files in /proc/{master-pid}/task/{tid}/* instead of /proc/{master-pid}/*. By {master-pid} I mean the process which spawned the thread (which is different than a conventional parent pid). With the code patched, the CPU times are indeed different:

pcputimes(user=0.07, system=0.0, children_user=0.0, children_system=0.0, iowait=0.0)
pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0, iowait=0.0)

pcputimes(user=0.15, system=0.05, children_user=0.0, children_system=0.0, iowait=0.0)
pcputimes(user=0.06, system=0.07, children_user=0.0, children_system=0.0, iowait=0.0)

pcputimes(user=0.24, system=0.11, children_user=0.0, children_system=0.0, iowait=0.0)
pcputimes(user=0.13, system=0.13, children_user=0.0, children_system=0.0, iowait=0.0)
...

Patch:

diff --git a/psutil/_pslinux.py b/psutil/_pslinux.py
index 3afe6c65..686c209d 100644
--- a/psutil/_pslinux.py
+++ b/psutil/_pslinux.py
@@ -1513,6 +1513,31 @@ def pids():
     return [int(x) for x in os.listdir(b(get_procfs_path())) if x.isdigit()]
 
 
+def pid_for_tid(pid_or_tid):
+    """If the ID refers to a thread, returns the parent/master PID of the
+    process which spawned the thread, else None.
+    """
+    path = "%s/%s/status" % (get_procfs_path(), pid_or_tid)
+    try:
+        f = open_binary(path)
+    except FileNotFoundError:
+        pass
+    else:
+        with f:
+            tgid = pid = None
+            for line in f:
+                if line.startswith(b"Tgid:"):
+                    tgid = int(line.split()[1])
+                elif line.startswith(b"Pid:"):
+                    pid = int(line.split()[1])
+                if pid is not None and tgid is not None:
+                    # If tgid and pid are different then we're dealing with
+                    # a process TID. Despite counter-intuitive, `return tgid`
+                    # here means "return the ID of the process which spawned
+                    # this thread."
+                    return tgid if pid != tgid else None
+
+
 def pid_exists(pid):
     """Check for the existence of a unix PID. Linux TIDs are not
     supported (always return False).
@@ -1591,13 +1616,19 @@ def wrap_exceptions(fun):
 class Process(object):
     """Linux process implementation."""
 
-    __slots__ = ["pid", "_name", "_ppid", "_procfs_path", "_cache"]
+    __slots__ = ["pid", "_name", "_ppid", "_procfs_path", "_cache", "_is_thread"]
 
     def __init__(self, pid):
         self.pid = pid
         self._name = None
         self._ppid = None
-        self._procfs_path = get_procfs_path()
+        master_pid = pid_for_tid(pid)
+        if master_pid:
+            self._is_thread = True
+            self._procfs_path = "%s/%s/task" % (get_procfs_path(), master_pid)
+        else:
+            self._is_thread = False
+            self._procfs_path = get_procfs_path()
 
     def _assert_alive(self):
         """Raise NSP if the process disappeared on us."""
@@ -1953,6 +1984,8 @@ class Process(object):
 
     @wrap_exceptions
     def threads(self):
+        if self._is_thread:
+            return []
         thread_ids = os.listdir("%s/%s/task" % (self._procfs_path, self.pid))
         thread_ids.sort()
         retlist = []

giampaolo added a commit that referenced this issue Oct 4, 2021
@giampaolo
Copy link
Owner

giampaolo commented Oct 4, 2021

Another interesting experiment. It seems that kill()ing a TID will also kill the master/parent process:

import psutil, threading, time, os, signal

def worker():
    while 1:
        print(1)
        time.sleep(1)

t = threading.Thread(target=worker)
t.start()
time.sleep(.1)
tid = psutil.Process().threads()[1].id
p = psutil.Process(tid)
p.kill()  # same with `os.kill(tid, signal.SIGKILL)`
time.sleep(1000)  # we'll never get here

@benhsmith
Copy link
Author

benhsmith commented Oct 5, 2021

Yeah when you send a kill signal to the thread it goes to the thread group. From the kill man page:

       Although it is possible to specify the TID (thread ID, see
       gettid(2)) of one of the threads in a multithreaded process as
       the argument of kill, the signal is nevertheless directed to the
       process (i.e., the entire thread group). In other words, it is
       not possible to send a signal to an explicitly selected thread in
       a multithreaded process. The signal will be delivered to an
       arbitrarily selected thread in the target process that is not
       blocking the signal.

@benhsmith
Copy link
Author

The patch looks like it should fix my problem, thanks. One minor suggestion, instead of master_pid, tid might be more clear since it's the thread ID and tid is common terminology for it in Linux.

@martinakos
Copy link

martinakos commented Nov 30, 2021

Thanks for the patch @giampaolo
I've used your patch to get the per thread cpu utilization and plot it with a modified psrecord. Below I show the psrecord output showing the total cpu usage of my process and my modification to show the corresponding per thread cpu usage.
image
image
One thing I notice is that the per thread cpu usage has less quantization levels than the total cpu usage and I wonder if you may know why is that the case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants