
sphinx: profile-manual: add figures

(From yocto-docs rev: bfa1c146265d165ddfa1a70488d0f043ec9df2ee)

Signed-off-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
commit 6e4788bb67 (parent ffa58f2985)
Author:    Nicolas Dechesne
Date:      2020-07-29 10:58:36 +02:00
Committer: Richard Purdie
@@ -160,6 +160,9 @@ and wrote 0.176 MB perf.data (~7700 samples) ] To see the results in a
perf.data file in the current working directory and display the results
in an interactive UI: root@crownbay:~# perf report
.. image:: figures/perf-wget-flat-stripped.png
:align: center
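The record-then-report flow shown above boils down to two commands. The sketch below prints them rather than executing them, since they need a target board with perf installed; the download URL is the one used elsewhere in this manual.

```shell
# Dry-run sketch of the basic perf workflow: 'perf record' writes a
# perf.data file into the current directory, and 'perf report' reads
# it back and opens the interactive TUI. Printed, not run.
URL="http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2"
RECORD="perf record wget $URL"
REPORT="perf report"
echo "$RECORD"
echo "$REPORT"
```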
The above screenshot displays a 'flat' profile, one entry for each
'bucket' corresponding to the functions that were profiled during the
profiling run, ordered from the most popular to the least (perf has
@@ -189,6 +192,9 @@ linux-2.6.19.2.tar.b 100%
and wrote 0.652 MB perf.data (~28476 samples) ] root@crownbay:~# perf
report
.. image:: figures/perf-wget-g-copy-to-user-expanded-stripped.png
:align: center
Using the callgraph view, we can actually see not only which functions
took the most time, but we can also see a summary of how those functions
were called and learn something about how the program interacts with the
@@ -222,6 +228,9 @@ displayed in the callstack, here in the expanded sys_clock_gettime()
function. Later we'll see it resolve to a userspace function call in
busybox.
.. image:: figures/perf-wget-g-copy-from-user-expanded-stripped.png
:align: center
The above screenshot shows the other half of the journey for the data -
from the wget program's userspace buffers to disk. To get the buffers to
disk, the wget program issues a write(2), which does a copy-from-user to
@@ -236,6 +245,9 @@ actually implemented as an applet in busybox, so while the process name
is 'wget', the executable we're actually interested in is busybox. So
let's expand the first entry containing busybox:
.. image:: figures/perf-wget-busybox-expanded-stripped.png
:align: center
Again, before we expanded we saw that the function was labeled with a
hex value instead of a symbol as with most of the kernel entries.
Expanding the busybox entry doesn't make it any better.
@@ -270,14 +282,23 @@ busybox-dbg-1.20.2-r2.core2_32.rpm Now that the debuginfo is installed,
we see that the busybox entries now display their functions
symbolically:
.. image:: figures/perf-wget-busybox-debuginfo.png
:align: center
If we expand one of the entries and press 'enter' on a leaf node, we're
presented with a menu of actions we can take to get more information
related to that entry:
.. image:: figures/perf-wget-busybox-dso-zoom-menu.png
:align: center
One of these actions allows us to show a view that displays a
busybox-centric view of the profiled functions (in this case we've also
expanded all the nodes using the 'E' key):
.. image:: figures/perf-wget-busybox-dso-zoom.png
:align: center
Finally, we can see that now that the busybox debuginfo is installed,
the previously unresolved symbol in the sys_clock_gettime() entry
mentioned previously is now resolved, and shows that the
@@ -285,16 +306,25 @@ sys_clock_gettime system call that was the source of 6.75% of the
copy-to-user overhead was initiated by the handle_input() busybox
function:
.. image:: figures/perf-wget-g-copy-to-user-expanded-debuginfo.png
:align: center
At the lowest level of detail, we can dive down to the assembly level
and see which instructions caused the most overhead in a function.
Pressing 'enter' on the 'udhcpc_main' function, we're again presented
with a menu:
.. image:: figures/perf-wget-busybox-annotate-menu.png
:align: center
Selecting 'Annotate udhcpc_main', we get a detailed listing of
percentages by instruction for the udhcpc_main function. From the
display, we can see that over 50% of the time spent in this function is
taken up by a couple of tests and the move of a constant (1) to a register:
.. image:: figures/perf-wget-busybox-annotate-udhcpc.png
:align: center
As a segue into tracing, let's try another profile using a different
counter, something other than the default 'cycles'.
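Before picking an alternate counter, it helps to see what events perf knows about. The sketch below prints the commands one might use: 'perf list' enumerates the available hardware, software, and tracepoint events, and the 'page-faults' event here is a hypothetical example of a non-default counter, not one the manual itself uses.

```shell
# Dry-run sketch (printed, not run): discover events, then record
# with a different counter than the default 'cycles'.
URL="http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2"
LIST="perf list"
ALT="perf record -g -e page-faults wget $URL"
echo "$LIST"
echo "$ALT"
```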
@@ -424,6 +454,9 @@ and tell perf to do a profile using it as the sampling event:
root@crownbay:~# perf record -g -e sched:sched_wakeup wget
http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
.. image:: figures/sched-wakeup-profile.png
:align: center
The screenshot above shows the results of running a profile using the
sched:sched_wakeup tracepoint, which shows the relative costs of various
paths to sched_wakeup (note that sched_wakeup is the name of the
@@ -653,6 +686,9 @@ linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k
0:00:00 ETA Here we see entries not only for our wget load, but for
other processes running on the system as well:
.. image:: figures/perf-systemwide.png
:align: center
In the snapshot above, we can see callchains that originate in libc, and
a callchain from Xorg that demonstrates that we're using a proprietary X
driver in userspace (notice the presence of 'PVR' and some other
@@ -665,6 +701,9 @@ record a profile: root@crownbay:~# perf record -g -a -e cycles:u ^C[
perf record: Woken up 2 times to write data ] [ perf record: Captured
and wrote 0.376 MB perf.data (~16443 samples) ]
.. image:: figures/perf-report-cycles-u.png
:align: center
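The ':u' suffix on 'cycles:u' is a perf event modifier restricting the event to user space; ':k' is the kernel-space counterpart. The sketch below prints both variants of the system-wide record command for comparison.

```shell
# Dry-run sketch of perf event modifiers (printed, not run):
# ':u' counts only in user space, ':k' only in kernel space.
USER_ONLY="perf record -g -a -e cycles:u"
KERN_ONLY="perf record -g -a -e cycles:k"
echo "$USER_ONLY"
echo "$KERN_ONLY"
```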
Notice that in the screenshot above, we see only userspace entries ([.]).
Finally, we can press 'enter' on a leaf node and select the 'Zoom into
@@ -672,6 +711,9 @@ DSO' menu item to show only entries associated with a specific DSO. In
the screenshot below, we've zoomed into the 'libc' DSO which shows all
the entries associated with the libc-xxx.so DSO.
.. image:: figures/perf-systemwide-libc.png
:align: center
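The 'Zoom into DSO' operation is, in effect, a filter on the shared-object column of the report. As a self-contained illustration, the snippet below applies that filter with awk to a few hypothetical lines of flat 'perf report --stdio'-style output — the percentages and symbol names are invented for the example.

```shell
# Hypothetical sample of flat report output: overhead, command,
# shared object (DSO), privilege level, symbol.
SAMPLE='4.21% wget libc-2.16.so [.] memcpy
3.05% wget busybox [.] retrieve_file_data
1.10% wget [kernel.kallsyms] [k] __copy_to_user'
# Keep only entries whose DSO column starts with 'libc', mimicking
# the 'Zoom into DSO' view on the libc DSO.
LIBC_ONLY=$(printf '%s\n' "$SAMPLE" | awk '$3 ~ /^libc/')
printf '%s\n' "$LIBC_ONLY"
```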
We can also use the system-wide -a switch to do system-wide tracing.
Here we'll trace a couple of scheduler events: root@crownbay:~# perf
record -a -e sched:sched_switch -e sched:sched_wakeup ^C[ perf record:
@@ -838,6 +880,9 @@ gthumb 1300 [001] 34219.264551: do_fork: (c1028460) pcmanfm 1296 [000]
(c1028460) And using 'perf report' on the same file, we can see the
callgraphs from starting a few programs during those 30 seconds:
.. image:: figures/perf-probe-do_fork-profile.png
:align: center
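The do_fork probe life cycle has three steps: create the probe point, record against it, and remove it when done. The sketch below prints those commands; the 30-second window matches the trace described above, and 'sleep 30' as the workload is an assumption standing in for "wait while other programs start".

```shell
# Dry-run sketch (printed, not run) of a dynamic probe session:
# add a probe on do_fork, trace it system-wide, then clean up.
ADD="perf probe do_fork"
REC="perf record -g -a -e probe:do_fork sleep 30"
DEL="perf probe --del do_fork"
echo "$ADD"
echo "$REC"
echo "$DEL"
```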
.. container:: informalexample
Tying it Together:
@@ -1312,6 +1357,9 @@ choosing from the kernelshark menu: Capture \| Record That will display
the following dialog, which allows you to choose one or more events (or
even one or more complete subsystems) to trace:
.. image:: figures/kernelshark-choose-events.png
:align: center
Note that these are exactly the same sets of events described in the
previous trace events subsystem section, and in fact is where trace-cmd
gets them for kernelshark.
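Since kernelshark drives trace-cmd underneath, the capture dialog has a command-line equivalent. The sketch below prints one plausible session using the same sched events discussed in this manual; 'sleep 5' as the traced workload is an assumption for illustration.

```shell
# Dry-run sketch (printed, not run): list sched events, record two of
# them with trace-cmd, then inspect the resulting trace.dat as text
# or graphically in kernelshark.
LIST="trace-cmd list -e sched"
REC="trace-cmd record -e sched:sched_switch -e sched:sched_wakeup sleep 5"
REPORT="trace-cmd report"
GUI="kernelshark trace.dat"
echo "$LIST"
echo "$REC"
echo "$REPORT"
echo "$GUI"
```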
@@ -1324,6 +1372,9 @@ After doing that, we can start and stop the trace using the 'Run' and
'Stop' button on the lower right corner of the dialog (the same button
will turn into the 'Stop' button after the trace has started):
.. image:: figures/kernelshark-output-display.png
:align: center
Notice that the right-hand pane shows the exact trace-cmd command-line
that's used to run the trace, along with the results of the trace-cmd
run.
@@ -1332,9 +1383,15 @@ Once the 'Stop' button is pressed, the graphical view magically fills up
with a colorful per-cpu display of the trace data, along with the
detailed event listing below that:
.. image:: figures/kernelshark-i915-display.png
:align: center
Here's another example, this time a display resulting from tracing 'all
events':
.. image:: figures/kernelshark-all.png
:align: center
The tool is pretty self-explanatory, but for more detailed information
on navigating through the data, see the `kernelshark
website <http://rostedt.homelinux.com/kernelshark/>`__.
@@ -1543,6 +1600,9 @@ press the 'Profile' button.
Once you've pressed the profile button, the three panes will fill up
with profiling data:
.. image:: figures/sysprof-copy-to-user.png
:align: center
The left pane shows a list of functions and processes. Selecting one of
those expands that function in the right pane, showing all its callees.
Note that this caller-oriented display is essentially the inverse of
@@ -1554,6 +1614,9 @@ looking up the callchain we can see that one of the callers of
Notice that this is essentially a portion of the same information we saw
in the perf display shown in the perf section of this page.
.. image:: figures/sysprof-copy-from-user.png
:align: center
Similarly, the above is a snapshot of the Sysprof display of a
copy-from-user callchain.
@@ -1562,6 +1625,9 @@ a list of all the callers of a particular function selected in the top
left pane. In this case, the lower pane is showing all the callers of
\__mark_inode_dirty:
.. image:: figures/sysprof-callers.png
:align: center
Double-clicking on one of those functions will in turn change the focus
to the selected function, and so on.