mirror of
https://git.yoctoproject.org/poky
synced 2026-05-08 05:09:24 +00:00
sphinx: profile-manual: add figures
(From yocto-docs rev: bfa1c146265d165ddfa1a70488d0f043ec9df2ee) Signed-off-by: Nicolas Dechesne <nicolas.dechesne@linaro.org> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
This commit is contained in:
committed by
Richard Purdie
parent
ffa58f2985
commit
6e4788bb67
@@ -160,6 +160,9 @@ and wrote 0.176 MB perf.data (~7700 samples) ] To see the results in a
|
||||
perf.data file in the current working directory and display the results
|
||||
in an interactive UI: root@crownbay:~# perf report
|
||||
|
||||
.. image:: figures/perf-wget-flat-stripped.png
|
||||
:align: center
|
||||
|
||||
The above screenshot displays a 'flat' profile, one entry for each
|
||||
'bucket' corresponding to the functions that were profiled during the
|
||||
profiling run, ordered from the most popular to the least (perf has
|
||||
@@ -189,6 +192,9 @@ linux-2.6.19.2.tar.b 100%
|
||||
and wrote 0.652 MB perf.data (~28476 samples) ] root@crownbay:~# perf
|
||||
report
|
||||
|
||||
.. image:: figures/perf-wget-g-copy-to-user-expanded-stripped.png
|
||||
:align: center
|
||||
|
||||
Using the callgraph view, we can actually see not only which functions
|
||||
took the most time, but we can also see a summary of how those functions
|
||||
were called and learn something about how the program interacts with the
|
||||
@@ -222,6 +228,9 @@ displayed in the callstack, here in the expanded sys_clock_gettime()
|
||||
function. Later we'll see it resolve to a userspace function call in
|
||||
busybox.
|
||||
|
||||
.. image:: figures/perf-wget-g-copy-from-user-expanded-stripped.png
|
||||
:align: center
|
||||
|
||||
The above screenshot shows the other half of the journey for the data -
|
||||
from the wget program's userspace buffers to disk. To get the buffers to
|
||||
disk, the wget program issues a write(2), which does a copy-from-user to
|
||||
@@ -236,6 +245,9 @@ actually implemented as an applet in busybox, so while the process name
|
||||
is 'wget', the executable we're actually interested in is busybox. So
|
||||
let's expand the first entry containing busybox:
|
||||
|
||||
.. image:: figures/perf-wget-busybox-expanded-stripped.png
|
||||
:align: center
|
||||
|
||||
Again, before we expanded we saw that the function was labeled with a
|
||||
hex value instead of a symbol as with most of the kernel entries.
|
||||
Expanding the busybox entry doesn't make it any better.
|
||||
@@ -270,14 +282,23 @@ busybox-dbg-1.20.2-r2.core2_32.rpm Now that the debuginfo is installed,
|
||||
we see that the busybox entries now display their functions
|
||||
symbolically:
|
||||
|
||||
.. image:: figures/perf-wget-busybox-debuginfo.png
|
||||
:align: center
|
||||
|
||||
If we expand one of the entries and press 'enter' on a leaf node, we're
|
||||
presented with a menu of actions we can take to get more information
|
||||
related to that entry:
|
||||
|
||||
.. image:: figures/perf-wget-busybox-dso-zoom-menu.png
|
||||
:align: center
|
||||
|
||||
One of these actions allows us to show a view that displays a
|
||||
busybox-centric view of the profiled functions (in this case we've also
|
||||
expanded all the nodes using the 'E' key):
|
||||
|
||||
.. image:: figures/perf-wget-busybox-dso-zoom.png
|
||||
:align: center
|
||||
|
||||
Finally, we can see that now that the busybox debuginfo is installed,
|
||||
the previously unresolved symbol in the sys_clock_gettime() entry
|
||||
mentioned previously is now resolved, and shows that the
|
||||
@@ -285,16 +306,25 @@ sys_clock_gettime system call that was the source of 6.75% of the
|
||||
copy-to-user overhead was initiated by the handle_input() busybox
|
||||
function:
|
||||
|
||||
.. image:: figures/perf-wget-g-copy-to-user-expanded-debuginfo.png
|
||||
:align: center
|
||||
|
||||
At the lowest level of detail, we can dive down to the assembly level
|
||||
and see which instructions caused the most overhead in a function.
|
||||
Pressing 'enter' on the 'udhcpc_main' function, we're again presented
|
||||
with a menu:
|
||||
|
||||
.. image:: figures/perf-wget-busybox-annotate-menu.png
|
||||
:align: center
|
||||
|
||||
Selecting 'Annotate udhcpc_main', we get a detailed listing of
|
||||
percentages by instruction for the udhcpc_main function. From the
|
||||
display, we can see that over 50% of the time spent in this function is
|
||||
taken up by a couple tests and the move of a constant (1) to a register:
|
||||
|
||||
.. image:: figures/perf-wget-busybox-annotate-udhcpc.png
|
||||
:align: center
|
||||
|
||||
As a segue into tracing, let's try another profile using a different
|
||||
counter, something other than the default 'cycles'.
|
||||
|
||||
@@ -424,6 +454,9 @@ and tell perf to do a profile using it as the sampling event:
|
||||
root@crownbay:~# perf record -g -e sched:sched_wakeup wget
|
||||
http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
|
||||
|
||||
.. image:: figures/sched-wakeup-profile.png
|
||||
:align: center
|
||||
|
||||
The screenshot above shows the results of running a profile using
|
||||
sched:sched_switch tracepoint, which shows the relative costs of various
|
||||
paths to sched_wakeup (note that sched_wakeup is the name of the
|
||||
@@ -653,6 +686,9 @@ linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k
|
||||
0:00:00 ETA Here we see entries not only for our wget load, but for
|
||||
other processes running on the system as well:
|
||||
|
||||
.. image:: figures/perf-systemwide.png
|
||||
:align: center
|
||||
|
||||
In the snapshot above, we can see callchains that originate in libc, and
|
||||
a callchain from Xorg that demonstrates that we're using a proprietary X
|
||||
driver in userspace (notice the presence of 'PVR' and some other
|
||||
@@ -665,6 +701,9 @@ record a profile: root@crownbay:~# perf record -g -a -e cycles:u ^C[
|
||||
perf record: Woken up 2 times to write data ] [ perf record: Captured
|
||||
and wrote 0.376 MB perf.data (~16443 samples) ]
|
||||
|
||||
.. image:: figures/perf-report-cycles-u.png
|
||||
:align: center
|
||||
|
||||
Notice in the screenshot above, we see only userspace entries ([.])
|
||||
|
||||
Finally, we can press 'enter' on a leaf node and select the 'Zoom into
|
||||
@@ -672,6 +711,9 @@ DSO' menu item to show only entries associated with a specific DSO. In
|
||||
the screenshot below, we've zoomed into the 'libc' DSO which shows all
|
||||
the entries associated with the libc-xxx.so DSO.
|
||||
|
||||
.. image:: figures/perf-systemwide-libc.png
|
||||
:align: center
|
||||
|
||||
We can also use the system-wide -a switch to do system-wide tracing.
|
||||
Here we'll trace a couple of scheduler events: root@crownbay:~# perf
|
||||
record -a -e sched:sched_switch -e sched:sched_wakeup ^C[ perf record:
|
||||
@@ -838,6 +880,9 @@ gthumb 1300 [001] 34219.264551: do_fork: (c1028460) pcmanfm 1296 [000]
|
||||
(c1028460) And using 'perf report' on the same file, we can see the
|
||||
callgraphs from starting a few programs during those 30 seconds:
|
||||
|
||||
.. image:: figures/perf-probe-do_fork-profile.png
|
||||
:align: center
|
||||
|
||||
.. container:: informalexample
|
||||
|
||||
Tying it Together:
|
||||
@@ -1312,6 +1357,9 @@ choosing from the kernelshark menu: Capture \| Record That will display
|
||||
the following dialog, which allows you to choose one or more events (or
|
||||
even one or more complete subsystems) to trace:
|
||||
|
||||
.. image:: figures/kernelshark-choose-events.png
|
||||
:align: center
|
||||
|
||||
Note that these are exactly the same sets of events described in the
|
||||
previous trace events subsystem section, and in fact is where trace-cmd
|
||||
gets them for kernelshark.
|
||||
@@ -1324,6 +1372,9 @@ After doing that, we can start and stop the trace using the 'Run' and
|
||||
'Stop' button on the lower right corner of the dialog (the same button
|
||||
will turn into the 'Stop' button after the trace has started):
|
||||
|
||||
.. image:: figures/kernelshark-output-display.png
|
||||
:align: center
|
||||
|
||||
Notice that the right-hand pane shows the exact trace-cmd command-line
|
||||
that's used to run the trace, along with the results of the trace-cmd
|
||||
run.
|
||||
@@ -1332,9 +1383,15 @@ Once the 'Stop' button is pressed, the graphical view magically fills up
|
||||
with a colorful per-cpu display of the trace data, along with the
|
||||
detailed event listing below that:
|
||||
|
||||
.. image:: figures/kernelshark-i915-display.png
|
||||
:align: center
|
||||
|
||||
Here's another example, this time a display resulting from tracing 'all
|
||||
events':
|
||||
|
||||
.. image:: figures/kernelshark-all.png
|
||||
:align: center
|
||||
|
||||
The tool is pretty self-explanatory, but for more detailed information
|
||||
on navigating through the data, see the `kernelshark
|
||||
website <http://rostedt.homelinux.com/kernelshark/>`__.
|
||||
@@ -1543,6 +1600,9 @@ press the 'Profile' button.
|
||||
Once you've pressed the profile button, the three panes will fill up
|
||||
with profiling data:
|
||||
|
||||
.. image:: figures/sysprof-copy-to-user.png
|
||||
:align: center
|
||||
|
||||
The left pane shows a list of functions and processes. Selecting one of
|
||||
those expands that function in the right pane, showing all its callees.
|
||||
Note that this caller-oriented display is essentially the inverse of
|
||||
@@ -1554,6 +1614,9 @@ looking up the callchain we can see that one of the callers of
|
||||
Notice that this is essentially a portion of the same information we saw
|
||||
in the perf display shown in the perf section of this page.
|
||||
|
||||
.. image:: figures/sysprof-copy-from-user.png
|
||||
:align: center
|
||||
|
||||
Similarly, the above is a snapshot of the Sysprof display of a
|
||||
copy-from-user callchain.
|
||||
|
||||
@@ -1562,6 +1625,9 @@ a list of all the callers of a particular function selected in the top
|
||||
left pane. In this case, the lower pane is showing all the callers of
|
||||
\__mark_inode_dirty:
|
||||
|
||||
.. image:: figures/sysprof-callers.png
|
||||
:align: center
|
||||
|
||||
Double-clicking on one of those functions will in turn change the focus
|
||||
to the selected function, and so on.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user