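About gprof2dot

This is a Python script that converts the output of many profilers into a dot graph.

It can:

- read output from:
  - Linux perf
  - Valgrind's callgrind tool
  - OProfile
  - Sysprof
  - Xperf
  - VTune
  - Very Sleepy
  - Python profilers
  - Java's HPROF
  - prof, gprof
  - DTrace
  - stackcollapse from FlameGraph
- prune nodes and edges below a certain threshold;
- use a heuristic to propagate time inside mutually recursive functions;
- use color efficiently to draw attention to hot spots;
- work on any platform where Python and Graphviz are available, i.e. virtually anywhere;
- compare two graphs with almost identical structure to analyze performance metrics such as time or function calls.

If you want an interactive viewer for the graphs generated by gprof2dot, check xdot.py.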

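Status

gprof2dot currently fulfills my needs, and I have little or no time for its maintenance. So I'm afraid that any requested features are unlikely to be implemented, and I might be slow to process issue reports or pull requests.

Example

This is the result from the example data in the Linux Gazette article, with the default settings: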

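Requirements

- Python: known to work with version >=3.8; it will most likely not work with earlier releases.
- Graphviz: tested with version 2.26.3, but should work fine with other versions.

Windows users

- Download and install Python for Windows
- Download and install Graphviz for Windows

Linux users

On Debian/Ubuntu run: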

 apt-get install python3 graphviz

On RedHat/Fedora run:

 yum install python3 graphviz

Download

PyPI
```
 pip install gprof2dot
```
Standalone script
Git repository

Documentation

Usage

 Usage: 
	gprof2dot.py [options] [file] ...

Options:
  -h, --help            show this help message and exit
  -o FILE, --output=FILE
                        output filename [stdout]
  -n PERCENTAGE, --node-thres=PERCENTAGE
                        eliminate nodes below this threshold [default: 0.5]
  -e PERCENTAGE, --edge-thres=PERCENTAGE
                        eliminate edges below this threshold [default: 0.1]
  -f FORMAT, --format=FORMAT
                        profile format: axe, callgrind, collapse, dtrace,
                        hprof, json, oprofile, perf, prof, pstats, sleepy,
                        sysprof or xperf [default: prof]
  --total=TOTALMETHOD   preferred method of calculating total time: callratios
                        or callstacks (currently affects only perf format)
                        [default: callratios]
  -c THEME, --colormap=THEME
                        color map: bw, color, gray, pink or print [default:
                        color]
  -s, --strip           strip function parameters, template parameters, and
                        const modifiers from demangled C++ function names
  --color-nodes-by-selftime
                        color nodes by self time, rather than by total time
                        (sum of self and descendants)
  -w, --wrap            wrap function names
  --show-samples        show function samples
  --node-label=MEASURE  measurements to on show the node (can be specified
                        multiple times): self-time, self-time-percentage,
                        total-time or total-time-percentage [default: total-
                        time-percentage, self-time-percentage]
  --list-functions=LIST_FUNCTIONS
                        list functions available for selection in -z or -l,
                        requires selector argument ( use '+' to select all).
                        Recall that the selector argument is used with
                        Unix/Bash globbing/pattern matching, and that entries
                        are formatted '<pkg>:<linenum>:<function>'. When
                        argument starts with '%', a dump of all available
                        information is performed for selected entries,  after
                        removal of leading '%'.
  -z ROOT, --root=ROOT  prune call graph to show only descendants of specified
                        root function
  -l LEAF, --leaf=LEAF  prune call graph to show only ancestors of specified
                        leaf function
  --depth=DEPTH         prune call graph to show only descendants or ancestors
                        until specified depth
  --skew=THEME_SKEW     skew the colorization curve.  Values < 1.0 give more
                        variety to lower percentages.  Values > 1.0 give less
                        variety to lower percentages
  -p FILTER_PATHS, --path=FILTER_PATHS
                       Filter all modules not in a specified path
  --compare             Compare two graphs with almost identical structure. With this
                        option two files should be provided.gprof2dot.py
                        [options] --compare [file1] [file2] ...
  --compare-tolerance=TOLERANCE
                        Tolerance threshold for node difference
                        (default=0.001%).If the difference is below this value
                        the nodes are considered identical.
  --compare-only-slower
                        Display comparison only for function which are slower
                        in second graph.
  --compare-only-faster
                        Display comparison only for function which are faster
                        in second graph.
  --compare-color-by-difference
                        Color nodes based on the value of the difference.
                        Nodes with the largest differences represent the hot
                        spots.

Examples

Linux perf

 perf record -g -- /path/to/your/executable
perf script | c++filt | gprof2dot.py -f perf | dot -Tpng -o output.png

oprofile

 opcontrol --callgraph=16
opcontrol --start
/path/to/your/executable arg1 arg2
opcontrol --stop
opcontrol --dump
opreport -cgf | gprof2dot.py -f oprofile | dot -Tpng -o output.png

xperf

If you're not familiar with xperf, read this excellent article first. Then do:

Start xperf as
```
 xperf -on Latency -stackwalk profile
```
Run your application.
Save the data: `xperf -d output.etl`
Start the visualizer:
```
 xperf output.etl
```
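In the Trace menu, select Load Symbols. Configure Symbol Paths if necessary.
Select an area of interest on the CPU sampling graph, right-click, and select Summary Table.
In the Columns menu, make sure the Stack column is enabled and visible.
Right-click on a row, choose Export Full Table, and save to output.csv.

Then invoke gprof2dot as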

 gprof2dot.py -f xperf output.csv | dot -Tpng -o output.png

VTune Amplifier XE

Collect profile data (this can also be done from the GUI):

 amplxe-cl -collect hotspots -result-dir output -- your-app

Visualize the profile data as:

 amplxe-cl -report gprof-cc -result-dir output -format text -report-output output.txt
gprof2dot.py -f axe output.txt | dot -Tpng -o output.png

See also Kirill Rogozhin's blog post.

gprof

 /path/to/your/executable arg1 arg2
gprof path/to/your/executable | gprof2dot.py | dot -Tpng -o output.png

python profile

 python -m profile -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png

python cProfile (formerly known as lsprof)

 python -m cProfile -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png
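
When only part of a program is of interest, the same pstats file can be written from inside Python instead of through `python -m cProfile`. The following is a minimal sketch using only the standard library; the `work` function and the `profile_to_file` helper are hypothetical names standing in for your own code.

```python
import cProfile


def work(n):
    """Toy workload standing in for the code you want to profile."""
    return sum(i * i for i in range(n))


def profile_to_file(path, n=10000):
    """Profile work(n) and write the statistics to path in pstats format."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = work(n)
    profiler.disable()
    # dump_stats writes the same binary format that "-o output.pstats" produces.
    profiler.dump_stats(path)
    return result
```

Calling `profile_to_file("output.pstats")` should leave a file that can be passed to `gprof2dot.py -f pstats` exactly as in the command above.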

Java HPROF

 java -agentlib:hprof=cpu=samples ...
gprof2dot.py -f hprof java.hprof.txt | dot -Tpng -o output.png

See Russell Power's blog post for details.

DTrace

 dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks
gprof2dot.py -f dtrace out.user_stacks | dot -Tpng -o output.png

# Notice: sometimes, the dtrace outputs format may be latin-1, and gprof2dot will fail to parse it.
# To solve this problem, you should use iconv to convert to UTF-8 explicitly.
# TODO: add an encoding flag to tell gprof2dot how to decode the profile file.
iconv -f ISO-8859-1 -t UTF-8 out.user_stacks | gprof2dot.py -f dtrace
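
Where iconv is not available, the same re-encoding step can be done in Python. This is a small sketch, assuming the DTrace output really is ISO-8859-1 as in the note above; the helper name `latin1_to_utf8` and the file names are illustrative.

```python
def latin1_to_utf8(src_path, dst_path):
    """Re-encode a text file from ISO-8859-1 to UTF-8, like the iconv call above."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="iso-8859-1", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

The converted file can then be given to `gprof2dot.py -f dtrace` in place of the original.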

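stackcollapse

Brendan Gregg's FlameGraph tool takes as its input a text file containing one line per sample. This format can be generated from various other inputs using the stackcollapse scripts in the FlameGraph repository. It can also be generated by tools such as py-spy.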
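
To make the format concrete: each collapsed line holds the frames of one stack, separated by semicolons from the outermost caller to the innermost callee, followed by a space and a sample count. The sketch below turns such lines into caller/callee edge counts. It only illustrates the file format and is not gprof2dot's own parser; the function name `parse_collapsed` and the sample stacks are made up.

```python
from collections import Counter


def parse_collapsed(lines):
    """Return a Counter mapping (caller, callee) to the number of samples."""
    edges = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The count is everything after the last space; frames may contain spaces.
        stack, _, count = line.rpartition(" ")
        frames = stack.split(";")
        for caller, callee in zip(frames, frames[1:]):
            edges[(caller, callee)] += int(count)
    return edges


SAMPLE = [
    "main;parse;read 3",
    "main;parse 2",
    "main;render 5",
]
for (caller, callee), samples in sorted(parse_collapsed(SAMPLE).items()):
    print(f"{caller} -> {callee}: {samples}")
```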

Example usage:

perf

 perf record -g -- /path/to/your/executable
perf script | FlameGraph/stackcollapse-perf.pl > out.collapse
gprof2dot.py -f collapse out.collapse | dot -Tpng -o output.png

py-spy

 py-spy record -p <pidfile> -f raw -o out.collapse
gprof2dot.py -f collapse out.collapse | dot -Tpng -o output.png

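Compare Example

This image illustrates an example usage of the --compare and --compare-color-by-difference options.

An arrow pointing to the right marks a node where the function ran faster in the second profile, while an arrow pointing to the left marks a node where the function ran faster in the first profile.

Node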

 +-----------------------------+
 |        function name          \
 | total time %  -/+ total_diff   \
 | ( self time % ) -/+ self_diff  /
 | total calls1 / total calls2   /
 +-----------------------------+

Where

- total time % and self time % come from the first profile
- diff is calculated as the absolute value of time in the first profile - time in the second profile.
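
As a worked illustration of that rule, the sketch below computes the diff for one metric and applies a tolerance in the spirit of --compare-tolerance. The function name, the verdict strings, and the exact way the tolerance is applied are assumptions made for this example, not gprof2dot's code.

```python
def compare_metric(first, second, tolerance=0.001):
    """Return (diff, verdict) for one time value taken from two profiles."""
    # diff is the absolute value of (first profile - second profile).
    diff = abs(first - second)
    if diff < tolerance:
        verdict = "identical"
    elif second < first:
        verdict = "faster in second profile"
    else:
        verdict = "slower in second profile"
    return diff, verdict


print(compare_metric(12.5, 10.0))
print(compare_metric(10.0, 12.5))
```

With these inputs both calls report a diff of 2.5; only the verdict changes with the order of the profiles.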

Note: the compare option has been tested for pstats, axe and callgrind profiles.

Output

A node in the output graph represents a function and has the following layout:

 +------------------------------+
|        function name         |
| total time % ( self time % ) |
|         total calls          |
+------------------------------+

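where:

- total time % is the percentage of the running time spent in this function and all its children;
- self time % is the percentage of the running time spent in this function alone;
- total calls is the total number of times this function was called (including recursive calls).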
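
The difference between the two percentages can be seen on a toy example. The sketch below assumes a simple call tree with no recursion and with each function called from a single parent, which is much simpler than what gprof2dot handles; the function names and timings are invented.

```python
def total_time(func, self_time, children):
    """Self time of func plus the total time of every function it calls."""
    return self_time[func] + sum(
        total_time(child, self_time, children)
        for child in children.get(func, [])
    )


def percentages(self_time, children):
    """Map each function to (total time %, self time %) of the whole run."""
    run_time = sum(self_time.values())
    return {
        func: (
            100.0 * total_time(func, self_time, children) / run_time,
            100.0 * self_time[func] / run_time,
        )
        for func in self_time
    }


# Hypothetical self times in seconds and a hypothetical call tree.
SELF_TIME = {"main": 1.0, "parse": 3.0, "read": 4.0, "render": 2.0}
CHILDREN = {"main": ["parse", "render"], "parse": ["read"]}

for name, (total, own) in percentages(SELF_TIME, CHILDREN).items():
    print(f"{name}: {total:.0f}% ({own:.0f}%)")
```

Here `parse` spends 30% of the run in its own code but accounts for 70% in total, because the 40% spent in `read` is attributed to it as well.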

An edge represents the calls between two functions and has the following layout:

           total time %
              calls
parent --------------------> children

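where:

- total time % is the percentage of the running time transferred from the children to this parent (if available);
- calls is the number of times the parent function called the children.

Note that in recursive cycles, the total time % in the node is the same for all functions in the cycle, and there is no total time % figure on the edges inside the cycle, since such a figure would make no sense.

The color of the nodes and edges varies according to the total time % value. In the default temperature-like color map, functions where most time is spent (hot spots) are marked as saturated red, and functions where little time is spent are marked as dark blue. Note that functions where negligible or no time is spent do not appear in the graph by default.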
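
One way to picture a temperature-like color map is a linear interpolation in HLS space between a dark blue and a saturated red. The sketch below does exactly that; the endpoint colors and the linear interpolation are assumptions chosen for illustration and should not be read as gprof2dot's actual theme values.

```python
import colorsys

# Assumed endpoints as (hue, lightness, saturation); illustrative only.
COLD = (2.0 / 3.0, 0.25, 0.8)  # a dark blue
HOT = (0.0, 0.5, 1.0)          # a saturated red


def color_for(total_time_percent):
    """Map a total time % in [0, 100] to an (r, g, b) tuple of 0-255 ints."""
    weight = min(max(total_time_percent / 100.0, 0.0), 1.0)
    hue, light, sat = (c + weight * (h - c) for c, h in zip(COLD, HOT))
    rgb = colorsys.hls_to_rgb(hue, light, sat)
    return tuple(int(round(channel * 255)) for channel in rgb)


for percent in (0.0, 50.0, 100.0):
    print(percent, color_for(percent))
```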

Listing functions

The --list-functions flag lists the function entries found in the gprof input. It is intended as a tool to prepare for using the --leaf (-l) or --root (-z) flags.

 gprof2dot.py -f pstats /tmp/myLog.profile  --list-functions "test_segments:*:*" 
  
test_segments:5:<module>,
test_segments:206:TestSegments,
test_segments:46:<lambda>

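The selector argument is used with Unix/Bash globbing/pattern matching, in the same way as for the -l and -z flags.
Entries are formatted '<pkg>:<linenum>:<function>'.
When the selector argument starts with '%', a dump of all available information is performed for the selected entries, after removal of the selector's leading '%'. If the selector is "+" or "*", the full list of functions is printed.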
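
To get a feel for how such selectors behave before running the tool, shell-style matching can be tried out with Python's fnmatch module. This is only an approximation of the behavior described above: whether gprof2dot uses fnmatch internally is an assumption, and the `select_entries` helper and the extra `other:1:helper` entry are invented for the example.

```python
from fnmatch import fnmatchcase


def select_entries(entries, selector):
    """Return the entries matching a shell-style selector ('+' selects all)."""
    if selector == "+":
        return list(entries)
    return [entry for entry in entries if fnmatchcase(entry, selector)]


ENTRIES = [
    "test_segments:5:<module>",
    "test_segments:206:TestSegments",
    "test_segments:46:<lambda>",
    "other:1:helper",
]

print(select_entries(ENTRIES, "test_segments:*:*"))
print(select_entries(ENTRIES, "*:*:<lambda>"))
```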

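Frequently Asked Questions

How can I generate a complete call graph?

By default gprof2dot.py generates a partial call graph, excluding nodes and edges with little or no impact on the total computation time. If you want the full call graph, set a zero threshold for nodes and edges via the -n / --node-thres and -e / --edge-thres options, as: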

 gprof2dot.py -n0 -e0
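
The effect of the two thresholds can be sketched on a toy graph. The code below drops nodes and edges whose total time % falls below the given thresholds, using the documented defaults of 0.5 and 0.1; it is a simplified illustration with invented data, and details such as whether the comparison is strict are assumptions.

```python
def prune(nodes, edges, node_thres=0.5, edge_thres=0.1):
    """Keep nodes and edges whose total time % reaches the thresholds.

    nodes maps a function name to its total time %.
    edges maps a (caller, callee) pair to its total time %.
    """
    kept_nodes = {name: pct for name, pct in nodes.items() if pct >= node_thres}
    kept_edges = {
        (caller, callee): pct
        for (caller, callee), pct in edges.items()
        # An edge only survives if both of its endpoints survive.
        if pct >= edge_thres and caller in kept_nodes and callee in kept_nodes
    }
    return kept_nodes, kept_edges


NODES = {"main": 100.0, "parse": 70.0, "log": 0.3, "tiny": 0.05}
EDGES = {("main", "parse"): 70.0, ("main", "log"): 0.3, ("parse", "tiny"): 0.05}

print(prune(NODES, EDGES))        # default thresholds: partial graph
print(prune(NODES, EDGES, 0, 0))  # like -n0 -e0: everything is kept
```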

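The node labels are too wide. How can I narrow them?

The node labels can get very wide when profiling C++ code, due to the inclusion of scope, function arguments, and template arguments in demangled C++ function names.

If you do not need function and template argument information, pass the -s / --strip option to strip them.

If you want to keep all that information, or if the labels are still too wide, you can pass -w / --wrap to wrap the labels. Note that because dot does not wrap labels automatically, the label margins will not be perfectly aligned.

Why is there no output, or why is it all the same color?

Most likely, the total execution time is too short, so there is not enough precision in the profile to determine where time is being spent.

You can still force the whole graph to be displayed by setting a zero threshold for nodes and edges via the -n / --node-thres and -e / --edge-thres options, as: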

 gprof2dot.py -n0 -e0

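But to get meaningful results you will need to find a way to run the program for a longer period of time (or aggregate results from multiple runs).

Why don't the percentages add up?

Your execution time is probably too short, causing the round-off errors to be large.

See the question above for ways to increase the execution time.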

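Which options should I pass to gcc when compiling for profiling?

Options which are essential to produce suitable results are:

- -g: produce debugging information
- -fno-omit-frame-pointer: use the frame pointer (frame pointer usage is disabled by default on some architectures, such as x86_64, and at some optimization levels; it is impossible to walk the call stack without it)

If you're using gprof you will also need the -pg option, but nowadays you can get much better results with other profiling tools, most of which require no special code instrumentation when compiling.

You want the code you are profiling to be as close as possible to the code that you will be releasing. So you should include all options that you use in your release code, typically:

- -O2: optimizations that do not involve a space-speed tradeoff
- -DNDEBUG: disable debugging code in the standard library (such as the assert macro)

However, many of the optimizations performed by gcc interfere with the accuracy/granularity of the profiling results. You should pass these options to disable those particular optimizations:

- -fno-inline-functions: do not inline functions into their parents (otherwise the time spent in these functions will be attributed to the caller)
- -fno-inline-functions-called-once: similar to the above
- -fno-optimize-sibling-calls: do not optimize sibling and tail recursive calls (otherwise tail calls may be attributed to the parent function)

If the granularity is still too low, you may pass these options to achieve finer granularity:

- -fno-default-inline: do not make member functions inline by default merely because they are defined inside the class scope
- -fno-inline: do not pay attention to the inline keyword

Note, however, that with these last options the timings of functions called many times will be distorted by the function call overhead. This is particularly true for typical C++ code, which relies on these optimizations for decent performance.

See the full list of gcc optimization options for more information.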

Links

See the wiki for external resources, including complementary/alternative tools.
