gprof2dot

About gprof2dot

This is a Python script to convert the output from many profilers into a dot graph.

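It can:

- read output from:
  - Linux perf
  - Valgrind's callgrind tool
  - OProfile
  - Sysprof
  - Xperf
  - VTune
  - Very Sleepy
  - Python profilers
  - Java's HPROF
  - prof, gprof
  - DTrace
  - stackcollapse from FlameGraph
- prune nodes and edges below a certain threshold;
- use a heuristic to propagate time inside mutually recursive functions;
- use color efficiently to draw attention to hot spots;
- work on any platform where Python and Graphviz are available, i.e., virtually anywhere;
- compare two graphs with almost identical structure for the analysis of performance metrics such as time or function calls.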

If you want an interactive viewer for the graphs generated by gprof2dot, check xdot.py.

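Status

gprof2dot currently fulfills my needs, and I have little or no time for its maintenance. So I'm afraid that any requested features are unlikely to be implemented, and that I might be slow processing issue reports or pull requests.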

Example

This is the result from the example data in the Linux Gazette article with the default settings:

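Requirements

- Python: known to work with version >=3.8; it will most likely not work with earlier releases.
- Graphviz: tested with version 2.26.3, but should work fine with other versions.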

Windows users

- Download and install Python for Windows
- Download and install Graphviz for Windows

Linux users

On Debian/Ubuntu run:

 apt-get install python3 graphviz

On RedHat/Fedora run:

 yum install python3 graphviz

Download

PyPI
```
 pip install gprof2dot
```
Standalone script
Git repository

Documentation

Usage

 Usage: 
	gprof2dot.py [options] [file] ...

Options:
  -h, --help            show this help message and exit
  -o FILE, --output=FILE
                        output filename [stdout]
  -n PERCENTAGE, --node-thres=PERCENTAGE
                        eliminate nodes below this threshold [default: 0.5]
  -e PERCENTAGE, --edge-thres=PERCENTAGE
                        eliminate edges below this threshold [default: 0.1]
  -f FORMAT, --format=FORMAT
                        profile format: axe, callgrind, collapse, dtrace,
                        hprof, json, oprofile, perf, prof, pstats, sleepy,
                        sysprof or xperf [default: prof]
  --total=TOTALMETHOD   preferred method of calculating total time: callratios
                        or callstacks (currently affects only perf format)
                        [default: callratios]
  -c THEME, --colormap=THEME
                        color map: bw, color, gray, pink or print [default:
                        color]
  -s, --strip           strip function parameters, template parameters, and
                        const modifiers from demangled C++ function names
  --color-nodes-by-selftime
                        color nodes by self time, rather than by total time
                        (sum of self and descendants)
  -w, --wrap            wrap function names
  --show-samples        show function samples
  --node-label=MEASURE  measurements to show on the node (can be specified
                        multiple times): self-time, self-time-percentage,
                        total-time or total-time-percentage [default: total-
                        time-percentage, self-time-percentage]
  --list-functions=LIST_FUNCTIONS
                        list functions available for selection in -z or -l,
                        requires selector argument ( use '+' to select all).
                        Recall that the selector argument is used with
                        Unix/Bash globbing/pattern matching, and that entries
                        are formatted '<pkg>:<linenum>:<function>'. When
                        argument starts with '%', a dump of all available
                        information is performed for selected entries,  after
                        removal of leading '%'.
  -z ROOT, --root=ROOT  prune call graph to show only descendants of specified
                        root function
  -l LEAF, --leaf=LEAF  prune call graph to show only ancestors of specified
                        leaf function
  --depth=DEPTH         prune call graph to show only descendants or ancestors
                        until specified depth
  --skew=THEME_SKEW     skew the colorization curve.  Values < 1.0 give more
                        variety to lower percentages.  Values > 1.0 give less
                        variety to lower percentages
  -p FILTER_PATHS, --path=FILTER_PATHS
                        Filter all modules not in a specified path
  --compare             Compare two graphs with almost identical structure.
                        With this option two files should be provided.
                        gprof2dot.py [options] --compare [file1] [file2] ...
  --compare-tolerance=TOLERANCE
                        Tolerance threshold for node difference
                        (default=0.001%). If the difference is below this value
                        the nodes are considered identical.
  --compare-only-slower
                        Display comparison only for function which are slower
                        in second graph.
  --compare-only-faster
                        Display comparison only for function which are faster
                        in second graph.
  --compare-color-by-difference
                        Color nodes based on the value of the difference.
                        Nodes with the largest differences represent the hot
                        spots.

Examples

Linux perf

```
perf record -g -- /path/to/your/executable
perf script | c++filt | gprof2dot.py -f perf | dot -Tpng -o output.png
```

oprofile

```
opcontrol --callgraph=16
opcontrol --start
/path/to/your/executable arg1 arg2
opcontrol --stop
opcontrol --dump
opreport -cgf | gprof2dot.py -f oprofile | dot -Tpng -o output.png
```

xperf

If you're not familiar with xperf, read this excellent article first. Then do:

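- Start xperf as:
```
xperf -on Latency -stackwalk profile
```
- Run your application.
- Save the data: `xperf -d output.etl`
- Start the visualizer:
```
xperf output.etl
```
- In the Trace menu, select Load Symbols. Configure Symbol Paths if necessary.
- Select an area of interest on the CPU sampling graph, right-click, and select Summary Table.
- In the Columns menu, make sure the Stack column is enabled and visible.
- Right-click on a row, choose Export Full Table, and save to output.csv.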

Then invoke gprof2dot as:

 gprof2dot.py -f xperf output.csv | dot -Tpng -o output.png

VTune Amplifier XE

Collect profile data (can also be done from the GUI):

 amplxe-cl -collect hotspots -result-dir output -- your-app

Visualize the profile data with:

```
amplxe-cl -report gprof-cc -result-dir output -format text -report-output output.txt
gprof2dot.py -f axe output.txt | dot -Tpng -o output.png
```

See also Kirill Rogozhin's blog post.

gprof

```
/path/to/your/executable arg1 arg2
gprof path/to/your/executable | gprof2dot.py | dot -Tpng -o output.png
```

python profile

```
python -m profile -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png
```

python cProfile (formerly known as lsprof)

```
python -m cProfile -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png
```
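The `-o output.pstats` step can also be done from inside a script when you only want to profile one entry point. A minimal sketch using the standard-library `cProfile` and `pstats` modules; `work`, `main` and `profile_to_file` are hypothetical names for illustration. The file it writes is the same pstats format that `gprof2dot.py -f pstats` reads.

```python
import cProfile
import pstats


def work(n: int) -> int:
    # Hypothetical CPU-bound function standing in for your real code.
    return sum(i * i for i in range(n))


def main() -> None:
    for _ in range(5):
        work(100_000)


def profile_to_file(func, path: str = "output.pstats") -> str:
    """Run func under cProfile and save the statistics in pstats format."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    profiler.dump_stats(path)
    return path


if __name__ == "__main__":
    stats_file = profile_to_file(main)
    # Quick text summary; the same file can be passed to gprof2dot.py -f pstats.
    pstats.Stats(stats_file).sort_stats("cumulative").print_stats(5)
```

Afterwards, render the graph exactly as in the command above.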

Java HPROF

```
java -agentlib:hprof=cpu=samples ...
gprof2dot.py -f hprof java.hprof.txt | dot -Tpng -o output.png
```

For more details, see Russell Power's blog post.

DTrace

```
dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks
gprof2dot.py -f dtrace out.user_stacks | dot -Tpng -o output.png

# Notice: the dtrace output may sometimes be encoded as latin-1, and gprof2dot will then fail to parse it.
# To solve this problem, use iconv to convert it to UTF-8 explicitly.
# TODO: add an encoding flag to tell gprof2dot how to decode the profile file.
iconv -f ISO-8859-1 -t UTF-8 out.user_stacks | gprof2dot.py -f dtrace
```
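If iconv is not available, the same re-encoding can be sketched in Python. This assumes the whole profile fits in memory and that the input really is ISO-8859-1; `latin1_to_utf8` and the file names are placeholders.

```python
from pathlib import Path


def latin1_to_utf8(src: str, dst: str) -> None:
    """Re-encode a DTrace output file from ISO-8859-1 to UTF-8."""
    # Every byte is a valid latin-1 character, so decoding cannot fail.
    text = Path(src).read_bytes().decode("latin-1")
    # Write bytes directly so line endings are left untouched.
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, `latin1_to_utf8("out.user_stacks", "out.user_stacks.utf8")` followed by `gprof2dot.py -f dtrace out.user_stacks.utf8`.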

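stackcollapse

Brendan Gregg's FlameGraph tool takes as its input a text file containing one line per sample. This format can be generated from various other inputs using the stackcollapse scripts in the FlameGraph repository. It can also be generated by tools such as py-spy.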
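In the collapsed format, each line holds the frames of one stack, root first, joined by semicolons, followed by a space and a sample count (stackcollapse scripts typically merge identical stacks into one line). A minimal sketch of producing such a file, assuming you already have sampled stacks as lists of function names; `collapse` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def collapse(stacks: Iterable[Sequence[str]]) -> List[str]:
    """Turn sampled call stacks (root frame first) into collapsed lines."""
    counts = Counter(";".join(stack) for stack in stacks)
    # Sorting only makes the output deterministic.
    return [f"{frames} {n}" for frames, n in sorted(counts.items())]


samples = [["main", "parse"], ["main", "parse"], ["main", "compute", "kernel"]]
with open("out.collapse", "w") as f:
    f.write("\n".join(collapse(samples)) + "\n")
```

The resulting out.collapse can then be passed to `gprof2dot.py -f collapse`.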

Example usage:

perf

```
perf record -g -- /path/to/your/executable
perf script | FlameGraph/stackcollapse-perf.pl > out.collapse
gprof2dot.py -f collapse out.collapse | dot -Tpng -o output.png
```

py-spy

```
py-spy record -p <pid> -f raw -o out.collapse
gprof2dot.py -f collapse out.collapse | dot -Tpng -o output.png
```

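Compare example

This image illustrates an example usage of the --compare and --compare-color-by-difference options.

Arrows pointing to the right mark nodes where the function ran faster in the profile given second (the second profile), while arrows pointing to the left mark nodes where the function ran faster in the profile given first (the first profile).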

Node

```
+-----------------------------+
|        function name        \
| total time %  -/+ total_diff \
| ( self time % ) -/+ self_diff /
| total calls1 / total calls2  /
+-----------------------------+
```

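Where

- total time % and self time % come from the first profile;
- diff is calculated as the absolute value of (time in the first profile - time in the second profile).

Note: the compare option has been tested with pstats, axe and callgrind profiles.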
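The diff and arrow direction described above can be sketched as follows. This illustrates the rule as stated here and is not gprof2dot's code; `compare_node` is a hypothetical helper, times are percentages, and the tolerance mirrors the documented `--compare-tolerance` default of 0.001%.

```python
from typing import Tuple


def compare_node(time1_pct: float, time2_pct: float,
                 tolerance_pct: float = 0.001) -> Tuple[float, str]:
    """Compare one function's time between the first and second profile."""
    diff = abs(time1_pct - time2_pct)
    if diff < tolerance_pct:
        # Below the tolerance the two nodes count as identical.
        return diff, "identical"
    # Right-pointing arrow: faster in the second profile; left: faster in the first.
    direction = "faster in second" if time2_pct < time1_pct else "faster in first"
    return diff, direction
```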

Output

A node in the output graph represents a function and has the following layout:

```
+------------------------------+
|        function name         |
| total time % ( self time % ) |
|         total calls          |
+------------------------------+
```

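Where:

- total time % is the percentage of the running time spent in this function and all its children;
- self time % is the percentage of the running time spent in this function alone;
- total calls is the total number of times this function was called (including recursive calls).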
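To make the two percentages concrete, here is a small sketch on a hypothetical call tree without recursion, where a function's total time is its own self time plus the total time of its callees. The `Function` class, names and timings are made up for illustration; real profilers measure these numbers rather than summing a tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List


@dataclass
class Function:
    name: str
    self_time: float                      # seconds spent in the function's own code
    callees: List[Function] = field(default_factory=list)


def total_time(fn: Function) -> float:
    """Self time plus the total time of everything this function calls."""
    return fn.self_time + sum(total_time(c) for c in fn.callees)


def node_label(fn: Function, run_time: float) -> str:
    total_pct = 100.0 * total_time(fn) / run_time
    self_pct = 100.0 * fn.self_time / run_time
    return f"{fn.name}\n{total_pct:.2f}% ({self_pct:.2f}%)"


compute = Function("compute", 6.0)
parse = Function("parse", 3.0)
main = Function("main", 1.0, [parse, compute])
run_time = total_time(main)               # 10.0 seconds for the whole run
for fn in (main, parse, compute):
    print(node_label(fn, run_time))
```

Here main gets 100.00% total but only 10.00% self, because most of its time is spent in its callees.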

An edge represents the calls between two functions and has the following layout:

```
           total time %
              calls
parent --------------------> children
```

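Where:

- total time % is the percentage of the running time transferred from the children to this parent (if available);
- calls is the number of times the parent function called the children.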
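One way to arrive at an edge's total time % is to split the callee's total time among its callers in proportion to how many calls each caller made. This sketch rests on the assumption that every call costs the same; it relates to the `callratios` method named in the `--total` option, but the helper below is hypothetical and not gprof2dot's implementation.

```python
from typing import Dict


def edge_percentages(callee_total_time: float,
                     calls_by_caller: Dict[str, int],
                     run_time: float) -> Dict[str, float]:
    """Share a callee's total time among its callers by call count."""
    total_calls = sum(calls_by_caller.values())
    return {
        caller: 100.0 * callee_total_time * calls / total_calls / run_time
        for caller, calls in calls_by_caller.items()
    }


# helper() spends 4 s in total: 3 calls come from a(), 1 from b(); the run lasts 10 s.
print(edge_percentages(4.0, {"a": 3, "b": 1}, 10.0))  # {'a': 30.0, 'b': 10.0}
```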

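Note that in recursive cycles, the total time % in the node is the same for all functions in the cycle, and there is no total time % figure on the edges inside the cycle, since such a figure would make no sense.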
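Before time can be treated as shared across a recursive cycle, the functions that form the cycle have to be identified. A standard way is to compute the strongly connected components of the call graph, sketched here with Tarjan's algorithm. The adjacency-dict input and the `find_cycles` name are assumptions for illustration, and the sketch says nothing about how gprof2dot then distributes time inside a cycle.

```python
from typing import Dict, List, Set


def find_cycles(graph: Dict[str, List[str]]) -> List[Set[str]]:
    """Return the recursive cycles (SCCs) of a call graph given as caller -> callees."""
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    cycles: List[Set[str]] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)                                # tree edge
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])  # edge back into the current SCC
        if lowlink[v] == index[v]:                      # v is the root of an SCC
            component: Set[str] = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            # A single function only forms a cycle if it calls itself.
            if len(component) > 1 or v in graph.get(v, []):
                cycles.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return cycles


# a and b call each other, so they form one cycle; main and c do not.
print(find_cycles({"main": ["a"], "a": ["b"], "b": ["a", "c"], "c": []}))
```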

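The color of the nodes and edges varies according to the total time % value. In the default temperature-like color map, functions where most time is spent (hot spots) are marked in saturated red, and functions where little time is spent are marked in dark blue. Note that, by default, functions where negligible or no time is spent do not appear in the graph.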
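A temperature-like color map can be sketched as a hue sweep from blue to red. This is an illustration under stated assumptions, not gprof2dot's theme code: the HLS hue and lightness ranges are picked by hand, and the exponential `skewed` curve is one plausible way to get the behavior the `--skew` help text describes (values < 1.0 give more variety to low percentages, values > 1.0 give less).

```python
import colorsys


def skewed(weight: float, skew: float = 1.0) -> float:
    """Remap a weight in [0, 1]; skew < 1 stretches low values, skew > 1 compresses them."""
    weight = min(max(weight, 0.0), 1.0)
    if skew <= 0.0:
        raise ValueError("skew must be positive")
    if skew == 1.0:
        return weight
    # Exponential curve passing through (0, 0) and (1, 1).
    return (skew ** weight - 1.0) / (skew - 1.0)


def temperature_color(weight: float, skew: float = 1.0) -> str:
    """Map a total-time fraction to a hex color from dark blue (cold) to red (hot)."""
    t = skewed(weight, skew)
    hue = (2.0 / 3.0) * (1.0 - t)   # in HLS, 2/3 is blue and 0 is red
    lightness = 0.2 + 0.3 * t       # dark when cold, fully saturated (0.5) when hot
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 1.0)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


for fraction in (0.0, 0.05, 0.25, 1.0):
    print(fraction, temperature_color(fraction))
```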

Listing functions

The --list-functions flag lists the function entries found in the gprof input. It is intended as a tool to prepare for using the --leaf (-l) or --root (-z) flags.

```
gprof2dot.py -f pstats /tmp/myLog.profile --list-functions "test_segments:*:*"

test_segments:5:<module>,
test_segments:206:TestSegments,
test_segments:46:<lambda>
```

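The selector argument is used with Unix/Bash globbing/pattern matching, in the same fashion as performed by the -l and -z flags.
Entries are formatted '<pkg>:<linenum>:<function>'.
When the selector argument starts with '%', a dump of all available information is performed for the selected entries, after removal of the selector's leading '%'. If the selector is '+' or '*', the full list of functions is printed.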
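The selection rules above can be mimicked with Python's `fnmatch` module, which implements shell-style wildcards. This is a sketch assuming plain glob matching on the '<pkg>:<linenum>:<function>' strings is enough; `select_functions` is a hypothetical helper, and gprof2dot's own matching may differ in edge cases.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence, Tuple


def select_functions(entries: Sequence[str], selector: str) -> Tuple[bool, List[str]]:
    """Return (dump_requested, matching entries) for a --list-functions style selector."""
    dump = selector.startswith("%")
    if dump:
        selector = selector[1:]           # strip the leading '%'
    if selector in ("+", "*"):
        return dump, list(entries)        # '+' or '*' selects every function
    return dump, [e for e in entries if fnmatchcase(e, selector)]


entries = [
    "test_segments:5:<module>",
    "test_segments:206:TestSegments",
    "test_segments:46:<lambda>",
]
print(select_functions(entries, "test_segments:*:*"))
print(select_functions(entries, "%*:<lambda>"))
```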

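Frequently asked questions

How can I generate a complete call graph?

By default, gprof2dot.py generates a partial call graph, excluding nodes and edges with little or no impact on the total computation time. If you want the full call graph, set a zero threshold for nodes and edges via the -n / --node-thres and -e / --edge-thres options, as: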

 gprof2dot.py -n0 -e0
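Conceptually, the thresholds act as filters on percentages of total time. A simplified sketch, assuming node and edge percentages have already been computed; `prune` and the sample data are hypothetical, the defaults 0.5 and 0.1 are taken from the usage text, and gprof2dot's own pruning may treat edges and orphaned nodes differently.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def prune(node_pct: Dict[str, float], edge_pct: Dict[Edge, float],
          node_thres: float = 0.5, edge_thres: float = 0.1
          ) -> Tuple[Set[str], Dict[Edge, float]]:
    """Drop nodes and edges whose total time % falls below the thresholds."""
    kept_nodes = {name for name, pct in node_pct.items() if pct >= node_thres}
    kept_edges = {
        (caller, callee): pct
        for (caller, callee), pct in edge_pct.items()
        # An edge survives only if it is heavy enough and both ends survived.
        if pct >= edge_thres and caller in kept_nodes and callee in kept_nodes
    }
    return kept_nodes, kept_edges


nodes = {"main": 100.0, "parse": 40.0, "log": 0.2}
edges = {("main", "parse"): 40.0, ("main", "log"): 0.2}
print(prune(nodes, edges))        # default thresholds drop "log" and its edge
print(prune(nodes, edges, 0, 0))  # like -n0 -e0: everything is kept
```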

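The node labels are too wide. How can I narrow them?

When profiling C++ code, the node labels can become very wide due to the inclusion of scope, function arguments, and template arguments in demangled C++ function names.

If you do not need function and template argument information, pass the -s / --strip option to strip it.

If you want to keep all that information, or if the labels are still too wide, you can pass -w / --wrap to wrap the labels. Note that because dot does not wrap labels automatically, the label margins will not be perfectly aligned.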

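Why is there no output, or is it all the same color?

Likely, the total execution time is too short, so there is not enough precision in the profile to determine where time is being spent.

You can still force the whole graph to be displayed by setting a zero threshold for nodes and edges via the -n / --node-thres and -e / --edge-thres options, as: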

 gprof2dot.py -n0 -e0

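But to get meaningful results, you will need to find a way to run the program for a longer period of time (or aggregate results from multiple runs).

Why don't the percentages add up?

Your execution time is likely too short, causing large round-off errors.

See the question above for ways to increase execution time.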

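Which options should I pass to gcc when compiling for profiling?

Options which are essential to produce suitable results are:

- -g: produce debugging information
- -fno-omit-frame-pointer: use the frame pointer (frame pointer usage is disabled by default on some architectures, such as x86_64, and at some optimization levels; it is impossible to walk the call stack without it)

If you're using gprof, you will also need the -pg option, but nowadays you can get much better results with other profiling tools, most of which require no special code instrumentation when compiling.

You want the code you are profiling to be as close as possible to the code that you will be releasing. So you should include all options that you use in your release code, typically:

- -O2: optimizations that do not involve a space-speed tradeoff
- -DNDEBUG: disable debugging code in the standard library (such as the assert macro)

However, many of the optimizations performed by gcc interfere with the accuracy/granularity of the profiling results. You should pass these options to disable those particular optimizations:

- -fno-inline-functions: do not inline functions into their parents (otherwise the time spent in these functions will be attributed to the caller)
- -fno-inline-functions-called-once: similar to the above
- -fno-optimize-sibling-calls: do not optimize sibling and tail recursive calls (otherwise tail calls may be attributed to the parent function)

If the granularity is still too low, you may pass these options to achieve finer granularity:

- -fno-default-inline: do not make member functions inline by default merely because they are defined inside the class scope
- -fno-inline: do not pay attention to the inline keyword

Note, however, that with these last options the timings of functions called many times will be distorted due to the function call overhead. This is particularly true for typical C++ code, which relies on these optimizations for decent performance.

See the full list of gcc optimization options for more information.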

Links

See the wiki for external resources, including complementary/alternative tools.
