How to investigate slow compile times?
Given that compile/build times are a massive problem in large C++ codebases, it’s amazing how little information we can get about why they are slow. For example when using Visual Studio C++ compiler, about the only options seem to be:
to get time spent in frontend (
c1xx.dll) vs backend (
c2.dll) compiler parts. This is good to narrow down
whether slow compiles are due to too many things being parsed (e.g. a problem of including too many header files), which
is typically a frontend problem, or is the problem some sort of language constructs that cause the backend (typically
the optimizer part of the compiler) take “a lot of time”.
The “common knowledge” is that long C++ compile times are caused by too many header files being included, and in general
#include model. That is often the case, but sometimes it’s the backend part of the compiler that takes “forever”
to compile something (e.g. the
__forceinline compile time issue
I found in my previous post). The backend part typically only takes “long” when compile optimizations are on.
2) Take a profiler capture of
cl.exe, using any of existing native code profiling tools (e.g.
xperf), load symbols, and try to guess what is causing the
long compile time based on profiling results.
3) Just do guesses and kinda randomly change code structure and see whether it helped.
Well, turns out there’s another option, the
MSVC /d2cgsummary flag
I can’t find anything about this flag on the internet, so I’d assume it’s something “for internal use only”,
and may or might not exist or be changed in any MSVC compiler version. It does seem to exist & work in VS2015 Update 3
and in VS2017. Passing it to
cl.exe will give an output of “Anomalistic Compile Times” and some sort of “Caching Stats”, like this:
Code Generation Summary Total Function Count: 742 Elapsed Time: 13.756 sec Total Compilation Time: 13.751 sec Efficiency: 100.0% Average time per function: 0.019 sec Anomalistic Compile Times: 1 ?EvaluateRootMotion@animation@mecanim@@YAXAEBUClipMuscleConstant@12@AEBUClipMuscleInput@12@PEBM2AEAUMotionOutput@12@_N@Z: 11.667 sec, 0 instrs Serialized Initializer Count: 4 Serialized Initializer Time: 0.006 sec RdrReadProc Caching Stats Functions Cached: 61 Retrieved Count: 20162 Abandoned Retrieval Count: 0 Abandoned Caching Count: 14 Wasted Caching Attempts: 0 Functions Retrieved at Least Once: 61 Functions Cached and Never Retrieved: 0 Most Hits: ?f@?$load@$00@?$SWIZ@$0DCB@@v4f@meta@math@@SA?AT__m128@@T6@@Z: 2775 ?f@?$SWIZ@$0DCB@@v4f@meta@math@@SA?AT__m128@@T5@@Z: 2774 ?f@?$load@$00@?$SWIZ@$0EDCB@@v4f@meta@math@@SA?AT__m128@@T6@@Z: 1924 ?f@?$SWIZ@$0EDCB@@v4f@meta@math@@SA?AT__m128@@T5@@Z: 1920 ?pack@?$v@Uv4f@meta@math@@U?$sp@Uv4f@meta@math@@$0DCB@$02@23@$02@meta@math@@SA?AU123@AEBT__m128@@@Z: 1296 ??B_float4@math@@QEBA?AT__m128@@XZ: 1165
Anomalistic Compile Times seems to output “longest to compile” functions, where I guess their times are much longer
than the average. Here, it immediately says that among all 700+ functions that were compiled, there is one that takes
almost all the time! The names are mangled,
but using either the
undname.exe MSVC utility or an online demangler, it says the
slow-to-compile function in this particular file was called
This means that if I want to speed up compilation of this file, I need to only look at that one function, and somehow figure out why it’s slow to compile. The “why” is still an exercise for the reader, but at least I know I don’t have to investigate 700 other functions. Great!
RdrReadProc Caching Stats section I don’t quite understand, but it seems to be listing functions that were used
most often in the whole compilation. Again after demangling, seems that
math::meta::v4f::SWIZ<801>::load got used
almost 3000 times, etc.
On some other files I was looking into, the
/d2cgsummary flag did not point to one slow function, but at least pointed
me to a subsystem that might be contributing to slow compile times. For example here:
Code Generation Summary Total Function Count: 4260 Elapsed Time: 20.710 sec Total Compilation Time: 20.685 sec Efficiency: 99.9% Average time per function: 0.005 sec Anomalistic Compile Times: 32 ?VirtualRedirectTransfer@Shader@@UEAAXAEAVRemapPPtrTransfer@@@Z: 0.741 sec, 0 instrs ??$TransferBase@VShader@@@RemapPPtrTransfer@@QEAAXAEAVShader@@W4TransferMetaFlags@@@Z: 0.734 sec, 0 instrs ??$Transfer@VShader@@@RemapPPtrTransfer@@QEAAXAEAVShader@@PEBDW4TransferMetaFlags@@@Z: 0.726 sec, 0 instrs ??$Transfer@VRemapPPtrTransfer@@@Shader@@QEAAXAEAVRemapPPtrTransfer@@@Z: 0.711 sec, 0 instrs ??$Transfer@VRemapPPtrTransfer@@@?$SerializeTraits@VShader@@@@SAXAEAVShader@@AEAVRemapPPtrTransfer@@@Z: 0.708 sec, 0 instrs ??$Transfer@USerializedShader@ShaderLab@@@RemapPPtrTransfer@@QEAAXAEAUSerializedShader@ShaderLab@@PEBDW4TransferMetaFlags@@@Z: 0.632 sec, 0 instrs ??$Transfer@VRemapPPtrTransfer@@@?$SerializeTraits@USerializedShader@ShaderLab@@@@SAXAEAUSerializedShader@ShaderLab@@AEAVRemapPPtrTransfer@@@Z: 0.625 sec, 0 instrs ??$Transfer@VRemapPPtrTransfer@@@SerializedShader@ShaderLab@@QEAAXAEAVRemapPPtrTransfer@@@Z: 0.621 sec, 0 instrs <...> RdrReadProc Caching Stats Functions Cached: 227 Retrieved Count: 74070 Abandoned Retrieval Count: 0 Abandoned Caching Count: 0 Wasted Caching Attempts: 0 Functions Retrieved at Least Once: 221 Functions Cached and Never Retrieved: 6 Most Hits: ?GetNewInstanceIDforOldInstanceID@RemapPPtrTransfer@@QEAAHH@Z: 3065 ?MightContainPPtr@?$SerializeTraits@D@@SA_NXZ: 2537 ?Align@TransferBase@@QEAAXXZ: 1965 <...>
Out of total 20 second compile time, there’s a bunch of functions that take almost 1 second to compile each (after demangling:
RemapPPtrTransfer::TransferBase<class Shader>(class Shader&, enum TransferMetaFlags)) etc.
So hey, out of everything that Shader.cpp does, I know that in our codebase it’s the “data serialization” part that is slow
What would be super useful to have from compiler vendors: way more diagnostics on what & why something is slow. Detailed reports like this flag, and/or custom ETW providers for xperf, or anything like that. These would not be things used by most people, but when someone has to use them, they would be invaluable. Saving a minute or two of build time for several hundred programmers is an enormous amount of productivity gain!
Anyway, now you know. If you use Microsoft’s C++ compiler, try out
/d2cgsummary flag to
cl.exe. You might find something
interesting! Next time, how this flag pointed me towards a particular programming pattern that is slow to compile
on MSVC, and allowed saving 80 seconds of compile time of one file.
Update: An item on Visual Studio feedback site to document the flag and perhaps some others that might be useful.
Update 2: “Visual Studio 2017 Throughput Improvements and Advice“
post went up on Visual C++ team blog with
/d2cgsummary and other useful compiler throughput tips.