Optimized C++: Proven Techniques for Heightened Performance
Structure Member Alignment, Padding and Data Packing
1. General optimization rules
a. Static analysis tool: clang (clang-tidy)
wiki: http://clang.llvm.org/extra/index.html
check list: https://clang.llvm.org/extra/clang-tidy/checks/list.html
Performance-related checks:
https://clang.llvm.org/extra/clang-tidy/checks/performance-faster-string-find.html
https://clang.llvm.org/extra/clang-tidy/checks/performance-inefficient-string-concatenation.html
https://clang.llvm.org/extra/clang-tidy/checks/performance-inefficient-vector-operation.html
https://clang.llvm.org/extra/clang-tidy/checks/performance-unnecessary-copy-initialization.html
https://clang.llvm.org/extra/clang-tidy/checks/performance-unnecessary-value-param.html
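To make these checks concrete, here is a minimal sketch of code written the way four of them suggest (unnecessary-value-param, inefficient-vector-operation, inefficient-string-concatenation, faster-string-find). The function names (TotalLength, Squares, JoinWithComma, HasComma, RunExamples) are hypothetical and only serve as illustration; the slower variant each check would flag is described in the comments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// performance-unnecessary-value-param: taking the vector by const reference
// avoids copying it on every call (the flagged form takes it by value).
std::size_t TotalLength(const std::vector<std::string>& words) {
    std::size_t total = 0;
    for (const std::string& w : words) {  // reference, so no per-element copy
        total += w.size();
    }
    return total;
}

// performance-inefficient-vector-operation: reserve() once before a
// push_back loop instead of letting the vector reallocate as it grows.
std::vector<int> Squares(std::size_t n) {
    std::vector<int> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(static_cast<int>(i * i));
    }
    return out;
}

// performance-inefficient-string-concatenation: use += inside the loop
// instead of joined = joined + p, which builds a temporary each time.
std::string JoinWithComma(const std::vector<std::string>& parts) {
    std::string joined;
    for (const std::string& p : parts) {
        if (!joined.empty()) {
            joined += ',';
        }
        joined += p;
    }
    return joined;
}

// performance-faster-string-find: a single-character needle is passed as a
// char (',') rather than as a string literal (",").
bool HasComma(const std::string& s) {
    return s.find(',') != std::string::npos;
}

// Small self-check of the helpers above; call it from main().
void RunExamples() {
    const std::vector<std::string> words = {"ab", "cde", "f"};
    assert(TotalLength(words) == 6);
    assert(JoinWithComma(words) == "ab,cde,f");
    assert(HasComma("ab,cde"));
    const std::vector<int> expected = {0, 1, 4, 9};
    assert(Squares(4) == expected);
}
```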
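b. grep (patterns to search for in the code)
1) Strings: concatenation a = a + b => a += b; find("c") => find('c')
2) Branch prediction: PANGU_LIKELY, PANGU_UNLIKELY
3) vector reserve & proto repeated Reserve
4) Avoid function calls in loop conditions, e.g. for (size_t i = 0; i < size(); ++i) calls size() on every iteration
5) it++ => ++it
6) Short functions => inline. Principle: inline frequently called functions and move rarely executed code into separate functions, to reduce cache misses
7) STL map/set: [] combined with find is redundant and repeats the lookup
8) Memory alignment: the order of struct members matters; declare them in order of size
9) Multiplication, division, and floating-point arithmetic => addition, subtraction, and bit operations
High-level rules
1) Reduce memory allocation and copying: pass parameters by reference, avoid copying large objects, take frequently used temporary objects from a memory pool instead of new, and avoid allocating memory in one thread and freeing it in another
2) Optimize algorithms & data structures: map -> unordered_map, replace STL containers
3) Keep lock scope as small as possible; lock-free structures, TLS storage, coroutines
4) Use more efficient library implementations: tcmalloc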
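Two of the items above (struct member order and redundant map lookups) are easier to see in code. The following is only a sketch: the struct and function names (Unordered, Ordered, LookupOrDefault, CountWord, RunLayoutAndLookupExample) are hypothetical, and the byte counts in the comments assume a typical 64-bit ABI, so they may differ on other platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Members in arbitrary order: the compiler typically pads after `flag` and
// after `tag` so that `value` and `id` stay aligned (often 24 bytes total).
struct Unordered {
    char flag;
    double value;
    char tag;
    std::int32_t id;
};

// Same members, largest first: the small members share one padded tail
// (often 16 bytes total).
struct Ordered {
    double value;
    std::int32_t id;
    char flag;
    char tag;
};

static_assert(sizeof(Ordered) <= sizeof(Unordered),
              "ordering members by size should not make the struct larger");

// Redundant form: `if (m.find(key) != m.end()) return m.at(key);` searches
// the tree twice. Reusing the iterator from find() searches once.
int LookupOrDefault(const std::map<std::string, int>& m,
                    const std::string& key, int fallback) {
    const auto it = m.find(key);
    return it != m.end() ? it->second : fallback;
}

// For counting, operator[] alone is enough: a missing key is inserted with
// a value-initialized int (0), so no preceding find() is needed.
void CountWord(std::map<std::string, int>& counts, const std::string& word) {
    ++counts[word];
}

// Small self-check of the helpers above; call it from main().
void RunLayoutAndLookupExample() {
    std::map<std::string, int> counts;
    CountWord(counts, "a");
    CountWord(counts, "a");
    CountWord(counts, "b");
    assert(LookupOrDefault(counts, "a", -1) == 2);
    assert(LookupOrDefault(counts, "b", -1) == 1);
    assert(LookupOrDefault(counts, "missing", -1) == -1);
}
```

Printing sizeof(Unordered) and sizeof(Ordered) on the target platform is a quick way to confirm whether reordering actually saves space there.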
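2. Hotspot analysis
a. Add your own counters and write them to the log
b. gperftools
c. perf: text mode gives a direct view of the top-level costs that a flame graph would show
Call-graph views: text, call-graph, flame-graph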
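For option a (hand-written counters), one possible shape is a counter plus an RAII timer that adds to it when a scope ends. This is a sketch under assumptions, not a prescribed design: HotspotCounter, ScopedTimer, SumTo, and RunCounterExample are hypothetical names, and the log goes to stderr only because the notes do not name a logging library.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>

// Accumulates the call count and total elapsed time for one code region.
struct HotspotCounter {
    explicit HotspotCounter(const char* n) : name(n) {}

    // Writes the accumulated numbers to stderr (stand-in for a real logger).
    void Log() const {
        std::fprintf(stderr, "%s: calls=%llu total_ns=%llu\n", name,
                     static_cast<unsigned long long>(calls.load()),
                     static_cast<unsigned long long>(total_ns.load()));
    }

    const char* name;
    std::atomic<std::uint64_t> calls{0};
    std::atomic<std::uint64_t> total_ns{0};
};

// Measures the lifetime of the enclosing scope and adds it to the counter.
class ScopedTimer {
public:
    explicit ScopedTimer(HotspotCounter& counter)
        : counter_(counter), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        const auto ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
        counter_.calls.fetch_add(1, std::memory_order_relaxed);
        counter_.total_ns.fetch_add(static_cast<std::uint64_t>(ns.count()),
                                    std::memory_order_relaxed);
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    HotspotCounter& counter_;
    std::chrono::steady_clock::time_point start_;
};

// Example of an instrumented function: sums 1..n.
std::uint64_t SumTo(std::uint64_t n, HotspotCounter& counter) {
    ScopedTimer timer(counter);
    std::uint64_t sum = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        sum += i;
    }
    return sum;
}

// Small self-check; call it from main().
void RunCounterExample() {
    HotspotCounter counter("SumTo");
    assert(SumTo(10, counter) == 55);
    assert(SumTo(100, counter) == 5050);
    assert(counter.calls.load() == 2);
    counter.Log();
}
```

The timer itself costs two clock reads per call, so this approach fits coarse regions better than very short functions; for those, a sampling profiler such as gperftools or perf is usually less intrusive.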
3. Performance comparison
a. Write test code
b. After compiling, diff the assembly code
c. Runtime diff: time cost / diff of perf stat output / perf record -> perf diff
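The test code from step a can be as small as two variants of the same operation plus a timing helper. The sketch below compares a = a + b against a += b; ConcatCopy, ConcatAppend, TimeMs, and CompareConcat are hypothetical names, and a single timed run like this is only a rough signal, so no expected timings are given.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>

// Variant 1: a = a + b builds a temporary string on every iteration.
std::string ConcatCopy(std::size_t n, const std::string& piece) {
    std::string a;
    for (std::size_t i = 0; i < n; ++i) {
        a = a + piece;
    }
    return a;
}

// Variant 2: a += b appends in place.
std::string ConcatAppend(std::size_t n, const std::string& piece) {
    std::string a;
    for (std::size_t i = 0; i < n; ++i) {
        a += piece;
    }
    return a;
}

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double TimeMs(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times both variants, checks that they agree, and prints the result.
void CompareConcat(std::size_t n) {
    std::string copy_result;
    std::string append_result;
    const double copy_ms =
        TimeMs([&] { copy_result = ConcatCopy(n, "x"); });
    const double append_ms =
        TimeMs([&] { append_result = ConcatAppend(n, "x"); });
    // The comparison only means something if both variants do the same work.
    assert(copy_result == append_result);
    std::printf("n=%zu copy=%.3f ms append=%.3f ms\n", n, copy_ms, append_ms);
}
```

Keeping each variant in its own function also helps with steps b and c: the compiler's assembly output for the two functions can be diffed directly, and each variant can be run under perf stat or perf record separately.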