Sunday, July 5, 2015

Profiling with gcc and gprof

One of my passions in computing is virtualization.
Some decades back a friend of mine showed me a Super Nintendo emulator on his Pentium MMX, and I was hooked.
More professional uses of virtualization came with time; however, the gaming bug still sticks with me.
This weekend I took some time to play around with a Sega Dreamcast emulator, Lxdream, developed by Nathan Keynes. The last time I tried to compile it I got some warnings caused by some Linux includes, which the following undefs solved.


#undef REG_RAX 
#undef REG_RCX
#undef REG_RDX
#undef REG_RBX 
#undef REG_RSP
#undef REG_RBP
#undef REG_RSI
#undef REG_RDI
#undef REG_R8
#undef REG_R9
#undef REG_R10
#undef REG_R11
#undef REG_R12
#undef REG_R13
#undef REG_R14
#undef REG_R15
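
A minimal sketch of why the undefs help, assuming the clash is an ordinary macro redefinition: on x86-64 Linux, glibc's <sys/ucontext.h> can define names such as REG_RAX when GNU extensions are enabled, and those may collide with an emulator's own register constants. The system header is stood in for by a plain #define here, and the value 0 and the helper reg_rax_index are made up for illustration; they are not Lxdream's actual code.

```c
/* Stand-in for a system header that has already claimed the name.
   (Assumption: in the real build the earlier definition came from a
   Linux include, not from a line like this.) */
#define REG_RAX 99

/* Drop the earlier definition, then supply the project's own value.
   Without the #undef, the compiler warns that REG_RAX is redefined. */
#undef REG_RAX
#define REG_RAX 0   /* hypothetical register encoding */

/* Hypothetical helper so the effect is observable at run time. */
int reg_rax_index(void)
{
    return REG_RAX;
}
```

After the #undef, every later use of REG_RAX sees the project's definition only.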



After that was fixed I tried to profile the code to see what could be improved. Fortunately Nathan had already added the --enable-profile configure switch, which passes the -pg option to the compiler.
Running the instrumented binary writes a gmon.out file when it exits, and the gprof command then turns that into a nicely informative report.


gprof src/lxdream gmon.out > analysis.txt
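
To try the same workflow outside Lxdream, a small stand-alone program is enough. This is only a sketch: the file name demo.c, the DEMO_MAIN guard and the functions hot_loop, cheap_step and run_workload are all invented for the example. The commands in the comment use gcc's -pg flag and the same gprof invocation as above.

```c
/*
 * Build and profile (file name demo.c is an assumption):
 *   gcc -pg -O0 -DDEMO_MAIN demo.c -o demo
 *   ./demo                          (writes gmon.out on normal exit)
 *   gprof ./demo gmon.out > analysis.txt
 */
#include <stdio.h>

/* Deliberately expensive: expected to dominate the "self seconds" column. */
unsigned long hot_loop(unsigned long n)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < n; i++)
        acc += i % 7;
    return acc;
}

/* Cheap per call, but called many times, so its "calls" count is large. */
unsigned long cheap_step(unsigned long x)
{
    return x + 1;
}

/* Runs the hot loop once, then the cheap step 1000 times. */
unsigned long run_workload(unsigned long n)
{
    unsigned long total = hot_loop(n);
    for (unsigned long i = 0; i < 1000; i++)
        total = cheap_step(total);
    return total;
}

#ifdef DEMO_MAIN
int main(void)
{
    /* Large enough that the sampler collects a useful number of ticks. */
    printf("%lu\n", run_workload(200000000UL));
    return 0;
}
#endif
```

Compiling with -O0 keeps the compiler from inlining the small functions, so each one shows up as its own row in the report.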


Below are the top lines of the output file:


Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 37.79      3.28     3.28                             cpu_print_registers
  9.10      4.07     0.79 123044895     0.00     0.00  arm_execute_instruction
  7.26      4.70     0.63                             ext_sdram_read_long
  6.68      5.28     0.58   109333     0.01     0.01  xlat_get_code
  2.77      5.52     0.24 18516563     0.00     0.00  sort_extract_triangles
  2.65      5.75     0.23                             ccn_prefetch
  2.19      5.94     0.19                             xlat_invalidate_long
  2.19      6.13     0.19  1531965     0.00     0.00  ta_commit_polygon
  2.07      6.31     0.18                             ext_sdram_write_word
  2.07      6.49     0.18                             ocram_page0_read_byte
  1.73      6.64     0.15                             xlat_flush_page
  1.50      6.77     0.13 150802470     0.00     0.00  arm_read_long
  1.50      6.90     0.13  6922797     0.00     0.00  ta_write_tile_entry
  1.50      7.03     0.13     1604     0.08     0.21  pvr2_scene_read
  1.38      7.15     0.12  4663896     0.00     0.00  sort_add_triangle
  1.27      7.26     0.11  3355889     0.00     0.00  pvr2_ta_process_block
  1.15      7.36     0.10    62003     0.00     0.00  audio_mix_samples
  1.04      7.45     0.09    62003     0.00     0.02  arm_run_slice
  1.04      7.54     0.09    21625     0.00     0.00  arm_restore_cpsr
  0.81      7.61     0.07                             ext_sdram_read_word
  0.81      7.68     0.07                             ext_sdram_write_long
  0.69      7.74     0.06  1238296     0.00     0.00  gl_render_tilelist
  0.63      7.80     0.06                             unmapped_prefetch
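
A note on reading the table: the "% time" column is each function's self seconds divided by the total sampled time. The total is not part of the excerpt, so the 8.68 s used in the comments is an estimate back-calculated from the first row (3.28 / 0.3779); the helper percent_time is a made-up name for illustration. Rows with an empty calls column are ones for which gprof has no call counts, typically because that code was not compiled with -pg.

```c
/* Share of the total sampled time spent inside one function itself,
   as a percentage. Mirrors the "% time" column of the flat profile. */
double percent_time(double self_seconds, double total_seconds)
{
    return 100.0 * self_seconds / total_seconds;
}

/* Example with numbers from the table (total of 8.68 s is an estimate):
 *   percent_time(0.79, 8.68)  is about 9.10, the arm_execute_instruction row
 *   percent_time(3.28, 8.68)  is about 37.79, the cpu_print_registers row
 */
```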



Maybe trying OpenMP on that arm_execute_instruction will be my computing fun for next weekend.
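
For reference, OpenMP splits a loop across threads when its iterations do not depend on each other. Whether the ARM core's instruction loop can be arranged that way is not something the profile answers, since each instruction normally reads the state the previous one left behind. The sketch below only shows the pragma on a loop that is clearly independent; sum_scaled and scale_sample are invented names, not Lxdream functions.

```c
/* Hypothetical per-item work with no dependence between items. */
static int scale_sample(int sample)
{
    return sample * 2;
}

/* Sums the scaled samples. Each iteration touches only its own element,
   so OpenMP can divide the range among threads and combine the partial
   sums through the reduction clause. Built without -fopenmp, the pragma
   is ignored and the loop runs serially with the same result. */
long sum_scaled(const int *samples, int count)
{
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < count; i++)
        total += scale_sample(samples[i]);
    return total;
}
```

With gcc the pragma takes effect when the file is compiled and linked with -fopenmp.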


