perf-trace(1)
=============

NAME
----
perf-trace - strace inspired tool

SYNOPSIS
--------
[verse]
'perf trace'
'perf trace record'

DESCRIPTION
-----------
This command shows the events associated with the target: initially
syscalls, but also other system events such as pagefaults, task lifetime
events, scheduling events, etc.

It works in live mode and, like the other perf tools, can also process
perf.data files. Such files can be generated using the 'perf record' command,
but the session needs to include the raw_syscalls events (-e 'raw_syscalls:*').
Alternatively, 'perf trace record' can be used as a shortcut that
automatically includes the raw_syscalls events when writing events to a file.

The following options apply to perf trace; options to perf trace record are
found in the perf record man page.

OPTIONS
-------

-a::
--all-cpus::
	System-wide collection from all CPUs.

-e::
--expr::
--event::
	List of syscalls and other perf events (tracepoints, HW cache events,
	etc.) to show. Globbing is supported, e.g.: "epoll_*", "*msg*", etc.
	See 'perf list' for a complete list of events.
	Prefixing the list with ! shows all syscalls except the ones specified.
	You may need to escape the ! to protect it from the shell.

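The selection rule described for -e can be pictured with a small Python model. This is only a sketch under stated assumptions: the function name `select_syscalls` and the sample syscall list are hypothetical, shell-style matching via `fnmatch` stands in for whatever matcher perf uses internally, and only syscall names (not other perf events) are modelled.

```python
from fnmatch import fnmatchcase


def select_syscalls(expr, available):
    """Model a '-e' syscall list: comma-separated globs, optional leading '!'.

    Hypothetical helper for illustration; not perf's implementation.
    """
    negate = expr.startswith("!")
    patterns = (expr[1:] if negate else expr).split(",")
    matched = [name for name in available
               if any(fnmatchcase(name, pat) for pat in patterns)]
    if negate:
        # A leading '!' inverts the selection: everything except the matches.
        return [name for name in available if name not in matched]
    return matched


# Illustrative syscall names only, not a complete table.
names = ["epoll_wait", "epoll_ctl", "sendmsg", "recvmsg", "open", "close"]

print(select_syscalls("epoll_*", names))
print(select_syscalls("*msg*", names))
print(select_syscalls("!open,close", names))
```

In this model the negation applies to the whole list, matching the wording "all syscalls except the ones specified".
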
-D msecs::
--delay msecs::
	After starting the program, wait msecs before measuring. This is useful to
	filter out the startup phase of the program, which is often very different
	from the rest of the run.

-o::
--output=::
	Output file name.

-p::
--pid=::
	Record events on existing process ID (comma separated list).

-t::
--tid=::
	Record events on existing thread ID (comma separated list).

-u::
--uid=::
	Record events in threads owned by uid. Name or number.

-G::
--cgroup::
	Record events in threads in a cgroup.

	Look for cgroups to set at the /sys/fs/cgroup/perf_event directory, then
	remove the /sys/fs/cgroup/perf_event/ part and try:

		perf trace -G A -e sched:*switch

	This sets all raw_syscalls:sys_{enter,exit}, pgfault, vfs_getname, etc.
	_and_ sched:sched_switch to the 'A' cgroup, while:

		perf trace -e sched:*switch -G A

	sets only the sched:sched_switch event to the 'A' cgroup; all the
	other events (raw_syscalls:sys_{enter,exit}, etc.) are left "without"
	a cgroup (on the root cgroup, system wide, etc.).

	Multiple cgroups:

		perf trace -G A -e sched:*switch -G B

	Here the syscall events go to the 'A' cgroup and sched:sched_switch goes
	to the 'B' cgroup.

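The position-dependent behaviour of -G in the three command lines above can be summarised with a toy Python model. It is a sketch that reproduces only those three documented cases; the function `assign_cgroups`, its return shape, and the fallback rule are assumptions made for illustration, and other option orderings are deliberately not claimed to match perf.

```python
def assign_cgroups(argv):
    """Map a flat ['-G', 'A', '-e', 'ev', ...] list to cgroup assignments.

    Returns (cgroup for the implicit syscall events, {event: cgroup}).
    Hypothetical model covering only the documented orderings.
    """
    default_cgroup = None  # cgroup of raw_syscalls etc.; None means "no cgroup"
    events = []            # [event, cgroup] pairs in command-line order
    pending = []           # -e events not yet claimed by a later -G
    seen_event = False

    args = iter(argv)
    for opt in args:
        value = next(args)
        if opt == "-G":
            if not seen_event:
                # -G before any -e: applies to the implicit events as well.
                default_cgroup = value
            else:
                # -G after -e: applies to the events listed before it.
                for entry in pending:
                    entry[1] = value
                pending = []
        elif opt == "-e":
            seen_event = True
            entry = [value, None]
            events.append(entry)
            pending.append(entry)

    # Events never followed by a -G fall back to the leading cgroup, if any.
    assigned = {name: (cg if cg is not None else default_cgroup)
                for name, cg in events}
    return default_cgroup, assigned


print(assign_cgroups(["-G", "A", "-e", "sched:*switch"]))
print(assign_cgroups(["-e", "sched:*switch", "-G", "A"]))
print(assign_cgroups(["-G", "A", "-e", "sched:*switch", "-G", "B"]))
```
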
--filter-pids=::
	Filter out events for these pids and for 'trace' itself (comma separated list).

-v::
--verbose=::
	Verbosity level.

--no-inherit::
	Child tasks do not inherit counters.

-m::
--mmap-pages=::
	Number of mmap data pages (must be a power of two) or size
	specification with appended unit character - B/K/M/G. The
	size is rounded up to the nearest power-of-two number of pages.

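The rounding described for --mmap-pages can be worked through in Python. This is a sketch, not perf's parser: the helper `mmap_pages` is hypothetical, a 4096-byte page size is assumed, K/M/G are assumed to be powers of 1024, and a byte size that is not a whole number of pages is assumed to round up to the next page before the power-of-two rounding.

```python
UNITS = {"B": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}  # assumed binary units


def mmap_pages(spec, page_size=4096):
    """Return the number of mmap data pages for a --mmap-pages style value.

    Hypothetical helper: a bare number is a page count and must already be a
    power of two; a number with a B/K/M/G suffix is a size that gets rounded
    up to a power-of-two number of pages.
    """
    suffix = spec[-1].upper()
    if suffix in UNITS:
        size = int(spec[:-1]) * UNITS[suffix]
        if size <= 0:
            raise ValueError("size must be positive")
        pages = -(-size // page_size)           # ceiling division
        return 1 << (pages - 1).bit_length()    # next power of two >= pages
    pages = int(spec)
    if pages < 1 or pages & (pages - 1):
        raise ValueError("page count must be a power of two")
    return pages


print(mmap_pages("128"))    # plain page count, already a power of two
print(mmap_pages("512K"))   # 524288 bytes / 4096 = 128 pages
print(mmap_pages("100K"))   # 25 pages, rounded up to 32
```
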
-C::
--cpu::
	Collect samples only on the list of CPUs provided. Multiple CPUs can be
	provided as a comma-separated list with no space: 0,1. Ranges of CPUs are
	specified with -: 0-2. In per-thread mode with inheritance mode on
	(default), events are captured only when the thread executes on the
	designated CPUs. Default is to monitor all CPUs.

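The CPU list syntax accepted by -C (comma-separated entries, ranges written with `-`, no spaces) can be expanded as follows. The function `parse_cpu_list` is a hypothetical illustration of the syntax, not code taken from perf.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0,1' or '0-2' into a sorted list of CPU ids.

    Hypothetical helper for illustration only.
    """
    if any(ch.isspace() for ch in spec):
        raise ValueError("CPU list must not contain spaces")
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = (int(n) for n in part.split("-"))
            if first > last:
                raise ValueError("invalid range: " + part)
            cpus.update(range(first, last + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0,1"))
print(parse_cpu_list("0-2"))
print(parse_cpu_list("0,2-4"))
```
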
--duration::
	Show only events that had a duration greater than N.M ms.

--sched::
	Accrue thread runtime and provide a summary at the end of the session.

--failure::
	Show only syscalls that failed, i.e. that returned < 0.

-i::
--input::
	Process events from a given perf data file.

-T::
--time::
	Print full timestamp rather than time relative to the first sample.

--comm::
	Show process COMM right beside its ID, on by default, disable with --no-comm.

-s::
--summary::
	Show only a summary of syscalls by thread with min, max, and average times
	(in msec) and relative stddev.

-S::
--with-summary::
	Show all syscalls followed by a summary by thread with min, max, and
	average times (in msec) and relative stddev.

--tool_stats::
	Show tool stats such as the number of times fd->pathname was discovered
	through hooking the open syscall return + vfs_getname or via reading
	/proc/pid/fd, etc.

-f::
--force::
	Don't complain, do it.

-F=[all|min|maj]::
--pf=[all|min|maj]::
	Trace pagefaults. Optionally, you can specify whether you want minor,
	major or all pagefaults. Default value is maj.

--syscalls::
	Trace system calls. This option is enabled by default, disable with
	--no-syscalls.

--call-graph [mode,type,min[,limit],order[,key][,branch]]::
	Setup and enable call-graph (stack chain/backtrace) recording.
	See `--call-graph` section in perf-record and perf-report
	man pages for details. The most useful modes in 'perf trace'
	are 'dwarf' and 'lbr', where available. Try: 'perf trace --call-graph dwarf'.

	Using this will, for the root user, bump the value of --mmap-pages to 4
	times the maximum for non-root users, based on the kernel.perf_event_mlock_kb
	sysctl. This is done only if the user doesn't specify a --mmap-pages value.

--kernel-syscall-graph::
	Show the kernel callchains on the syscall exit path.

--max-stack::
	Set the stack depth limit when parsing the callchain; anything
	beyond the specified depth is ignored. Note that at this point
	this affects only presentation: the kernel is still not limiting
	the depth, so the overhead of callchains needs to be controlled via
	the knobs in --call-graph dwarf.

	Implies '--call-graph dwarf' when --call-graph is not present on the
	command line, on systems where DWARF unwinding was built in.

	Default: /proc/sys/kernel/perf_event_max_stack when present for
	         live sessions (without --input/-i), 127 otherwise.

--min-stack::
	Set the minimum stack depth when parsing the callchain; callchains
	with fewer entries than the specified depth are ignored. Disabled
	by default.

	Implies '--call-graph dwarf' when --call-graph is not present on the
	command line, on systems where DWARF unwinding was built in.

--print-sample::
	Print the PERF_RECORD_SAMPLE PERF_SAMPLE_ info for the
	raw_syscalls:sys_{enter,exit} tracepoints, for debugging.

--proc-map-timeout::
	Processing /proc/XXX/maps for pre-existing threads may take a long time,
	because the file may be huge. A timeout is needed in such cases.
	This option sets the timeout limit. The default value is 500 ms.

PAGEFAULTS
----------

When tracing pagefaults, the format of the trace is as follows:

<min|maj>fault [<ip.symbol>+<ip.offset>] => <addr.dso@addr.offset> (<map type><addr level>).

- min/maj indicates whether the fault event is minor or major;
- ip.symbol shows the symbol for the instruction pointer (the code that
  generated the fault); if no debug symbols are available, perf trace prints
  the raw IP;
- addr.dso shows the DSO for the faulted address;
- map type is either 'd' for non-executable maps or 'x' for executable maps;
- addr level is either 'k' for kernel dso or '.' for user dso.

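The pagefault format above can be taken apart with a regular expression. The following Python sketch is an assumption-laden illustration: the pattern, the function `parse_pagefault` and the field names are hypothetical, and the address offset is assumed to be printed as `0x`-prefixed hex, as in the sample line used in the EXAMPLES section of this page.

```python
import re

# Pattern derived from the documented format; not taken from perf's sources.
PAGEFAULT_RE = re.compile(
    r"(?P<kind>min|maj)fault "
    r"\[(?P<ip>[^\]]+)\] => "
    r"(?P<dso>.+)@(?P<addr_offset>0x[0-9a-fA-F]+) "
    r"\((?P<map_type>[dx])(?P<addr_level>[k.])\)"
)


def parse_pagefault(line):
    """Return the pagefault fields of a trace line as a dict, or None."""
    match = PAGEFAULT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    symbol, plus, offset = fields.pop("ip").rpartition("+")
    if plus:
        fields["ip_symbol"], fields["ip_offset"] = symbol, offset
    else:
        # No '+': only a raw IP was printed (no symbol could be resolved).
        fields["ip_symbol"], fields["ip_offset"] = offset, None
    return fields


sample = ("1416.547 ( 0.000 ms): python/20235 majfault [CRYPTO_push_info_+0x0] "
          "=> /lib/x86_64-linux-gnu/libcrypto.so.1.0.0@0x61be0 (x.)")
print(parse_pagefault(sample))
```

For the sample line this yields a major fault in an executable map ('x') of a user-level DSO ('.').
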
For symbol resolution you may need to install debugging symbols.

Please be aware that duration is currently always 0 and doesn't reflect the
actual time it took for the fault to be handled!

When --verbose is specified, perf trace tries to print all available
information for both the IP and the fault address in the form of
dso@symbol+offset.

EXAMPLES
--------

Trace only major pagefaults:

 $ perf trace --no-syscalls -F

Trace syscalls, major and minor pagefaults:

 $ perf trace -F all

  1416.547 ( 0.000 ms): python/20235 majfault [CRYPTO_push_info_+0x0] => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0@0x61be0 (x.)

  As you can see, there was a major pagefault in the python process, from
  the CRYPTO_push_info_ routine, which faulted somewhere in libcrypto.so.

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-script[1]