Clean SOAP output with tcpdump

I often have to resort to tcpdump to debug the input and output of production services consuming SOAP messages. The problem with tcpdump is that the ASCII output is littered with binary garbage at the start of each packet, which makes it rather laborious to clean up. This article describes a few small scripts, with source code, that clean up the output.

Dumping to stdout and redirecting to a file:
tcpdump -A -s 0 -w - "dst $DST and port $PORT" > dump.dat

Reading from the dump file using tcpdump -s0 -A -r - < dump.dat results in output like this:

15:01:32.748754 IP 10.246.126.172.19992 > nw-ws-035.asplogon.com.http: Flags [.], seq 1831:3291, ack 9496, win 253, length 1460
.@.u...
.~.>F(.N..P.({..&..P.......<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Header><wsse:Security xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><wsse:UsernameToken SOAP-ENV:mustUnderstand="1"><wsse:Username><!-- more xml -->

15:01:32.751756 IP 10.246.126.172.19992 > nw-ws-035.asplogon.com.http: Flags [P.], seq 3291:3383, ack 9496, win 253, length 92
.@.u...
.~.>F(.N..P.(.o.&..P.......<!-- the rest of the xml --></SOAP-ENV:Body></SOAP-ENV:Envelope>
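
The header line of each packet ends with the payload length, and that number is what makes the cleanup possible. As a minimal sketch (the sample line is copied from the output above, and the name HEADER is only for illustration), the length can be pulled out with a regular expression:

```python
import re

# Matches a tcpdump header line (it starts with hh:mm:ss) and
# captures the payload length printed at the end of the line.
HEADER = re.compile(r'\d\d:\d\d:\d\d.*, length (\d+)')

line = ("15:01:32.748754 IP 10.246.126.172.19992 > nw-ws-035.asplogon.com.http: "
        "Flags [.], seq 1831:3291, ack 9496, win 253, length 1460")

match = HEADER.match(line)
payload_length = int(match.group(1))
print(payload_length)
```

Lines that do not match this pattern are treated as packet data.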

I just want the text, and so I filter it using the following Python snippet:

#!/usr/bin/python
# -*- coding: UTF-8 -*-
# @author: Carl-Erik Kopseng
import sys
import re

length_of_tcp_packet = -1
combined_data = ""
for line in sys.stdin:
  # A header line starts with hh:mm:ss and ends with the payload length
  pattern = r'\d\d:\d\d:\d\d.*, length (\d+)'
  match = re.match(pattern, line)

  if match is None :
    if length_of_tcp_packet <= 0:
      continue
    combined_data += "" + line
  else:
    sys.stdout.write( combined_data[-(length_of_tcp_packet+1):-1] )

    length_of_tcp_packet = int(match.group(1))
    combined_data = ""

# Flush the last packet, stripping its binary header like the others
sys.stdout.write( combined_data[-(length_of_tcp_packet+1):-1] )

If the tcpdump read command and the Python script above were saved as «read_dump_file.sh» and «print_clean_data_packets_in_ASCII.py» (and made executable), you could then filter the dump file by running
./read_dump_file.sh < dump.dat | ./print_clean_data_packets_in_ASCII.py

and get the following clean output

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Header><wsse:Security xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><wsse:UsernameToken SOAP-ENV:mustUnderstand="1"><wsse:Username><!-- more xml --><!-- the rest of the xml --></SOAP-ENV:Body></SOAP-ENV:Envelope>
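
If you want to check the slicing idea without a capture file at hand, the same logic can be wrapped in a function and fed hand-made lines. This is only a sketch: the names clean_packets and payload_of and the sample lines are made up for illustration, and it assumes each payload is followed by exactly one newline in the tcpdump -A output.

```python
import re

HEADER = re.compile(r'\d\d:\d\d:\d\d.*, length (\d+)')

def payload_of(data, length):
    """Keep the last `length` characters before the trailing newline."""
    if length <= 0:
        return ""
    return data[-(length + 1):-1]

def clean_packets(lines):
    """Return the concatenated payloads found in `tcpdump -A` text output."""
    payloads = []
    length = -1   # payload length announced by the latest header line
    data = ""     # raw text collected since that header line
    for line in lines:
        match = HEADER.match(line)
        if match is None:
            if length > 0:
                data += line
            continue
        # A new header line closes the previous packet.
        payloads.append(payload_of(data, length))
        length = int(match.group(1))
        data = ""
    # The last packet has no header line after it, so flush it here.
    payloads.append(payload_of(data, length))
    return "".join(payloads)

sample = [
    "15:01:32.748754 IP a.1 > b.http: Flags [.], seq 1:6, ack 1, win 253, length 5\n",
    "..garbage..<a>hi\n",
    "15:01:32.751756 IP a.1 > b.http: Flags [P.], seq 6:10, ack 1, win 253, length 4\n",
    "..junk..</a>\n",
]
print(clean_packets(sample))
```

The binary prefix has a variable length, which is why the slice counts from the end of the collected text rather than from the start.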

Usually there is a lot more than just SOAP calls going in and out. To keep only the SOAP envelopes, I also pipe the output through the following sed script for the final touch 🙂
sed -n '/<[a-zA-Z:-]*Envelope/,/<\/[a-zA-Z].*:Envelope>/ p' "$@"
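
If you would rather keep everything in Python, the sed range expression can be approximated as below. Treat it as a rough sketch rather than a drop-in replacement: the name soap_lines is hypothetical, and it works on whole lines just like sed does. One deliberate difference is that it ends the range when the closing Envelope tag sits on the same line as the opening one, whereas sed only starts looking for the end pattern on the following line.

```python
import re

START = re.compile(r'<[a-zA-Z:-]*Envelope')
END = re.compile(r'</[a-zA-Z].*:Envelope>')

def soap_lines(lines):
    """Yield lines from an opening Envelope tag through the closing one."""
    inside = False
    for line in lines:
        if not inside and START.search(line):
            inside = True
        if inside:
            yield line
            # Unlike sed, also check the line that opened the range.
            if END.search(line):
                inside = False

mixed = [
    "HTTP/1.1 200 OK\n",
    "<SOAP-ENV:Envelope><SOAP-ENV:Body>\n",
    "<x/>\n",
    "</SOAP-ENV:Body></SOAP-ENV:Envelope>\n",
    "GET / HTTP/1.1\n",
]
for line in soap_lines(mixed):
    print(line, end="")
```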

Hope this is of help to someone out there🙂
