Ticket #415 (new enhancement)

Opened 2 years ago

Last modified 2 years ago

Nanosecond Capture Support

Reported by: petekalo
Owned by: aturner
Priority: low
Milestone: Future Release
Component: tcpreplay
Version: 3.4.3
Keywords:
Cc:
Operating System:
Add to FAQ?: no
Hardware: All
Output of tcpreplay -V:

  tcpreplay version: 3.4.3 (build 2375)
  Copyright 2001-2009 by Aaron Turner <aturner at synfin dot net>
  Cache file supported: 04
  Compiled against libdnet: 1.12
  Compiled against libpcap: 1.0.0
  64 bit packet counters: enabled
  Verbose printing via tcpdump: enabled
  Packet editing: disabled
  Fragroute engine: enabled
  Injection method: PF_PACKET send()

Description

I tried to research what is already out there. The information is a little hard to come by, but it appears to me that libpcap has some type of support for nanosecond files:  http://www.tcpdump.org/pcap/pcap.html

I am by no means an expert in any of this, however I tested tcpreplay in this manner:

I loaded a microsecond pcap into Wireshark and used "Save As" to write it out as a nanosecond pcap. When I replay that file, I get this error:

Fatal Error: Error opening pcap file: bad dump file format.

Perhaps Wireshark is creating improperly formatted files. Any insight into this would be appreciated.
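
One quick way to see what kind of file Wireshark wrote is to look at its first four bytes. The sketch below is illustrative only: classify_capture and classify_file are hypothetical helpers, not part of tcpreplay or libpcap. It relies on the published magic numbers: 0xa1b2c3d4 for classic pcap with microsecond timestamps, 0xa1b23c4d for the nanosecond variant, and 0x0a0d0d0a for the first block of a pcapng file.

```python
import struct

MAGIC_USEC = 0xA1B2C3D4   # classic pcap, microsecond timestamps
MAGIC_NSEC = 0xA1B23C4D   # classic pcap, nanosecond timestamps
PCAPNG_SHB = b"\x0a\x0d\x0d\x0a"  # pcapng Section Header Block type


def classify_capture(header: bytes) -> str:
    """Classify a capture file from its first four bytes (hypothetical helper)."""
    if len(header) < 4:
        return "too short"
    first4 = header[:4]
    # The magic number is stored in the byte order of the writing host,
    # so try both interpretations.
    for fmt, order in (("<I", "little-endian"), (">I", "big-endian")):
        (magic,) = struct.unpack(fmt, first4)
        if magic == MAGIC_USEC:
            return f"pcap, microsecond timestamps ({order})"
        if magic == MAGIC_NSEC:
            return f"pcap, nanosecond timestamps ({order})"
    if first4 == PCAPNG_SHB:
        return "pcapng"
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first four bytes of a file and classify it."""
    with open(path, "rb") as fh:
        return classify_capture(fh.read(4))
```

If the file reports the nanosecond magic, a libpcap build that only knows the microsecond magic would be expected to reject it, which is consistent with the "bad dump file format" error.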

Change History

comment:1 Changed 2 years ago by aturner

  • Priority changed from medium to low
  • Milestone changed from 4.0.0 to Future Release

Nanosecond support isn't available in the current release of libpcap. I believe libpcap-ng, which is far from complete, supports it. The Wireshark nanosecond Save As feature does not generate a libpcap-compatible pcap file. Hence this feature is non-trivial right now.

Also there really is little value for this feature since:

  1. Converting microsecond pcaps to nanosecond doesn't improve the resolution at all, since the nanosecond data isn't in the original pcap.
  2. In the real world, it is highly unlikely that tcpreplay would send packets any more accurately with nanosecond information, since the only timing mechanism which is actually accurate to 1ns is OS X's AbsoluteTime() call. Even then, tcpreplay's own overhead is going to reduce the effective resolution significantly. Not to mention the AbsoluteTime() call takes more than 1ns to return!
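
To illustrate the first point: a microsecond timestamp converted to nanoseconds is simply multiplied by 1000, so the last three digits are always zero. This is a minimal sketch with hypothetical helper names (usec_to_nsec, gained_resolution), not how Wireshark or tcpreplay implement the conversion.

```python
def usec_to_nsec(ts_sec, ts_usec):
    """Convert a (seconds, microseconds) timestamp to (seconds, nanoseconds)."""
    if not 0 <= ts_usec < 1_000_000:
        raise ValueError("microsecond field out of range")
    return ts_sec, ts_usec * 1000


def gained_resolution(timestamps_usec):
    """Return True if any converted timestamp carries sub-microsecond detail."""
    return any(usec_to_nsec(sec, usec)[1] % 1000 != 0
               for sec, usec in timestamps_usec)
```

For any input list, gained_resolution returns False: the converted file is larger in nominal precision but holds no new timing information.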

comment:2 Changed 2 years ago by petekalo

Could not agree more regarding the issues of accuracy.

My concern is with speeds over ~6.8Gbps (and ~320Kpps). Using microseconds, packets arriving sequentially within the same microsecond will have equal timestamps. Is there a way to send packets with equal microsecond timestamps to the NIC at the same time?
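
As a rough worked example of when timestamps can collide, the sketch below estimates how many Ethernet frames fit into one microsecond on the wire. It assumes a 10 Gbps link and 20 bytes of per-frame overhead (8 bytes of preamble and start delimiter plus a 12-byte inter-frame gap); the function names are hypothetical, and frame_bytes is taken to include the Ethernet header and FCS.

```python
ETHERNET_OVERHEAD_BYTES = 20  # assumed: preamble + SFD (8) + inter-frame gap (12)


def wire_time_ns(frame_bytes, link_bps=10e9):
    """Time in nanoseconds that one frame occupies the link."""
    bits = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return bits / link_bps * 1e9


def max_frames_per_microsecond(frame_bytes, link_bps=10e9):
    """Upper bound on back-to-back frames within a single microsecond."""
    return 1000.0 / wire_time_ns(frame_bytes, link_bps)
```

Under these assumptions, minimum-size 64-byte frames can arrive many to a microsecond, while full 1518-byte frames each take longer than a microsecond, so equal timestamps are mainly an issue for small packets.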

comment:3 Changed 2 years ago by aturner

Each packet has to be sent one at a time, as there is no portable way to send multiple packets with a single write. Actually, I'm not aware of a way to do this on any specific platform either.

That said, there is no sleep function (regardless of the timing method) if the timestamps are the same between each packet. That isn't to say that the code couldn't be optimized a little more for this specific use case to send the packet a little quicker. Would optimizing this be sufficient?

Also, there is no sleep if the delta between two packets is less than or equal to the time elapsed since the last packet was sent. Basically, if tcpreplay falls behind, it tries to catch up.
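
The two rules described above can be sketched as a single decision function. This is a simplified illustration in Python, not tcpreplay's actual code; sleep_time and its parameters are hypothetical names, with both values in seconds.

```python
def sleep_time(pkt_delta, elapsed_since_last_send):
    """Seconds to sleep before sending the next packet.

    pkt_delta: difference between this packet's capture timestamp and the
               previous packet's.
    elapsed_since_last_send: wall-clock time already spent since the
               previous packet went out.
    """
    if pkt_delta <= 0:
        # Identical timestamps: send immediately, no sleep at all.
        return 0.0
    if pkt_delta <= elapsed_since_last_send:
        # Already behind schedule: skip the sleep to catch up.
        return 0.0
    # Otherwise sleep only for the part of the gap not yet used up.
    return pkt_delta - elapsed_since_last_send
```

For example, sleep_time(0.0, 0.0) and sleep_time(0.5, 1.0) both return 0.0, so packets sharing a microsecond timestamp are only spaced by the cost of the send path itself.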

On a side note, I'd be interested to hear what hardware/OS/etc you're using to get 320Kpps & ~6.8Gbps with tcpreplay. That's impressive!

comment:4 Changed 2 years ago by petekalo

I can't argue with optimization, although how much leaner can you really make your code?

I am at the point where I am trying to figure out the bottlenecks, assuming unlimited RAM. I guess companies like Endace can essentially write the bits to the card as a stream, rather than treating each logical packet as an entity. I assume they have lower-level interaction with the card.

I am pretty much just running latest-gen Intel hardware (Xeon 5500) with 8x PCI-e 2.0 10G NICs, on Fedora 12 x64.

With a packet size of 9000, I actually reach 8.8Gbps. The ratio of bitrate to packet size is almost linear. Of course, I'm upping the MTU.
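
The relationship between the figures quoted in this thread is bitrate = packets per second x packet bytes x 8. A small sketch with hypothetical helper names follows; it ignores Ethernet framing overhead, since it is not stated whether the reported rates include it.

```python
def implied_pps(bitrate_bps, packet_bytes):
    """Packets per second needed to reach a bitrate at a fixed packet size."""
    return bitrate_bps / (packet_bytes * 8)


def implied_packet_bytes(bitrate_bps, pps):
    """Average packet size implied by a bitrate and a packet rate."""
    return bitrate_bps / (pps * 8)
```

With the numbers above, implied_pps(8.8e9, 9000) comes to roughly 122,000 packets per second, and implied_packet_bytes(6.8e9, 320e3) to roughly 2,650 bytes per packet. A near-linear relationship between bitrate and packet size would be consistent with a per-packet cost, rather than a per-byte cost, being the limiting factor.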

comment:5 Changed 2 years ago by aturner

There's honestly not a ton of optimization to be done- any improvements are likely to be very slim. Honestly, once you tell tcpreplay to do any timing whatsoever, performance takes a pretty big hit. Using --topspeed is always significantly faster because it skips a significant bit of code.

Years ago, I think Endace did a tcpreplay port to use their API. I believe their API allows multiple packets to be sent via a single write. If I ever got an Endace board I'd probably port their changes to the latest tcpreplay, but I don't have the $$$ to buy one.

I know the libpcap guys have been working on some zero-copy tricks. I think they support zero copy with BPF devices, but I have no idea about Linux's PF_PACKET. I'm not even sure if PF_PACKET supports zero-copy writes. If so, then that might be supportable either via libpcap or via PF_PACKET directly. If you feel like doing some research on this, that would help.

comment:6 Changed 2 years ago by aturner

Do you know if your kernel is configured with CONFIG_PACKET_MMAP_ZERO_COPY? It looks like recent Linux kernels do support zero copy on PF_PACKET sockets with that option enabled. It seems to help in cases where the CPU is overloaded.

comment:7 Changed 2 years ago by petekalo

Sorry for the late response.

I am relatively new to anything kernel related. I went through the kernel sources Fedora supplies, but I could not find that option. I saw a couple of emails on kernel dev mailing lists about a zero-copy patch that someone made. Can you point me in the right direction?

I can do some testing and let you know how much of a difference it makes.

comment:8 Changed 2 years ago by aturner

So I did more research, and I don't think CONFIG_PACKET_MMAP_ZERO_COPY is the way to go. There is another option, however, which your kernel probably already supports: PACKET_MMAP. I even found a simple test tool which uses this feature and should give you an idea of how much faster it might be. Since I currently have limited access to Linux boxes, perhaps you could give it a try and let me know what kind of performance you see?

 http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap

comment:10 Changed 2 years ago by petekalo

So I was able to poke around today.

Side note: I disabled the ipv6 kernel module, along with net-pf-10. Now I am getting ~500,000pps with an average packet size of 120 bytes; that's up from 300k.

My kernel does support packet_mmap. I am not really sure what the test tool's output is telling me in terms of data rate. I have played with various values for packet size, and the largest that works with my 9000 MTU is the following:

  [root@fedora12 packet_mmap]# ./packet_mmap -s7608 -z1000000 eth2
  CURRENT SETTINGS:
  str_devname: eth2
  c_packet_sz: 7608
  c_buffer_sz: 8192
  c_buffer_nb: 1024
  c_packet_sz count: 7608
  c_packet_nb count: 1000
  c_mtu: 0
  c_send_mask: 127
  c_sndbuf_sz: 1000000
  mode_loss: 0
  mode_thread: 0

  STARTING TEST:
  send buff size = 1000000
  data offset = 32 bytes
  start fill() thread
  send 128 packets (+973824 bytes)
  send 256 packets (+973824 bytes)
  send 384 packets (+973824 bytes)
  send 512 packets (+973824 bytes)
  send 640 packets (+973824 bytes)
  send 768 packets (+973824 bytes)
  send 896 packets (+973824 bytes)
  send 1000 packets (+791232 bytes)
  end of task fill()
  Loop until queue empty (0)
  END (number of error:0)

--

Now if I use a packet size of 200, I get:

--

  [root@fedora12 packet_mmap]# ./packet_mmap -s200 -z1000000 eth0
  CURRENT SETTINGS:
  str_devname: eth2
  c_packet_sz: 200
  c_buffer_sz: 8192
  c_buffer_nb: 1024
  c_packet_sz count: 200
  c_packet_nb count: 1000
  c_mtu: 0
  c_send_mask: 127
  c_sndbuf_sz: 1000000
  mode_loss: 0
  mode_thread: 0

  STARTING TEST:
  send buff size = 1000000
  data offset = 32 bytes
  start fill() thread
  send 128 packets (+25600 bytes)
  send 256 packets (+25600 bytes)
  send 384 packets (+25600 bytes)
  send 512 packets (+25600 bytes)
  send 640 packets (+25600 bytes)
  send 768 packets (+25600 bytes)
  send 896 packets (+25600 bytes)
  send 1000 packets (+791232 bytes)
  end of task fill()
  Loop until queue empty (0)
  END (number of error:0)
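
On the question of what this output says about data rate: the byte counts in the two runs above are simply the number of packets per batch times the packet size, and the log contains no timing. The sketch below reproduces the counts, assuming (from the log only, not from the tool's documentation) that a c_send_mask of 127 means a send is triggered every 128 packets; batch_byte_counts is a hypothetical helper.

```python
def batch_byte_counts(total_packets, packet_size, send_mask=127):
    """Bytes reported per batch, assuming one send every (send_mask + 1) packets."""
    batch = send_mask + 1
    counts = []
    sent = 0
    while sent < total_packets:
        n = min(batch, total_packets - sent)  # last batch may be partial
        counts.append(n * packet_size)
        sent += n
    return counts
```

For 1000 packets of 7608 bytes this gives seven batches of 973824 bytes and a final one of 791232 bytes; for 200-byte packets, seven batches of 25600 bytes and a final one of 20800 bytes, matching both logs. To get an actual rate, the run would have to be timed externally (for example by wrapping the command in a timer) and the total bytes divided by the elapsed time.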

comment:11 Changed 2 years ago by petekalo

I wanted to make sure my tcpreplay installation is making use of these kernel features. Do I have to compile libpcap, tcpreplay, or anything else with special options to achieve that? I want to make sure I am not leaving any steps out.

comment:12 Changed 2 years ago by aturner

Tcpreplay & libpcap do not support PACKET_MMAP. It will require development. Generally speaking, how large are the pcap files you are working with?

comment:13 Changed 2 years ago by petekalo

My goal is to make pcap files as large as the RAM I have. Right now that's 24GB.

comment:14 Changed 2 years ago by aturner

Hmmm... I don't think PACKET_MMAP would work then... the amount of memory you can allocate is much smaller... like under 1GB from what I can tell.
