Does AMD have a counterpart to I/OAT, Direct Cache Access & IXGBE?

mocad_tom · 22.11.2010

Has Intel managed to create a new vendor lock-in through a combination of CPU + chipset + network card?
http://www.scribd.com/doc/40688981/performance-tweaks-and-tools-for-linux
From page 87 onward:
TCP Segmentation Offloading
I/OAT DMA Engine
Direct Cache Access in combination with IXGBE network cards.
All of it already works in Linux.

Regards,
Tom

mocad_tom · 29.11.2010

More material for discussion:
I have posted this paper before:
http://pdos.csail.mit.edu/papers/linux:osdi10.pdf
Quote from page 6:

....Good performance with many cores and many independent
network connections demands that each packet, queue,
and connection be handled by just one core [21, 42]. This
avoids inter-core cache misses and queue locking costs.
Recent Linux kernels take advantage of network cards
with multiple hardware queues, such as Intel’s 82599
10Gbit Ethernet (IXGBE) card, or use software techniques,
such as Receive Packet Steering [26] and Receive
Flow Steering [25], to attempt to achieve this property.
With a multi-queue card, Linux can be configured to assign
each hardware queue to a different core. Transmit
scaling is then easy: Linux simply places outgoing packets
on the hardware queue associated with the current
core. For incoming packets, such network cards provide
an interface to configure the hardware to enqueue incoming
packets matching a particular criteria (e.g., source IP
address and port number) on a specific queue and thus
to a particular core.......
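
To make the queue-to-core assignment from the quote a bit more concrete, here is a minimal Python sketch. It only computes the hexadecimal CPU bitmasks that Linux accepts in /proc/irq/<irq>/smp_affinity. The function name queue_affinity_masks and the round-robin layout are assumptions for illustration; the IRQ numbers of the individual queues depend on the machine, so nothing is written to /proc here.

```python
def queue_affinity_masks(num_queues, num_cpus):
    """Return one hex CPU bitmask per hardware queue (hypothetical helper).

    Queue q is pinned to core q % num_cpus, so with at least as many
    cores as queues every queue gets a core of its own.
    """
    if num_queues < 0 or num_cpus < 1:
        raise ValueError("need num_queues >= 0 and num_cpus >= 1")
    masks = []
    for queue in range(num_queues):
        cpu = queue % num_cpus
        # Bit n set in the mask means "core n may handle this interrupt".
        masks.append(format(1 << cpu, "x"))
    return masks
```

Each resulting string would then be written to the smp_affinity file of the IRQ that belongs to the matching queue.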

Here are the references cited in the quote:

[21] M. Dobrescu, N. Egi, K. Argyraki, B.-G. Chun,
K. Fall, G. Iannaccone, A. Knies, M. Manesh, and
S. Ratnasamy. RouteBricks: Exploiting parallelism
to scale software routers. In Proc of the 22nd SOSP,
Big Sky, MT, USA, Oct 2009.

[25] T. Herbert. rfs: receive flow steering, September
2010. http://lwn.net/Articles/381955/

[26] T. Herbert. rps: receive packet steering, September
2010. http://lwn.net/Articles/361440/

Then on "receive packet steering" [26] (it looks like someone from Google is working on it):

Problem statement: Protocol processing done in the NAPI context for received
packets is serialized per device queue and becomes a bottleneck under high
packet load. This substantially limits pps that can be achieved on a single
queue NIC and provides no scaling with multiple cores.

This solution queues packets early on in the receive path on the backlog queues
of other CPUs. This allows protocol processing (e.g. IP and TCP) to be
performed on packets in parallel. For each device (or NAPI instance for
a multi-queue device) a mask of CPUs is set to indicate the CPUs that can
process packets for the device. A CPU is selected on a per packet basis by
hashing contents of the packet header (the TCP or UDP 4-tuple) and using the
result to index into the CPU mask.
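
The CPU selection described in this quote can be sketched in Python roughly as follows. This is an illustration, not the kernel code: zlib.crc32 stands in for whatever hash the kernel actually uses, and the names cpus_from_mask and select_cpu are made up for this example.

```python
import socket
import struct
import zlib


def cpus_from_mask(cpu_mask):
    """Return the CPU numbers whose bits are set in cpu_mask."""
    return [bit for bit in range(cpu_mask.bit_length()) if (cpu_mask >> bit) & 1]


def select_cpu(src_ip, src_port, dst_ip, dst_port, cpu_mask):
    """Pick a CPU for a packet by hashing its TCP/UDP 4-tuple.

    Assumption: CRC32 is only a stand-in hash for this sketch.
    """
    cpus = cpus_from_mask(cpu_mask)
    if not cpus:
        raise ValueError("cpu_mask must allow at least one CPU")
    # Build a byte string from the 4-tuple (IPv4 only in this sketch).
    key = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HH", src_port, dst_port)
    )
    # Use the hash to index into the list of allowed CPUs.
    return cpus[zlib.crc32(key) % len(cpus)]
```

Because the hash depends only on the 4-tuple, all packets of one connection end up on the same CPU, while different connections are spread over the CPUs allowed by the mask.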

I also find this Dobrescu paper very interesting:
http://routebricks.org/papers/rb-sosp09.pdf
It also discusses how the old FSB (before Nehalem) can become a bottleneck, and that Nehalem brought a clear improvement.

The "receive packet steering" article talks about NAPI; while researching further I found a project on TNAPI (Threaded NAPI).
http://www.ntop.org/TNAPI.html
The two figures on that page show how to picture multi-queue. The ideal package therefore consists of
+ a multi-queue capable network card (Intel 82598 / 82599)
+ a mainboard with I/OAT (I/O Acceleration Technology) and DCA (Direct Cache Access)
+ several multicore CPUs

This makes it clear how Intel has once again managed to get deployed in Amazon's and Microsoft's cloud computing offerings.
This tight coupling of hardware queues with the operating system or a virtualized environment (VMware uses these multi-queues too) makes much better use of the (multicore) CPU, because it idles less and more work is offloaded to the network card.

Regards,
Tom

Lynxeye · 29.11.2010

I don't quite see yet what exactly this is supposed to have to do with Intel alone.

Intel is now demonstrating all this with Ethernet cards, but there is not much in it that is really innovative.

Almost any halfway decent Ethernet card can do TCP Segmentation Offloading these days. Direct DMA and multiple hardware queues also existed years ago, for example on InfiniBand network cards. I don't see a vendor lock-in there. All of this can be used just the same with Linux on AMD platforms.

Meckel · 29.11.2010

mocad_tom wrote:
receive packet steering

Upstream since 2.6.34-36.

BTW: Is there a paper on HW multiqueue anywhere?

Lynxeye · 29.11.2010

Meckel wrote:
Upstream since 2.6.34-36.

What is 34-36 supposed to mean? It has been upstream since 2.6.35.

Bobo_Oberon · 29.11.2010

Without having read up on the details ... Sun also integrated 10 Gigabit Ethernet ports and direct offloading technology for better protocol processing into the Niagara 2. Oh, and crypto acceleration is included as well.

IBM has something similar, so this looks more like Intel and AMD catching up ... but that is nothing new.

Regards, Bobo(2010)

mocad_tom · 16.12.2010

Linux Networking: The RISE of the congestion window, the FALL of the routing cache, and the LOCALITY of packets.
David S. Miller
Red Hat Inc.
IBM Watson Research Center, 2010

http://vger.kernel.org/~davem/davem_ibm2010.pdf

Another talk on multi-queue and RPS.
Fairly recent, it must have been in early December.

Quite a bit seems to be going into Linux kernel 2.6.37 as well:
http://www.heise.de/open/artikel/Ke...-3-Netzwerk-und-Storage-Hardware-1152083.html

Network subsystem maintainer David Miller names some further important changes in his main Git pull request. There he praises numerous optimizations around routing, neighbour and device handling contributed by Eric Dumazet, some of which are linked among the small gems at the end of the article. According to tests, routing is now faster when the routing cache is disabled; the cache is still needed for other things, though, so it cannot simply be removed. Some background on the optimizations can also be found in the slides of Miller's recent talk "Linux Networking: The RISE of the congestion window, the FALL of the routing cache, and the LOCALITY of packets".

Allfred · 16.12.2010

mocad_tom wrote:
Quite a bit seems to be going into Linux kernel 2.6.37 as well:
http://www.heise.de/open/artikel/Ke...-3-Netzwerk-und-Storage-Hardware-1152083.html

Thanks, mocad_tom, for the background info on this topic. Did you also read this: "a possible backdoor in the implementation of the IPSec stack for setting up VPNs in OpenBSD [...] He therefore knows that the FBI at the time successfully placed several backdoors and opportunities for side-channel attacks in the OpenBSD Crypto Framework (OCF). [...] Linux includes its own Netkey implementation in the kernel, but also supports other solutions." (heise)
