Posted by: Dan Israel
Today’s post is one you don’t often find in the blogosphere: a collaborative effort initiated by me, Chad Sakac (EMC), with contributions from Andy Banta (VMware), Vaughn Stewart (NetApp), Eric Schott (Dell/EqualLogic), Adam Carter (HP/Lefthand), David Black (EMC), and various other folks at each of the companies.
Together, our companies make up the large majority of the iSCSI market, all make great iSCSI targets, and we (as individuals and companies) all want our customers to have iSCSI success.
I have to say, I see this one often – customers struggling to get high throughput out of iSCSI targets on ESX. Sometimes they are OK with that, but often I hear this comment: "…My internal SAS controller can drive 4-5x the throughput of an iSCSI LUN…"
Can you get high throughput with iSCSI with GbE on ESX? The answer is YES. But there are some complications, and some configuration steps that are not immediately apparent. You need to understand some iSCSI fundamentals, some Link Aggregation fundamentals, and some ESX internals – none of which are immediately obvious…
If you’re interested (and who wouldn’t be interested with a great topic and a bizarro-world “multi-vendor collaboration”... I can feel the space-time continuum collapsing around me :-), read on...
We could start this conversation by playing a trump card – 10GbE – but we’ll save that topic for another discussion. Today 10GbE is relatively expensive per port and relatively rare, and the vast majority of iSCSI and NFS deployments are on GbE. 10GbE is supported by VMware today (see the VMware HCL here), and all of the vendors here either have, or have announced, 10GbE support.
10GbE can support the ideal number of cables from an ESX host – two. This reduction in port count can simplify configurations, reduce the need for link aggregation, provide ample bandwidth, and even unify FC using FCoE on the same fabric for customers with existing FC investments. We all expect to see rapid adoption of 10GbE as prices continue to drop. Chad has blogged on 10GbE and VMware here.
This post is about trying to help people maximize iSCSI on GbE, so we’ll leave 10GbE for a followup.
If you are serious about iSCSI in your production environment, it’s valuable to do a bit of learning, and it’s important to do a little engineering during design. iSCSI is easy to connect and begin using, but like many technologies that excel in their simplicity, the default options and parameters may not be robust enough to provide an iSCSI infrastructure that can support your business.
With that in mind, this post is going to start with sections called “Understanding” which walk through protocol details and ESX Software Initiator internals. You can skip them if you want to jump straight to the configuration options, but a bit of learning goes a long way toward understanding the WHY of the HOWs (which I personally think makes them easier to remember).
Understanding your Ethernet Infrastructure
Do you have a “bet the business” Ethernet infrastructure? Don’t think of iSCSI (or NFS datastore) use here as “it’s just on my LAN”, but as “this is the storage infrastructure that is supporting my entire critical VMware environment”. IP storage needs the same sort of design thinking that is applied to FC infrastructure. Here are some things to think about:
Are you separating your storage and network traffic on different ports? Could you use VLANs for this? Sure. But is that “bet the business” thinking? Do you want a temporarily busy LAN to swamp your storage (and vice-versa) for the sake of a few NICs and switch ports? If you’re using 10GbE, sure – but GbE?
Think about Flow-Control (should be set to receive on switches and transmit on iSCSI targets)
Enable spanning tree protocol with either RSTP or portfast enabled
Filter / restrict bridge protocol data units on storage network ports
If you want to squeeze out the last bit of performance, configure jumbo frames – always end-to-end, otherwise you will get fragmented gobbledygook (there’s a quick verification sketch a few paragraphs below).
Use Cat6 cables rather than Cat5/5e. Yes, Cat5e can work – but remember – this is “bet the business”, right? Are you sure you don’t want to buy that $10 cable?
You’ll see later that things like cross-stack Etherchannel trunking can be handy in some configurations.
Ethernet switches also vary in their internal architecture – for mission-critical, network-intensive purposes (like VMware datastores on iSCSI or NFS), the amount of port buffering and other internals matter – so it’s a good idea to know what you are using.
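A quick way to sanity-check the “end-to-end” part of the jumbo frame advice above is a don’t-fragment ping sized so that it only fits inside a jumbo frame. Here is a minimal sketch, assuming a Linux-style ping run from a host on the storage VLAN (the target address is a placeholder); from the ESX host itself, the equivalent check is vmkping -d -s 8972 <target>:

```python
# Illustrative sketch (not from the original post): verify end-to-end jumbo
# frames by sending a don't-fragment ICMP echo that only fits in a jumbo frame.
# A 9000-byte MTU leaves 8972 bytes of ICMP payload (9000 - 20 IP - 8 ICMP).
import subprocess
import sys

def jumbo_path_ok(target_ip, payload_bytes=8972):
    """Return True if a don't-fragment ping of payload_bytes gets through."""
    rc = subprocess.call([
        "ping",
        "-M", "do",                # set the Don't Fragment bit
        "-s", str(payload_bytes),  # ICMP payload size
        "-c", "3",                 # three probes is plenty
        target_ip,
    ])
    return rc == 0

if __name__ == "__main__":
    target = sys.argv[1]  # e.g. the IP of your iSCSI target's network portal
    print("jumbo frames end-to-end: %s" % ("OK" if jumbo_path_ok(target) else "BROKEN"))
```

If this fails while a normal ping succeeds, something in the path (a vSwitch, a physical switch, or the array port) is not configured for the larger MTU.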
If performance is important, have you thought about how many workloads (guests) you are running? Both individually and in aggregate, are they typically random or streaming? Random I/O workloads put very little throughput stress on the SAN network; conversely, sequential, large-block I/O workloads place a much heavier load on it.
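To put rough numbers on that last point, here is a quick bit of arithmetic – the IOPS and block sizes below are made-up but typical illustrative figures, not measurements from any of our arrays:

```python
# Illustrative arithmetic: why random small-block I/O rarely stresses a GbE
# link, while a sequential large-block stream can saturate one.
GBE_USABLE_MB_S = 110.0  # rough usable payload of a single GbE link, MB/s

def throughput_mb_s(iops, block_kb):
    """Convert an IOPS figure at a given block size into MB/s."""
    return iops * block_kb / 1024.0

random_mb_s = throughput_mb_s(2000, 8)       # a busy random workload: ~15.6 MB/s
sequential_mb_s = throughput_mb_s(500, 256)  # one sequential stream: ~125 MB/s

for name, value in (("random", random_mb_s), ("sequential", sequential_mb_s)):
    print("%-10s %6.1f MB/s (%.0f%% of one GbE link)"
          % (name, value, 100.0 * value / GBE_USABLE_MB_S))
```

Two thousand random 8 KB IOPS is only about 16 MB/s – a fraction of a single GbE link – while one large-block sequential stream can fill the pipe on its own.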
In the same vein, be careful running single stream I/O tests if your environment is multi-stream / multi-server. These types of tests are so abstract they provide zero data relative to the shared infrastructure that you are building.
In general, don’t view “a single big LUN” as a good test – all arrays have internal threads handling I/Os, and so does the ESX host itself (for VMFS and for NFS datastores). In general, in aggregate, more threads are better than fewer. You increase threading on the host by driving more operations against that single LUN (or file system), and while every vendor’s internals are slightly different, in general more internal array objects are better than fewer – because there are more threads behind them.
Not an “Ethernet” thing, but while we’re on the subject of performance and not skimping: there’s no magic in the brown spinny things – you need enough array spindles to support the I/O workload. Often the problem is not enough drives in total, or an under-configured sub-group of drives. Every vendor does this differently (aggregates / RAID groups / pools), but all have some sort of “disk grouping” out of which LUNs (and, in some cases, file systems) get their collective IOPs.
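To make the “enough spindles” point concrete, here is a hedged back-of-envelope sketch. The per-disk IOPS figure and the RAID write penalties below are generic rules of thumb (and the function is our illustration, not any vendor’s sizing tool) – use your vendor’s real sizing guidance for actual designs:

```python
# Back-of-envelope spindle sizing (illustrative assumptions only).
def spindles_needed(total_iops, read_fraction, disk_iops, write_penalty):
    """Estimate the spindle count behind a host-side IOPS target.

    total_iops     - host-visible IOPS the workload generates
    read_fraction  - fraction of I/Os that are reads (0.0 - 1.0)
    disk_iops      - rough IOPS per spindle (e.g. ~180 for a 15K drive)
    write_penalty  - back-end I/Os per host write (RAID 10 ~2, RAID 5 ~4)
    """
    reads = total_iops * read_fraction
    writes = total_iops * (1.0 - read_fraction)
    backend_iops = reads + writes * write_penalty
    return int(-(-backend_iops // disk_iops))  # round up to a whole spindle

# Example: 5,000 IOPS, 70% read, 15K drives, RAID 5 -> roughly 53 spindles
print(spindles_needed(5000, 0.70, 180, 4))
```

The exact numbers vary by array and drive type; the point is simply that the read/write mix and the protection scheme change the answer a lot, so size the disk grouping deliberately.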
Understanding: iSCSI Fundamentals
We need to begin with some prerequisite nomenclature to establish a starting point. If you really want the “secret decoder ring”, start here: http://tools.ietf.org/html/rfc3720
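Before the diagram, it may help to see how the basic RFC 3720 terms hang together. Here is a minimal sketch (our own illustration with made-up names, not text from the RFC): initiators and targets are named by IQNs, a network portal is an IP address plus a TCP port, and a session between an initiator and a target can carry one or more connections:

```python
# Minimal, illustrative model of the core RFC 3720 nomenclature.
from dataclasses import dataclass, field
from typing import List

@dataclass
class NetworkPortal:
    ip_address: str
    tcp_port: int = 3260          # the default iSCSI port

@dataclass
class Connection:
    """One TCP connection between an initiator portal and a target portal."""
    initiator_portal: NetworkPortal
    target_portal: NetworkPortal

@dataclass
class Session:
    """The initiator-target nexus; it can carry one or more connections (MC/S)."""
    initiator_name: str           # an IQN, e.g. "iqn.1998-01.com.vmware:esx1"
    target_name: str              # an IQN, e.g. "iqn.2000-01.com.example:array1"
    connections: List[Connection] = field(default_factory=list)
```

Keep those terms in mind as you look at the picture below.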
This diagram is chicken scratch, but it gets the point across. The red numbers are explained below.