Tools used to troubleshoot
1. nstrace
2. nstcpdump.sh
3. /var/log/ns.log

#nstcpdump.sh -c 10 -nn host X.X.X.X and port 443
start nstrace -filter "CONNECTION.DSTIP.EQ(10.1.1.1)" -link ENABLED -size 0

root@sdx1vm-myafw-prd011# cd /var/log
root@sdx1vm-myafw-prd011# tail -f ns.log | grep '<blocked>'

But let's first talk about troubleshooting the network. If you have traffic issues that you want to look at more closely, you can start an nstrace.sh or, for lower-level debugging, an nstcpdump.sh capture. Both store their output in cap (capture) files that you can analyze further in, for instance, Network Analyzer or Wireshark, and I'll show you how.

If I run, for instance, the command nstrace.sh -time 30

each new cap file will hold 30 seconds of trace data.
I could also apply a filter to the trace in order to filter out traffic that I don't need.

I could use -filter "SOURCEIP == 10.0.0.1" -time 30
This captures only traffic whose source IP is 10.0.0.1.

There are some other filters that you can use; they are listed here:
http://support.citrix.com/article/CTX120941

To stop a trace, press Ctrl+C.

To fetch these files you need an SCP client, for instance WinSCP.
After you have downloaded and installed the client, you can establish a connection to the NSIP.

Go to the /var/nstrace directory and copy over some of the cap files. I suggest that you open these in, for instance, Wireshark.
In Wireshark, choose the option to open a file and select one of the cap files.

This issue is caused by bad syntax

nstcpdump

root@ns# nstcpdump.sh BADsyntax
reading from file -, link-type EN10MB (Ethernet)
tcpdump: syntax error

root@ns# nstcpdump.sh
tcpdump: bad dump file format
root@ns#

To fix this, kill all nstraceaggregator processes:

root@ns# ps -ef | grep nstraceaggregator
1469 0 R 2:54.69 /netscaler/nstraceaggregator /var/nstrace/nstcpdump 3600 2 1 1 0
root@ns# kill 1469
root@ns# ps -ef | grep nstraceaggregator
root@ns#

Then execute the following command to clean up:
nstrace.sh -stop
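
The manual cleanup above can be wrapped in a small helper. This is only a minimal sketch: it assumes the ps output on your appliance lists the PID in the first column, as in the sample above, and aggregator_pids is a hypothetical helper name, not a NetScaler command.

```shell
# Print the PID of every running nstraceaggregator process found on stdin.
# Assumes the PID is the first column of the ps output, as in the sample above.
aggregator_pids() {
    awk '/\/netscaler\/nstraceaggregator/ && !/grep/ { print $1 }'
}

# Usage on the appliance shell (review the PIDs before killing anything):
#   ps -ef | aggregator_pids
#   for pid in $(ps -ef | aggregator_pids); do kill "$pid"; done
#   nstrace.sh -stop
```

Printing the PIDs first and killing them in a second step keeps the destructive part explicit, which seems safer on a production appliance.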

Examples

The following are some of the examples for running the nstcpdump.sh script:

root@ns# nstcpdump.sh -X dst host 10.102.13.14 and port 80
The output of this command is displayed on stdout and consists of all tcp port 80 traffic destined to the 10.102.13.14 IP address.

root@ns# nstcpdump.sh -w /var/trace/trace1.cap -i 1/1 -i 1/2
The output of this command is directed to the /var/trace/trace1.cap file and consists of all traffic on the interfaces 1/1 and 1/2.

root@ns# nstcpdump.sh -w /var/trace/trace2.cap host 10.102.13.14 and not port 443
The output of this command is directed to the /var/trace/trace2.cap file and consists of all traffic to or from the host IP address 10.102.13.14 and which does not have destination or source port as 443.

root@ns# nstcpdump.sh host 10.102.13.14 and host 10.102.13.15
The output of this command is displayed on stdout and consists of all traffic between the 10.102.13.14 and 10.102.13.15 IP addresses.

Sample Output

The following is a sample output of the nstcpdump.sh script:

root@103# nstcpdump.sh port 80
Setting 1000 pages (4000 KB) of trace buffers ... Done.
Enabling all nic trace mode=6 ... Done.
Changing trace packet length from 0 to 0 ... Done.
Saving current trace data in file 'pipe' for '3600' seconds ... in TCPDUMP format
18:17:13.391479 10.198.4.112.29221 > 10.198.4.41.http: S 1430428239:1430428239(0) win 8190 <mss 1460>
18:17:13.391599 10.198.4.41.http > 10.198.4.112.29221: R 0:0(0) ack 1430428240 win 0 (DF)
18:17:13.691462 10.198.4.112.29217 > 10.198.4.204.http: R 1430282160:1430282160(0) win 9800
18:17:13.691467 10.198.4.112.29217 > 10.198.4.204.http: R 1430282160:1430282160(0) win 9800
18:17:16.091528 127.0.0.2.61049 > localhost.http: S 1430522929:1430522929(0) win 8190 <mss 1460>
18:17:16.091566 localhost.http > 127.0.0.2.61049: S 1213225328:1213225328(0) ack 1430522930 win 57344 <mss 1460> (DF)
18:17:16.091570 127.0.0.2.61049 > localhost.http: F 1:1(0) ack 1 win 8190
18:17:16.091585 localhost.http > 127.0.0.2.61049: . ack 2 win 58400 (DF)
18:17:16.091654 localhost.http > 127.0.0.2.61049: F 1:1(0) ack 2 win 58400 (DF)
18:17:16.091665 127.0.0.2.61049 > localhost.http: . ack 2 win 8190
The following table contains the details of the entry highlighted in the preceding output:

nstrace

How to Capture an nstrace from the Command Line Interface of NetScaler Appliance
Created: 26 Mar 2014 | Modified: 29 Dec 2015

Objective
This article explains how to capture a NetScaler network trace using the command line interface of NetScaler appliance.

Background
With NetScaler you can set filters to capture a specific client transaction or back-end server communication. This provides a greater flexibility to capture a trace from front-end (client to virtual server) or back-end (NetScaler appliance to server) or both.

Instructions
To capture a NetScaler network trace, complete the following steps:

Log on to the NetScaler appliance through PuTTY, or Secure Console.

Run the start nstrace command to capture the network trace on the NetScaler appliance in native format with the extension .cap.

To stop the trace after capturing the required information, press Ctrl+C.

Download the trace file through the GUI or through SFTP or WinSCP. The trace files are stored in the /var/nstrace directory.

The trace files captured can be viewed with the Wireshark application.

Operations Supported by nstrace Command

For the complete list of arguments supported by nstrace Command, refer to Citrix Documentation - nstrace

Filter Qualifiers

Qualifier	Value
SRCIP	IP Address
SRCPORT	Port Number
DESTIP	IP Address
DESTPORT	Port Number
SVCNAME	Service Name
CS_VSERVER/LB_VSERVER	Vserver Name
TCPSTATE	CLOSE_WAIT, CLOSED, CLOSING, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, LAST_ACK, LISTEN, SYN_RECEIVED, SYN_SENT, TIME_WAIT

Filter Operators
==, eq, !=, neq, >, gt, <, lt, >=, ge, <=, le, BETWEEN
Compound expressions can be built by combining filters with && and ||.

nstrace Examples

start nstrace -filter "CONNECTION.DSTIP.EQ(10.1.1.1)" -link ENABLED -size 0

This command captures traffic whose destination IP address is 10.1.1.1 (in this example, the IP address of the VIP). Because the link option is enabled, the linked back-end connection is captured as well. A size of 0 captures full packets.

start nstrace -filter "CONNECTION.SRCIP.EQ(10.1.1.1)" -time 10 -fileName sample_trace

This command captures traffic whose source IP address is 10.1.1.1 (the client), starts a new trace file every 10 seconds, and names the file sample_trace.

start nstrace -filter "CONNECTION.DSTIP.EQ(10.1.1.1) || CONNECTION.SRCIP.EQ(10.1.1.2)" -size 0

This command captures traffic whose destination IP address is 10.1.1.1 or whose source IP address is 10.1.1.2 (a specific server).

start nstrace -filter "CONNECTION.DSTIP.EQ(10.1.1.1) || CONNECTION.SRCIP.EQ(10.1.1.2)" -size 0 -nf 24 -time 30

This command uses the same filter (destination IP address 10.1.1.1 or source IP address 10.1.1.2) and is useful for intermittent connection failures. It enables cyclic capture on the NetScaler: a new trace file is started every 30 seconds until 24 (nf) files exist, after which the oldest files are overwritten. The advantage is that at most 24 trace files consume space at any given time, which keeps the appliance from running out of storage.
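
When sizing a cyclic capture, it helps to know how much history stays on the appliance. The sketch below assumes each file covers exactly -time seconds, so roughly nf × time seconds are retained before the oldest file is overwritten. The function names trace_window and cyclic_trace_cmd are hypothetical helpers; the second one only prints the command, it does not run it.

```shell
# Seconds of history kept by a cyclic capture: number of files * seconds per file.
trace_window() {
    nf=$1
    secs=$2
    echo $(( nf * secs ))
}

# Print (do not run) a cyclic "start nstrace" command for a VIP/server pair,
# using only the options that appear in the examples above.
cyclic_trace_cmd() {
    vip=$1; server=$2; nf=$3; secs=$4
    printf 'start nstrace -filter "CONNECTION.DSTIP.EQ(%s) || CONNECTION.SRCIP.EQ(%s)" -size 0 -nf %s -time %s\n' \
        "$vip" "$server" "$nf" "$secs"
}

trace_window 24 30                        # prints 720 (12 minutes of history)
cyclic_trace_cmd 10.1.1.1 10.1.1.2 24 30  # prints the cyclic command from the example above
```

If the intermittent failure can take longer than 12 minutes to be noticed, raising -nf or -time widens the window at the cost of more files on disk.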

root@sdx1vm-myafw-prd011# cd /var/log

root@sdx1vm-LANAFW-PRD01# tail -f ns.log | grep '<blocked>'
Mar 7 09:42:33 <local0.info> 10.210.7.102 03/07/2014:09:42:33 sdx1vm-LANAFW-PRD01 PPE-0 : APPFW APPFW_BUFFEROVERFLOW_URL 217049942 : 10.210.52.45 1394411771-PPE0 - BlueWeb URL length(3157) is greater than maximum allowed(3072) : http://blueweb.bluecrossma.com/root/search/news-comm-srch-index.jsp?mainURL=/root/news-communications/newsline/nlxmlarticle.jsp?year=2013|folder=July|doc=blueapples-fep.xml|iphl=federales+employing+programming:federales+employees+programming:federales+employed+programming:federales+employees+programmed:federales+employing+programmed:federales+employees+programing:federales+employe <blocked>

root@sdx1vm-LANAFW-PRD01# pwd
/var/log

root@sdx1vm-LANAFW-PRD01#
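
Watching blocked entries scroll by with tail -f works for a single incident. For a quick overview it can help to count blocked entries per violation type. The following is a rough sketch, assuming every App Firewall log line has the violation name in the field right after the literal APPFW, as in the sample line above; summarize_blocked is a hypothetical helper name.

```shell
# Count blocked App Firewall log entries per violation type, most frequent first.
# Assumes the violation name is the field right after "APPFW", as in the sample line above.
summarize_blocked() {
    grep '<blocked>' | awk '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "APPFW") { count[$(i + 1)]++; break }
            }
        }
        END { for (v in count) print count[v], v }
    ' | sort -rn
}

# Usage on the appliance shell:
#   tail -n 5000 /var/log/ns.log | summarize_blocked
```

Searching for the APPFW marker instead of a fixed column number keeps the helper working if the syslog prefix has a different number of fields.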

1. Citrix NetScaler – Basic Trouble-Shooting

1.1 Discussion

The low level monitoring of NetScaler telemetry is performed for the most part in the CLI environment. However, some additional monitoring is also done through a set of commands invoked in the UNIX Shell environment. The following conventions will be used in this section to depict which CLI is being referred to:

· The > symbol represents a command invoked from the NetScaler CLI environment (connected to from a direct or remote console connection; or an SSH connection).

· The # symbol represents a UNIX shell command, which is invoked after typing the shell command from within the NetScaler CLI.

1.2 Sample Cisco CSM vs. NetScaler CLI Syntax Conversion.


	Cisco Syntax	NetScaler Syntax
1	show running-config	> show ns.conf or show ns.conf | more
2	copy runn start	> save config or save c
3	show ip interface; show ip interface brief	> show nsip
4	show ip interface vlan 103	> show nsip XXX.XXX.XXX.XXX (from the list above)
5	show vlan brief; show vlan 103	> show vlan; > show vlan 103
6	show runn | sec <string>; show runn | begin <string>	> show ns.conf | grep <string>
7	sh mod csm 3 reals	> show service or show service <name>
8	sh mod csm 3 vservers	> show vserver or show vserver <name>
9	sh mod csm 3 conns	# nsapimgr -d allpcbs Note: presently in NS-OS 7.0.x this command does not show connections on a per-source-IP basis. This feature is available in version 8.0.x, which is currently in controlled customer release.
10	sh mod csm 3 probe det	> show service <name>
11	sh mod csm 3 ft	> show node
12	sh mod csm 3 arp	> show arp
13	sh mod csm 3 serverfarms det	> show lb vserver <name>

1.2.1 Failure Condition: Unable to connect to the NetScaler CLI or Browser GUI:

Trouble-shooting steps:

· Check that the SSH tcp/22, HTTP tcp/80, tcp/3010, and tcp/3011 (if GSLB is configured) sockets are allowed through any firewall and/or router ACL (if applicable).

· From the NetScaler UNIX shell, execute nstcpdump.sh port 22, nstcpdump.sh port 3010, and nstcpdump.sh port 3011 (the latter if applicable) as separate commands to see whether the respective protocol packets are seen ingressing the NetScaler device. If the respective packets are not seen, then the problem is external to the device.
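
The per-port captures in the step above can be generated in one go. This is a sketch only: port_check_cmds is a hypothetical helper that prints one command per port, reusing the -c 10 -nn options from the nstcpdump.sh example at the top of these notes so that each capture stops after ten packets. The port list is an assumption taken from the firewall check above; drop 3011 if GSLB is not configured.

```shell
# Print one short nstcpdump.sh capture command per management port.
# PORTS follows the list in the step above; 3011 only matters when GSLB is configured.
PORTS="22 80 3010 3011"

port_check_cmds() {
    for port in $PORTS; do
        echo "nstcpdump.sh -c 10 -nn port $port"
    done
}

# Prints the four commands; copy them into the NetScaler shell one at a time.
port_check_cmds
```

Running the captures one at a time makes it obvious which port sees no traffic. A capture that never prints a packet has to be interrupted with Ctrl+C.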

1.2.2 Failure Condition: Cannot browse to a web page destined to the NetScaler VIP (using either the discrete IP address or the FQDN).

· First and foremost validate the IP address configured for the VServer (and all other NetScaler IP interfaces for that matter) are not duplicate addresses with another device in the given subnet.

· Trouble-shoot Layer 2/3 issues as follows:

o Although the next hop gateway/router IP may be pingable, execute the show arp command on the NetScaler device and the upstream gateway device to make sure the ARP entries are properly learned through the corresponding VLANs. If needed supplement this with show bridge or show bridge | grep [MAC-Address] command.

o Validate ARP entries in the upstream or adjacent gateway device(s) to make sure the NetScaler MAC address for a given IP address matches that of the show interface [1/X] output from the NetScaler.

Note: If the local next-hop router interface (for, say, a default or a static route) is not pingable, but the proper MAC shows in the ARP table, then there is a VLAN mismatch.

o Validate proper link state (or line protocol) negotiation for the physical interfaces in question by executing show interface 1/X. Validate trunking/tagging and VLAN configuration on the NetScaler device by executing show vlan and show channel LA/1 and show interface LA/1. Repeat the same in the upstream switching device.

Note: For proper trunking/tagging integration with NetScaler the trunk protocol must be configured as 802.1Q encapsulation (or dot1q).

o Ping the far-end interface of the same next-hop router. If not pingable are there any Firewall/ACL issues? Is the route-back information to the NetScaler subnet(s) properly known/routed?

o Is there a proper route back to the client subnet from where the browser connection is made?

o Execute the show node command to verify proper HA convergence between two NetScaler units.

· Trouble-shoot Layer 4-7 issues as follows:

o If browsing to the NetScaler VIP using the FQDN as the destination, verify that the NetScaler VServer IP address is properly registered and resolved in DNS, and that all DNS records (such as “A”, “PTR”, CNAME, etc.) are accounted for.

o Check the stats of show service <service-name>. Verify that the configured monitors/probes are passing. If the STATE shows DOWN switch to a simpler monitor such as TCP or ICMP and note any difference.

o If the service state shows DOWN, perform L2/3 trouble-shooting steps above to the Server(s) IP addresses.

o Validate that the server in question is properly running the NetScaler hosted service by executing a low-level “telnet to the port” of each server from the UNIX shell. Example:

o Execute show lb vserver <vserver-name> and check for proper stats. If the services bound to an SSL offloading VServer are UP, but state of the VServer is DOWN check for proper SSL certificate binding. There may be a DOWN[Certkey not bound] message displayed. Supplement this step by examining the state of the installed SSL certificates:

o For the VIP in question hosting two or more servers, take one of the servers out of rotation by executing disable service <service-name>, so that the problem can be narrowed down to a specific server.

o Browse directly to the server IP address if possible. If this works, trouble-shoot the default gateway issues and validate how a given server returns connections back to the originating client.

1.2.3 Failure Condition: Browsing to a NetScaler VServer address is sluggish.

- Ping the server(s) from the remote corporate LAN, if possible, with a packet size greater than the MTU (ping <server-IP> -t -l 2048 [or 4096]), and verify that the round trip is less than 80 msec. If it is greater, the cause is probably a networking issue.

- If ping from the remote corporate LAN is not possible, run ping from the router/gateway local to the server(s). Check that the round trip is less than 2-3 msec.

- Additionally, trouble-shoot the problem by taking a server out of rotation (as described above) to narrow down the problem.

- Execute “telnet to the port” of each server to get the look-and-feel of the response.

- Check for monitor misses in the show service <service-name> output.

- Check the output of stat lb vserver <vserver-name> for the surgeQ; svrTTFB (time-to-first-byte); and Current Server Connections. Supplement this with stat service <service-name> and check the same stats.

- Address server performance by tuning as necessary, based on the above three stats.

The execution and meaning of the CLI commands referred to in this section is addressed in the sections below.
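
The latency rules of thumb from section 1.2.3 can be written down as a tiny check. This is a sketch under stated assumptions: rtt_verdict is a hypothetical helper, the average round-trip time is read off the ping summary by hand, and 3 msec is taken as the upper bound of the "2-3 msec" local figure.

```shell
# Classify an average ping round-trip time (in ms) against the rule-of-thumb
# thresholds from section 1.2.3: 80 ms from a remote LAN, 3 ms from the local gateway.
rtt_verdict() {
    scope=$1      # "remote" or "local"
    avg_ms=$2     # average RTT in milliseconds, decimals allowed
    case $scope in
        remote) limit=80 ;;
        local)  limit=3 ;;
        *) echo "usage: rtt_verdict remote|local <avg_ms>" >&2; return 2 ;;
    esac
    # awk handles the floating-point comparison that plain sh arithmetic cannot.
    awk -v rtt="$avg_ms" -v limit="$limit" 'BEGIN {
        if (rtt + 0 > limit + 0) print "suspect network"; else print "ok"
    }'
}

rtt_verdict remote 95    # prints: suspect network
rtt_verdict local 1.4    # prints: ok
```

An "ok" verdict only rules out basic network latency; the surge queue, TTFB and server connection counters still need to be checked as described above.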

1.3 Layer 2 Verification Commands


	Screen Shot or Command Line	Description
1	> show interfaces; > show interface <number>	Shows various statistics (such as running status, link status, media, and frames in/out) for all or a given physical interface
2	> show arp; > show arp | grep 10.10.10.75	Displays the ARP cache table with dynamically learned MAC entries via corresponding physical ports and VLANs
3	> show bridge; > show bridge | grep 00:03:ba:16:de:65	Displays the bridge port statistics and the corresponding physical ports and VLANs
4	> show vlan	Displays all configured VLANs and their physical interface members

1.4 Layer 3 Trouble-Shooting


	Screen Shot or Command Line	Description
1		Shows all configured IP interfaces/addresses. When a specific IP address is given as an argument, the display provides more detailed properties. For the virtual IP interfaces (such as VIP, SNIP, MIP), the Active note state will be indicated on the Primary unit, while on the Secondary NetScaler it will be in the Passive (effectively disabled) state.
2	> show route	Displays the global routing table. In the Type column: - STATIC refers to an explicit static route entry added with the “add route …” command - PERMANENT refers to a built-in, system-added entry - DIRECT refers to a directly connected local network or subnet
3	> ping 10.10.11.10; > ping -S 10.10.11.10 10.10.10.75	The -S (upper case) option sources the ping from a specific IP address
4	> traceroute 10.68.86.191; > traceroute -s 148.173.239.86 10.68.86.191	The -s (lower case) option sources the trace from a specific source IP address
5	> show node	Displays the HA status of each device in a redundant pair, including the Primary/Secondary status, as well as the configuration synchronization state

1.5 Layer 4 - 7 Trouble-Shooting

> show server

The output of “show server” displays all locally configured Real Servers: their names and their corresponding IP addresses. Although it is possible to administratively disable a server, an operator who wants to take a server out of rotation typically does so at the service level (not at the server level).

> show service
> show service <service-name>
> enable service <service-name>
> disable service <service-name>

The output of “show service” shows all the configured services on the NetScaler. The Service represents the type of an IP application (HTTP, SSL, TCP, etc.) that will be hosted for a particular Real Server. The IP address, port, protocol and state change are also reflected here. Additional information including TCP Buffering settings, compression, client-ip, USIP (use Source IP Address), surge control, and surge protection settings can also be viewed here. The state of a service can be one of the following:

State	Meaning
UP	The server is passing the NetScaler monitor probes
DOWN	The server is failing NetScaler monitor probes
OUT OF SERVICE	Administrator has explicitly disabled the server over-riding the monitor probes.

For example, in the display below an HTTP protocol service and Port 80 is configured for a web server “sppww032”. It is also configured to listen on port 80. In the display below it is currently marked as UP meaning in service, as the basic HTTP monitor probes checking the sppww032 server (by sending a basic “HEAD /” and expecting an HTTP response code “200 OK”) are passing successfully.

Some additional information to note in this display is:

Option	Meaning
Max Conn: 1000	This is the configured maximum number of client-side connections that this server can accept. The default is 0, meaning unlimited.
Use Source IP: NO	The default network proxy function and LB mechanism of TCP offload is enabled.
Client Keepalive(CKA): YES	This is a default client side optimization feature based on the Apache “connection-keepalive” header directive adopted for NetScaler use.
Idle timeout: Client: 180 sec Server: 360 sec	Default setting for Client and Server side connection inactivity timer.
Down state flush: ENABLED	This is a default setting, meaning if a server fails the monitor, the NS will mark the service DOWN, as well as will TCP-reset all pending connections.

Note: Taking a server out of rotation from the NetScaler perspective is done here at the service level by running the disable service command above, thus causing the service state to read State: OUT OF SERVICE.

> stat service -full
> stat service <service-name> -full

The stat service output provides a list of all configured services, including the built-in non-application-facing services. The -full option displays complete names rather than abbreviations. When used with the service name as an argument, the output includes greater performance detail.

In the above display the explanation of values is as follows:

Field Name	Explanation
Requests	The rate at which requests are sent to the NetScaler and the total number of requests
Responses	The rate at which responses are sent from the NetScaler and the total number of responses
Request Bytes	The number of bytes (payload) for requests
Response Bytes	The number of bytes (payload) for responses
Current Client Connections	This is not relevant because the number of client connections is measured at the VIP (Vserver Level). Logically, all connections are sourced from the Client IP to the Vserver. Hence the client connections should always be measured there.
Current Server Connections	These are the number of connections that are opened from the NetScaler to the backend server.
Maximum Server Connections	This is relevant if there is a hard-stop configured for max-server connections. Any new connection above this value is sent to the surge Queue.
Requests in Surge Queue	Any connections that are queued in the NetScaler due to a max-client or max-conn configuration on the NetScaler are sent to the Surge Queue. As connections are freed up on the server, the queued connection is dequeued and sent to the backend server.
Connections in reuse Pool	If HTTP protocol is used, the NetScaler can reuse existing connections for efficient server connection management. This field shows you the number of connections in the reuse pool. The goal of re-use pool is to limit server side connections to free up resources for additional connections.
Average Server TTFB	TTFB refers to Time To First Byte. This provides the average response time for each server configured on the NetScaler. NetScaler measures the response times using various methods including ICMP echo response, TCP SYN ACK etc.

> show vserver
> show vserver <vserver-name>

This command provides a list of all the configured virtual servers on the NetScaler. It provides a general overview of the name of the Vserver, IP Address, Protocol, Port, Load Balancing Method and persistence configured.

Virtual Server State	Meaning
UP	If the state of at least one of the bound services is UP then the state of the virtual server is UP.
DOWN	If the state of all the bound services is DOWN then the virtual server fails and is marked DOWN.
DOWN[Certkey not bound]	Failure of an HTTPS offloading VServer due to lack of a proper SSL certificate binding
OUT OF SERVICE	The state of the VServer changes to “OUT OF SERVICE” upon its manual administrative disablement.
OUT OF SERVICE(IP DISABLED)	Optionally, a VServer can be admin-disabled at the virtual IP interface level (the interface is created automatically when a given VServer is configured). This is done by executing disable nsip X.X.X.X

The show vserver <name> command provides more detailed information about the VServer, including the services that are “bound” to the VServer and their operational state.

> stat lb vserver
> stat lb vserver <vserver-name>

> stat lb vserver <vserver-name> -full

This command provides general statistics on all configured VServers. Specific details can be obtained by adding a VServer name as an argument. The details include total client and server connections, connection rate, connections in excess of MAX-Connections that are held in the surge queue, as well as hits on the configured services. The stat lb vserver <name> -full command displays even more detail, further expanding the bound services information.

1.6 Additional Trouble-Shooting Tools

# nsapimgr -d allpcbs

Displaying the system connection flows (or the session binding) is limited in the presently deployed NS-OS 7.0, in that it does not provide specific Client-Server connection detail. Generally speaking it provides a list of all socket connections including non-application flows. However, this functionality is fully supported in the next release 8.0 (presently in controlled deployment).

Another very useful feature is that the NetScaler product provides built-in packet capture tools.

· NSTRACE.SH is the executable that provides full packet decode analysis in non-real time. This tool is executed from the UNIX shell environment. A given session will run for one hour (3600 seconds) unless interrupted by pressing the CTRL-C keyboard sequence, which saves the capture as a TCPDUMP-format file: /var/nstrace/nstrace[n].pcap. The resulting file is then copied from the system to a local computer (for example with SCP) and can be examined with protocol analyzer software such as Wireshark (formerly known as Ethereal).

# nstrace.sh -sz 0 -tcpdump 1

· The NSTCPDUMP.SH is a real-time full featured Linux compatible packet trace utility ported for the NetScaler use.

# nstcpdump.sh [options]

DK - Engineer's Notebook Netscaler

Friday, February 12, 2016

troubleshooting