One Man IT Shop: July 2009

Friday, July 31, 2009

Ftype

I found this command useful and think it deserves some PR.
I know, you can have a full, rich IT life without using it or even knowing about it, BUT when the day comes it can save you hours of work.
Ftype is a command-line tool that displays or modifies the file types used in file name extension associations.
Run with no arguments, it displays the file types that have open command strings defined.

C:\>ftype

The output is long and descriptive.
Ftype is very useful in scripts and in installations that involve unique file extensions, but also when you are troubleshooting a PC. Once you step outside the mainstream Microsoft applications, many programs register file types that are not part of a default Windows installation. Locating a missing file association is much easier with this command, so it is good to be familiar with it.
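For scripting, the output is easy to work with because each line has the form name=command. Here is a minimal Python sketch of that idea. The function names and the two sample lines are my own illustration, not real output from any particular machine, and read_ftype only works on Windows, where ftype is a cmd.exe built-in.

```python
import subprocess


def parse_ftype_output(text):
    """Turn ftype output lines of the form name=command into a dict."""
    file_types = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not name=command
        # Split on the first '=' only, since the command itself may contain '='.
        name, command = line.split("=", 1)
        file_types[name] = command
    return file_types


def read_ftype():
    """Run ftype on a Windows machine (cmd.exe built-in, hence 'cmd /c')."""
    result = subprocess.run(
        ["cmd", "/c", "ftype"], capture_output=True, text=True, check=True
    )
    return parse_ftype_output(result.stdout)


# Hypothetical sample lines, only to show the shape of the data.
SAMPLE = "\n".join([
    r'batfile="%1" %*',
    r"txtfile=%SystemRoot%\system32\NOTEPAD.EXE %1",
])
sample_types = parse_ftype_output(SAMPLE)
```

With the dict in hand, a script can check whether the file type it depends on is defined before it tries to open anything.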

Here you can find a great example of a good use of this command.

System Administrator Appreciation Day

Happy SysAdminDay to all of you out there!
Get some good fun here.

Thursday, July 30, 2009

Guide To Computer Hardware

While working on my hardware I find myself Googling for the different types of slots, ports and adapters.
I ran into this link, which I found very useful, and I'm sure you can use it too.
There is also a full-size link.

Wednesday, July 29, 2009

The RAID language

Did you ever wonder about all the options and sub-options you have to choose from when setting up a RAID array?
I had to build a new server this morning and wanted to really understand all these terms, once and for all.
I’m working with HP servers but the RAID language is (mostly) universal:

Let's start with the basics: what is RAID?
Redundant Array of Inexpensive Disks. Listed are the most commonly used configurations:

RAID 0 - No Fault Tolerance
RAID 1+0 - Drive Mirroring
RAID 4 - Data Guarding
RAID 5 - Distributed Data Guarding
RAID 6 (ADG) - Advanced Data Guarding
RAID 0 - No Fault Tolerance; indicates that there is no fault tolerance method used. However, the data is striped across all physical drives in the array for rapid access.
If you select this option for any of your logical drives, you will experience data loss for that logical drive if one physical drive fails. However, because none of the capacity of the logical drives is used for redundant data, this method offers the best processing speed and capacity. You may consider assigning RAID 0 to drives that require large capacity and high speed, but pose no safety risk.
RAID 1+0 - Drive Mirroring is a fault tolerance method that uses 50 percent of drive storage capacity to provide greater data reliability by storing a duplicate of all user data. Half the physical drives in the array are duplicated or "mirrored" by the other half.
RAID 1+0 first mirrors each drive in the array to another, then stripes the data across the mirrored pairs.
Drive mirroring creates fault tolerance by storing two sets of duplicate data on a pair of disk drives. There must be an even number of drives for RAID 1+0. This is the most costly fault tolerance method.
If a drive fails, the mirror drive provides a backup copy of the files and normal system operations are not interrupted. The mirroring feature requires a minimum of two drives, and in a multiple drive configuration (four or more drives), mirroring can withstand multiple simultaneous drive failures as long as the failed drives are not mirrored to each other.
RAID 4 - Data Guarding is a fault tolerance method that uses a small percentage of a drive array storage capacity to store data guard code that is used to recover data if a physical drive fails.
RAID 5 - Distributed Data Guarding is a fault tolerance method that stores parity data across all the physical drives in the array, which allows more simultaneous read operations and higher performance than RAID 4 - Data Guarding. If a drive fails, the controller uses the parity data and the data on the remaining drives to reconstruct data from the failed drive. This allows the system to continue operating with a slightly reduced performance until you replace the failed drive.
RAID 5 requires an array with a minimum of 3 physical drives. The capacity of the logical drive used for fault tolerance depends on the number of physical drives in the array. For example, in an array containing 3 physical drives, only 33 percent of the total logical drive storage capacity is used for parity data; while a 14-drive configuration uses only 7 percent.
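To make the "reconstruct data from the failed drive" part concrete, here is a toy Python sketch. It assumes the parity is a bytewise XOR of the data blocks, which is the conventional RAID 5 scheme but is not spelled out in the text above. A real controller works on whole stripes and rotates the parity across drives; the block contents and names here are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(
        reduce(lambda a, b: a ^ b, column) for column in zip(*blocks)
    )


# Two data blocks on two drives, parity stored on a third drive.
data_blocks = [b"AAAA", b"BBBB"]
parity = xor_blocks(data_blocks)

# Pretend drive 0 failed: rebuild its block from the survivor plus parity.
rebuilt = xor_blocks([data_blocks[1], parity])
```

The rebuilt block equals the lost one because XOR-ing a value in twice cancels it out, which is also why every surviving drive has to be read during a rebuild.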
RAID 6 (ADG) - Advanced Data Guarding. This fault tolerance method provides the highest level of data protection. It is similar to RAID 5 in that parity data is distributed across all drives in the array, except that multiple separate sets of parity data are used in RAID 6 (ADG), and the capacity of multiple drives is used to store the parity data. Assuming the capacity of two drives is used for parity data, the system will continue to operate even if two drives fail simultaneously, whereas RAID 4 and RAID 5 can only sustain failure of a single drive. The fault tolerance of RAID 6 (ADG) configurations is actually higher than that of RAID 1+0 configurations because in RAID 1+0 there is a chance that two drives mirrored to each other will fail simultaneously.
RAID 6 (ADG) read performance is similar to that of RAID 5, since all drives can service read operations. However, the write performance is lower with RAID 6 (ADG) than with RAID 5, because parity data must be updated on multiple drives. Performance is further reduced in a degraded state.
RAID 6 (ADG) requires an array with a minimum of 2+P physical drives, where P is the number of drives used for parity data; normally, P= 2. The percentage of total drive capacity used for fault tolerance is equal to the number of drives used for parity data divided by the total number of physical drives. For example, in an array containing a total of five physical drives (two of which are used for parity), 40 percent of the total logical drive storage capacity is used for fault tolerance. A 14-drive configuration (again using two drives for parity) uses only 14 percent of total capacity for fault tolerance.
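The percentages quoted for RAID 5 and RAID 6 (ADG) all come from the same division: drives' worth of parity over total drives. A small Python sketch to check them (the function name is mine):

```python
def parity_overhead_percent(total_drives, parity_drives):
    """Share of raw capacity spent on parity, as a percentage."""
    if not 0 < parity_drives < total_drives:
        raise ValueError("need at least one data drive and one parity drive")
    return 100.0 * parity_drives / total_drives


# The four examples from the text: RAID 5 uses one drive's worth of
# parity, RAID 6 (ADG) is assumed to use two.
examples = {
    "RAID 5, 3 drives": parity_overhead_percent(3, 1),
    "RAID 5, 14 drives": parity_overhead_percent(14, 1),
    "RAID 6, 5 drives": parity_overhead_percent(5, 2),
    "RAID 6, 14 drives": parity_overhead_percent(14, 2),
}
```

Rounded to whole numbers this gives 33, 7, 40 and 14 percent, matching the figures above.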

Fault Tolerance
The ability of a server to recover from hardware problems without interrupting the server's performance. Fault tolerance methods, among others, include:

RAID 1+0 - Drive Mirroring
RAID 4 - Data Guarding
RAID 5 - Distributed Data Guarding
RAID 6 (ADG) - Advanced Data Guarding

Logical Drive Extension
This allows you to increase the size of an existing logical drive without disturbing the data on the logical drive. If an existing logical drive is full of data, you can extend the logical drive when there is free space on the array. If there is no free space on the array, you can add drives to the array and proceed to extend the logical drive.

Logical Drives
An equal area from all physical drives in a drive array grouped together logically to act as a single hard drive. Logical drives are configured with software utilities to enhance the performance and usability of drive arrays.

Online Spare/Active Spare
Online Spare - A physical drive used in RAID 1+0 - Drive Mirroring, RAID 4 - Data Guarding, RAID 5 - Distributed Data Guarding, and RAID 6 (ADG) - Advanced Data Guarding to provide drive replacement for a failed drive without user intervention. The spare immediately replaces a failed drive as soon as the failure occurs. The controller automatically begins rebuilding the data from the failed drive on the spare to return to a fault tolerant state. The failed drive can be replaced while the system is operating at top performance. The drawback is that the drive is not used while inactive and this reduces the amount of usable storage capacity.
A spare may become an Active Spare if the current Active Spare fails and there are multiple spares available. An Active Spare is a spare that is currently in use by an Array which contains a failed Physical Drive.

Arrays
A group of physical drives configured into one or more logical drives. Arrayed drives have significant performance and data protection advantages over non-arrayed drives.

Array Accelerator
An internal part of the Array Controller that dramatically improves performance of disk read and write operations by providing a buffer. A battery backup and ECC memory protect the data.

Capacity Expansion
A feature that allows an increase in storage capacity of a drive array with the addition of one or more physical drives to the array. With the added space on the array, one or more new logical drives can be created. This feature is available only on Array Controllers that support expansion.

Controller Duplexing/Disk Duplexing
Disk duplexing is a variation of disk mirroring in which each of multiple storage disks has its own SCSI controller. Disk mirroring (also known as RAID-1) is the practice of duplicating data in separate volumes on two hard disks to make storage more fault-tolerant. Mirroring provides data protection in the case of disk failure, because data is constantly updated to both disks. However, since the separate disks rely upon a common controller, access to both copies of data is threatened if the controller fails. Disk duplexing overcomes this problem; the use of redundant controllers enables continued data access as long as one of the controllers continues to function.

Expand Priority
After choosing to expand an array, the level of priority that expanding array capacity should have over handling current operating system requests.

Maximum Boot Size
Max Boot or Maximum Boot Size determines the number of sectors used for the logical drive. When Max Boot is disabled, the logical drive is created with 32 sectors per track. In this configuration, the largest boot drive which can be created is 4 GB. With Max Boot enabled, the controller creates the logical drive with 63 sectors per track which will allow you to create a boot drive which is up to 8 GB in size. We suggest only enabling Max Boot on the drive from which you will boot your server, as a slight performance gain is seen using 32 sectors per track.
The maximum boot size option is initially disabled. Disabling maximum boot size means that the logical drive will report the default of 32 sectors per track to BIOS calls (int13h). Enabling the maximum boot size increases the number of sectors reported in BIOS calls to the maximum of 63 in order to increase the number of blocks available. Enabling maximum boot size may be necessary to create large boot partitions for some operating systems. For example, enabling maximum boot size on a logical drive in Windows NT 4.0 allows you to create a bootable partition with a maximum size of 8 GB, instead of the 4 GB maximum size allowed when maximum boot size is disabled. When a logical drive larger than 255 GB is created, 63 sectors per track will be reported to BIOS calls regardless of whether or not the maximum boot size was enabled.
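The 4 GB and 8 GB figures fall out of the cylinder/head/sector arithmetic behind int13h. The sketch below assumes the usual limits of 1024 cylinders, 255 heads and 512-byte sectors; those three numbers are my assumption and are not stated in the text above.

```python
BYTES_PER_SECTOR = 512   # assumption: classic 512-byte sectors
MAX_CYLINDERS = 1024     # assumption: int13h cylinder limit
MAX_HEADS = 255          # assumption: commonly used head limit


def chs_limit_bytes(sectors_per_track,
                    cylinders=MAX_CYLINDERS,
                    heads=MAX_HEADS):
    """Largest size addressable through CHS with the given geometry."""
    return cylinders * heads * sectors_per_track * BYTES_PER_SECTOR


limit_max_boot_off = chs_limit_bytes(32)  # 4,278,190,080 bytes
limit_max_boot_on = chs_limit_bytes(63)   # 8,422,686,720 bytes
```

That works out to roughly 4 GB with 32 sectors per track and a little over 8 GB (decimal) with 63, which lines up with the boot partition limits quoted above.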

Migration
A feature that allows you to change the fault tolerance level or stripe size of a configured logical drive without incurring any data loss.

Online Recovery Server
An Array Controller that has been set by the System Configuration Utility to Online Recovery Server mode is a controller that has the ability to dynamically move storage devices from a failed server to an active server. In effect, the storage devices are hot plugged from one system and hot plugged into the new one.

RAID Overhead
A pre-defined space set aside for RAID redundant information on a logical drive.

Rebuild Priority
After a failed drive has been replaced, the level of priority that rebuilding the data from the failed drive should have over handling current requests from the operating system.

Redundant Controllers
A pair of controllers that have been installed into a system and share a single storage system. The controllers are interconnected either via an Inter-Controller Link (ICL) for 64-Bit or Extended PCI Controllers or internally for Fibre Channel Controllers.
The primary controller of the pair handles all communications and control of the storage system and its attached drives. If the primary controller is no longer able to issue read or write commands to these drives, the secondary controller assumes control.

SCSI
SCSI stands for Small Computer Systems Interface.

SCSI ID
A unique ID assigned to each SCSI device connected to the same SCSI channel. The ID number uniquely defines each peripheral device address and determines the device priority on the bus. ID 7 (SCSI controller) is the highest priority; ID 0 is the lowest.

Selective Storage Presentation
Selective Storage Presentation allows logical drives on an array controller to be shared by multiple servers. A server connects to the array controller by using a host controller that is installed in the server. Selective Storage Presentation allows users to name connections from host controllers to array controllers, and it allows users to grant or deny access to connections for each logical drive. Selective Storage Presentation is currently only supported for fibre channel controllers.

Stripe Size
A stripe is a collection of contiguous data that is distributed evenly across all physical drives in a logical drive. The size of the stripe is selected to optimize the performance of the operating system. Stripe size is synonymous with distribution factor.
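One way to picture striping is to map a logical block number to a drive and a position on that drive. This is a simplified RAID 0 style sketch with no parity. The per-drive piece size (chunk_blocks) is a hypothetical parameter of mine, and real controllers may lay data out differently.

```python
def locate_block(logical_block, chunk_blocks, drive_count):
    """Return (drive index, block offset on that drive) for a logical block.

    chunk_blocks is the number of consecutive blocks written to one
    drive before moving on to the next drive.
    """
    chunk_index, offset_in_chunk = divmod(logical_block, chunk_blocks)
    row, drive = divmod(chunk_index, drive_count)
    return drive, row * chunk_blocks + offset_in_chunk


# Three drives, four blocks per chunk.
first = locate_block(0, 4, 3)    # first block lands on drive 0
second = locate_block(4, 4, 3)   # next chunk moves to drive 1
wrapped = locate_block(13, 4, 3) # wraps around to drive 0, second row
```

Small chunks spread a single request over many drives, while large chunks let each drive serve a different request, which is why the right size depends on the workload.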

Surface Scan Analysis
Surface Scan Analysis is a background process that scans hard drives for bad sectors in fault tolerant logical drives. In RAID 5 or RAID 6 (ADG) configurations, Surface Scan also verifies the consistency of parity data. This process assures that you can recover all data successfully if a drive failure occurs in the future.

Useable Capacity
Space on the array available to the user for logical drives.

Monday, July 27, 2009

Quality of life by Threshold and Remote Control

I was at a party on Saturday night, and while talking to a friend who is also in IT I realized that thresholds are a very important part of our lives. This friend is part of a four-person weekly on-call rotation and he had just finished his round. He is exhausted! His company uses very low thresholds for alerts on every subject, which makes his life miserable and got me to appreciate the fact that I am the one in charge of this decision on my network…

One of the most important aspects of a OneManITShop's life is his (or her, don’t start with me on this) ability to have some personal life out of the office (some will just settle for being out of the office…).
You have already finished for the day and are ready to leave, but WHAT IF?
This little IF can make the difference between spending every minute scared and close to your subway pass, and having a relaxed, comfortable evening or weekend anywhere.

One side of this issue is the Threshold:
How to determine the correct mark for alerts?
What can wait for tomorrow and what must be taken care of right now?

Rule of thumb
“If it doesn’t affect production it can wait”
and
“If users do not notice it they can’t complain about it”

Whenever I install a new service I pause at the notification configuration stage and try to determine the implications for productivity and recovery capabilities.
I try to simulate the different scenarios, and I also Google for similar issues to see how they affected other companies.
If it is a problem that will cause damage (like hardware getting too hot), it is a right-here-right-now situation and alerts should go crazy on me.
When it affects users, even if the recovery can easily be applied next morning, I’ll set the alerts to notify me and make sure I connect remotely and fix the problem. If there is one thing I hate, it is walking into the office and having users hunt me down. I would rather get the alert and figure out the solution the night before.
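Written down as logic, the rules above boil down to two questions asked in order. This is only a sketch of my own decision process in Python; the names are made up and it is not tied to any monitoring product.

```python
from enum import Enum


class Action(Enum):
    PAGE_NOW = "alert loudly, handle right now"
    NOTIFY_AND_FIX_REMOTELY = "notify me, fix remotely before morning"
    WAIT = "can wait for tomorrow"


def triage(causes_damage, affects_users):
    """Decide how urgent an alert is, checking the worst case first."""
    if causes_damage:      # e.g. hardware getting too hot
        return Action.PAGE_NOW
    if affects_users:      # users would notice it in the morning
        return Action.NOTIFY_AND_FIX_REMOTELY
    return Action.WAIT     # does not affect production


overheating = triage(causes_damage=True, affects_users=True)
stuck_service = triage(causes_damage=False, affects_users=True)
cosmetic = triage(causes_damage=False, affects_users=False)
```

The order matters: something that can cause damage is urgent whether or not users have noticed it yet.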

The other side of this is my ability to receive the alerts and connect to the network from any location.
I have my home PC and a laptop; both connect to the network via the Cisco VPN Client and can reach anything in the network. I use 3 tools for remote connection, as I have found over the years that relying on just one always ends badly.
First and most commonly used is Symantec’s PCAnywhere. I’m not crazy about it, but when it’s working it is fast to connect and does the job. Do not rely solely on PCAnywhere – it has a high tendency to fail; sessions die and hosts break very often.
As a backup remote control tool I always keep a copy of DameWare. It has amazing remote control capabilities, and the huge plus is the remote installation, so you do not have to pre-configure anything on any server or PC. Just push it and connect. It can also pull details like running services & processes, events and hardware data. It makes your life easier.

These are great solutions and with RDP they cover almost every option and allow full control but one case is missing. What if you’re at the movies or walking in Manhattan and do not have your laptop?

My iPhone 3G has the answer for this situation. I connect to the network via the built-in Cisco VPN client and can access anything as if I’m on a laptop. An RDP app can do exactly the same job as any RDP session from a laptop. Amazing!
Another iPhone app saved me when I had to make a change to my Cisco ASA: using SSH I could do it all from the phone and save myself the trip to the office.

Friday, July 24, 2009

Warriors of the Net

Back in the old days, when I made my first steps in this field, someone showed me this amazingly simple yet descriptive film. I just found out they recently celebrated 10 years, so why don't we all enjoy it?!
Part I

Part II

IM – do we have a choice?

IM is part of our life, part of our business.
Like any other technology, it can make us more efficient or less productive. It’s all up to how we use it and what we’re using it for.
My company has branches around the globe. For us IM is a money saver and a bridge over time zones and languages. Many people find it easier to communicate via IM, where your accent is not an obstacle and neither is that of the co-worker whose accent you can’t understand. And who the native English speaker is, is not the issue here.
The other side of the same story is all the friends and relatives who contact you once you’re logged in. Yes, it is nice to chat with mom while working, and your wife just wants to ask something, but not only does it take time, it also takes your focus off real work. There is one more aspect, even worse: co-workers using IM to chat so their cube neighbors won’t hear them on the phone all day. This way 2 employees are not working…

While those issues are important, they are not for us to worry about. These are management policies, and the way of life is that most organizations, especially the small to medium ones where we OneManITShop guys work, will keep IM open for all users.

So what is our take on this?

We control the installation, which lets us have a say about the type of programs we allow.
One example would be AIM users. There are a few versions from AOL. AIM streams ads and has the potential to cause problems. AIMPro is much better and more stable. Trillian (http://www.trillian-messenger.net/) gives some extra features. And there are more options.
Microsoft’s MSN Messenger has the same issues: old versions do not stream ads but also block many features.
Another aspect is regulation. Many companies must monitor and archive all chats, and that requires a 3rd-party program that supports the IM clients we use. One more headache… More on this aspect: make sure you block web access to IM clients if your 3rd-party tool doesn’t capture it.

So what should we use?
My policy, especially when users run more than one IM account, is to use multi-vendor clients. Personally I use Pidgin and it’s awesome. We also use POD, a MessageLabs tool that also captures and archives (for some $$$).
You get all major IM accounts in one place, in one window with tabs, and most important – no ads. Surprisingly, it has a smaller footprint on the desktop, which adds to its clear advantage.
I always keep the latest updates on my file server but make sure my firewall blocks the updates that each client initiates – they consume bandwidth and let users update on their own, which usually makes me work harder.
One more advantage comes when the compliance officer checks his monitoring systems: he sees one program per user, not 2 or 3, which makes his work easier and faster, and when his work is more efficient it’s an extra benefit for the company.

Thursday, July 23, 2009

Paging vs RAM

My network gets hammered with anywhere between 50,000 and 125,000 SMTP connections per hour (on a 50-mailbox network). Though more than 99% of all connections are dropped early using email reputation services, my server works hard around the clock and from time to time requires some personal attention.
To help the poor guy (I’m really attached to this server as we spend many hours together), after monitoring the performance over a few weeks I decided to boost its memory and help it reduce the processing time during peaks.
The DL320 G5 server already had 2GB of RAM and the page file set to the maximum of 4095MB. Should I add RAM or work around the limit?

In order to reach a decision we have to start with the basic question: What is a page file?
In simple words (you can also read here), paging is a way for the OS to store and retrieve data on disk in addition to main memory.
32-bit Windows 2003 servers have a 4095MB page file limit. Let me explain:
In a 32-bit computer, the memory addresses are 32 bits long and stored as binary numbers. There are approximately 4 billion possible different 32-bit binary numbers (2^32), which represent 4GB.
While RAM works faster and has a low cost (4GB for a DL320 sells for around $60), it has a solid limit of 4GB, while the paging limit can be tweaked.
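The "approximately 4 billion" above is easy to verify. A few lines of Python, just as a worked check of the arithmetic (the variable names are mine):

```python
ADDRESS_BITS = 32

# Every distinct 32-bit address points at one byte.
addressable_bytes = 2 ** ADDRESS_BITS           # 4,294,967,296
addressable_mb = addressable_bytes // 2 ** 20   # 4096 MB
addressable_gb = addressable_bytes // 2 ** 30   # 4 GB

# The page file limit mentioned above sits just under that ceiling.
PAGE_FILE_LIMIT_MB = 4095
headroom_mb = addressable_mb - PAGE_FILE_LIMIT_MB
```

So the 4095MB figure is the 4GB ceiling of 32-bit addressing, minus a single megabyte.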

Most applications will eat as much memory as you let them, where 4GB of address space is again the limit per process. This server only runs one application, therefore I decided to go with 2 steps, let it work for a couple of weeks, then monitor again and check the performance change:
First and easy, I got extra memory and upgraded my RAM to 4GB.
As expected I saw an immediate boost in performance, but let’s not jump to conclusions and wait instead. I want to see how the application behaves over time.
The second step was defragging the page file. No, not the standard disk defrag you might run once in a while – that would not affect the page file. I used PageDefrag – a Sysinternals (now part of Microsoft) tool that does just this one task. If you don’t know it by now you should get familiar with it since it is a great tool.

The reason I try to avoid changing the page file size limit is simple: I like to keep things as simple and original as possible. While adding RAM keeps within the intended specs for the server, changing the size limit would be a step toward more problems. You can never know how a regular out-of-the-box program that was built for a 4GB limit would behave with additional memory. Most programs will have no problems, but once in a while you’ll start seeing weird behavior, and we all know that a familiar problem, even if it's repeating and annoying, is better than any new problem...

Wednesday, July 22, 2009

What is it all about?

I've been working as a system administrator for 8 years and have covered many topics.
My day-to-day life is very interesting but sometimes lonely; I do not have a teammate to share my problems, frustrations and victories with.
Our Wall St. company managed to survive the financial crisis pretty well, but at the same time the money we spend on technology (or anything else) is monitored more closely than ever, and as the IT Manager (yep, that is the official title) I'm expected to find creative solutions that save money. You might find them interesting and even useful.
I'll use this platform to share my thoughts, problems and solutions, and the cool tools and gadgets that I use (and my wife doesn't seem to care about), and to help other people with their lives.
I hope this will become a place for all the lonely admins out there, and obviously all team members are more than welcome to read, comment and share their lives.
