December 22, 2013

Exploring Strings

Now that we have a basic idea about string objects and arrays, we can safely explore string methods. One of the core C# programming skills is being able to manipulate strings. To do that, you have to know the different built-in .NET methods and how to use them.

In this post, we shall also look at some of the easily overlooked bugs that frequently creep into string-handling code, and at how the same code can behave differently when certain conditions change.

Before we start, open this link in a different tab.

Compare two strings: To compare two strings, you can use the Compare method. On the given link, you will find that there are around 8-10 overloaded Compare methods. If you are not familiar with method overloading, go back and see the post on classes and objects. The Compare method returns an integer value.
If the value is zero, the two strings are equal. If the value is less than zero, the first string sorts before the second. If the value is greater than zero, the first string sorts after the second.

However, we are more interested in which overloaded method to use and why there are so many of them. At first glance it seems like all of them behave the same way, but you can pick the one that fits your needs. Below is a demonstration of some of the overloaded methods and their purposes.

using System;

namespace Spells
{
    class Program
    {
        static void Main(string[] args)
        {
            string str1 = "thoughts";
            string str2 = "thoughts";
            if (string.Compare(str1, str2) == 0)  // 1
                Console.WriteLine("1. Equal");
            else
                Console.WriteLine("1. Not equal");
            str2 = "tHoughTs";                      // change the cases
            if (string.Compare(str1, str2) == 0)    // 2
                Console.WriteLine("2. Equal");
            else
                Console.WriteLine("2. Not equal");
            if (string.Compare(str1, str2, true) == 0)  // 3
                Console.WriteLine("3. Equal");
            else
                Console.WriteLine("3. Not equal");
            str1 = "æ";
            str2 = "ae";
            if (string.Compare(str1, str2, StringComparison.InvariantCulture) == 0)        // 4
                Console.WriteLine("4. Equal");
            else
                Console.WriteLine("4. Not equal");
            if (string.Compare(str1, str2, StringComparison.Ordinal) == 0)        // 5
                Console.WriteLine("5. Equal");
            else
                Console.WriteLine("5. Not equal");
            Console.Read();
        }
    }
}
Notice that lines 11 and 16 of the snippet (comparisons 1 and 2) call exactly the same method. I have only changed the case of one of the strings in between, so, as you would expect, the outputs are different when you run the code. Line 20 (comparison 3), however, uses a different overload. Its third parameter is a boolean specifying whether the comparison should ignore case. The default (without the third parameter) is a case-sensitive comparison. However, the best practice is to specify it in the code. That way, when others read your code, it's clear to them what you are actually trying to do!

On lines 24 and 25 I have changed str1 and str2 to something else: "æ", a phonetic symbol, and "ae". I wanted to test what the StringComparison enumeration does as a third parameter. Lines 26 and 30 (comparisons 4 and 5) may surprise you: in the output, the 4th comparison shows "Equal" while the 5th shows "Not equal". This is what I saw on my machine; culture-sensitive results can vary with the .NET runtime and platform. Ordinal simply makes a direct comparison of the Unicode character values. If your goal is to check whether there is any difference at all between the two strings, go with Ordinal. On the other hand, InvariantCulture knows about the special linguistic rules for phonetics and languages.
Recommended: Use this method when you are comparing strings for sorting purposes. Comparisons 3 and 5 (lines 20 and 30 of the snippet respectively) are recommended, while 1 and 2 are highly discouraged.
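To make the sign of the return value concrete, here is a minimal sketch. The class CompareSketch and the helper Describe are hypothetical names made up for this illustration. It uses only the ordinal modes, so the result should not depend on the machine's culture.

```csharp
using System;

namespace Spells
{
    public static class CompareSketch
    {
        // Hypothetical helper: turns the sign of string.Compare into a word.
        public static string Describe(string a, string b, StringComparison mode)
        {
            int result = string.Compare(a, b, mode);
            if (result < 0) return "before";
            if (result > 0) return "after";
            return "equal";
        }

        public static void Main(string[] args)
        {
            // Ordinal compares character codes: 'a' (97) is less than 'b' (98).
            Console.WriteLine(Describe("apple", "banana", StringComparison.Ordinal));             // before
            // Upper-case 'B' (66) has a smaller code than lower-case 'a' (97).
            Console.WriteLine(Describe("Banana", "apple", StringComparison.Ordinal));             // before
            // Ignoring case, "Banana" sorts after "apple".
            Console.WriteLine(Describe("Banana", "apple", StringComparison.OrdinalIgnoreCase));   // after
            Console.WriteLine(Describe("thoughts", "tHoughTs", StringComparison.OrdinalIgnoreCase)); // equal
        }
    }
}
```

The second and third calls show why it pays to state the comparison mode explicitly: the same pair of strings orders differently depending on it.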

Equality: Comparing two strings and checking them for equality are two different things and should not be confused. When you compare strings, you want to find out which of the two comes first in an ordered sequence. In an equality check, you only want to know whether the two string objects are equal or not.
The following code snippet is a simple use of the Equals method. Though both forms produce the same output, the second one (the static string.Equals) is nice, neat and helpful.

string str1 = "thoughts";
string str2 = "thoughts";
if (str1.Equals(str2))
    Console.WriteLine("Equal");
else
    Console.WriteLine("Not equal");
if(string.Equals(str1,str2))
    Console.WriteLine("Equal");
else
    Console.WriteLine("Not equal");
We shall see why the second form is helpful in a couple of examples.

using System;

namespace Spells
{
    class Program
    {
        static void Main(string[] args)
        {
            string str1 = "thoughts";
            string str2 = "thoughts";
            if(string.Equals(str1,str2))      // 1
                Console.WriteLine("1. Equal");
            else
                Console.WriteLine("1. Not equal");
            str2 = "ThouGhts";
            if (string.Equals(str1, str2, StringComparison.OrdinalIgnoreCase))      // 2
                Console.WriteLine("2. Equal");
            else
                Console.WriteLine("2. Not equal");
            if (string.Equals(str1, str2, StringComparison.Ordinal))      // 3
                Console.WriteLine("3. Equal");
            else
                Console.WriteLine("3. Not equal");
            Console.Read();
        }
    }
}
Recommended: Lines 16 and 20 of the snippet (comparisons 2 and 3) are recommended. They make your code distinct and precise, and they tell whoever reads your code, unambiguously, whether you are ignoring case or not.
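One more reason the static form is handy, sketched below: it copes with null references, while the instance form throws when the string you call it on is null. The class EqualsSketch and the helper SafeEquals are hypothetical names used only for this illustration.

```csharp
using System;

namespace Spells
{
    public static class EqualsSketch
    {
        // Hypothetical helper: a null-tolerant, case-sensitive equality check.
        public static bool SafeEquals(string a, string b)
        {
            return string.Equals(a, b, StringComparison.Ordinal);
        }

        public static void Main(string[] args)
        {
            string str1 = null;
            string str2 = "thoughts";

            Console.WriteLine(SafeEquals(str1, str2));   // False
            Console.WriteLine(SafeEquals(null, null));   // True

            try
            {
                // The instance form dereferences str1, which is null here.
                Console.WriteLine(str1.Equals(str2));
            }
            catch (NullReferenceException)
            {
                Console.WriteLine("Instance Equals threw because str1 is null");
            }
        }
    }
}
```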

The Culture Issue: At this point you might be wondering what "culture" means when you look at the StringComparison enumeration. The culture essentially means the regional language. Go to Control Panel -> Clock, Language and Region -> Region -> Location tab. The default is United States, unless you have changed it yourself. This will be the current culture on your PC. When you write a program, the default string operations are performed using the current culture, unless you explicitly state which culture you want. You may like to read further on these issues directly from the MSDN library. The code below is taken directly from this link.

using System;
using System.Globalization;
using System.Threading;

namespace Spells
{
    class Program
    {
        public static void Main(string[] args)
        {
            string[] values = { "able", "ångström", "apple", "Æble", 
                         "Windows", "Visual Studio", "Thoughts and Spells" };
            Array.Sort(values);
            DisplayArray(values);

            string originalCulture = CultureInfo.CurrentCulture.Name; // save the current culture
            // Change culture to Swedish (Sweden).
            Thread.CurrentThread.CurrentCulture = new CultureInfo("sv-SE");
            Array.Sort(values);
            DisplayArray(values);

            // Restore the original culture.
            Thread.CurrentThread.CurrentCulture = new CultureInfo(originalCulture);

            Console.Read();
        }

        private static void DisplayArray(string[] values)
        {
            Console.WriteLine("Sorting using the {0} culture:",
                              CultureInfo.CurrentCulture.Name);
            foreach (string value in values)
                Console.WriteLine("   {0}", value);

            Console.WriteLine();
        }
    }
}
This is the output console.

Here is a link to all the cultures [All cultures]. I hope you now understand why the same code can behave differently when the current culture changes. So when linguistics and localization matter to your application, do not rely on the default culture silently: choose it deliberately and specify the culture explicitly in your code.
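One way to avoid depending on the machine's current culture is to pass a comparer to Array.Sort yourself. The following is only a sketch: the class SortSketch and the helper SortOrdinal are hypothetical names, and the word list is a shortened version of the one above.

```csharp
using System;

namespace Spells
{
    public static class SortSketch
    {
        // Hypothetical helper: returns a copy sorted by raw character codes,
        // so the order does not change with the current culture.
        public static string[] SortOrdinal(string[] values)
        {
            string[] copy = (string[])values.Clone();
            Array.Sort(copy, StringComparer.Ordinal);
            return copy;
        }

        public static void Main(string[] args)
        {
            string[] values = { "able", "Windows", "apple", "Æble" };

            // Ordinal: upper-case 'W' (87) comes first, 'Æ' (198) comes last.
            Console.WriteLine(string.Join(", ", SortOrdinal(values)));
            // Windows, able, apple, Æble

            // Linguistic but culture-independent ordering; the exact order
            // depends on the runtime's collation data, so none is shown here.
            string[] linguistic = (string[])values.Clone();
            Array.Sort(linguistic, StringComparer.InvariantCulture);
            Console.WriteLine(string.Join(", ", linguistic));
        }
    }
}
```

Ordinal order is predictable but not what a human reader expects in a sorted list, so it suits internal keys better than text shown to users.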

Searching a character or a string: Searching for a character or an entire string inside a string is a regular string handling task. Below is a code snippet that shows you three different methods that you can use while searching for a string instance.

using System;

namespace Spells
{
    class Program
    {
        public static void Main(string[] args)
        {
            string str = "This is a post about strings. It will help you to learn more about strings.";

            Console.WriteLine("Character 'a' at {0}", str.IndexOf('a'));
            Console.WriteLine("String 'is' at {0}", str.IndexOf("is"));
            Console.WriteLine("String 'is' at {0}", str.IndexOf("is", 4));
            Console.WriteLine("String 'to' at {0}", str.IndexOf("to", StringComparison.CurrentCulture));
            Console.WriteLine("String 'is' at {0}", str.IndexOf("is", 4, StringComparison.Ordinal));
            Console.WriteLine("String 'String' at {0}",str.IndexOf("String",StringComparison.OrdinalIgnoreCase));

            Console.WriteLine("Last occurrence of 'String' at {0}", str.LastIndexOf("String", StringComparison.OrdinalIgnoreCase));
            Console.WriteLine("Last occurrence of 'String' at {0}", str.LastIndexOf("String", 60, StringComparison.OrdinalIgnoreCase));

            Console.WriteLine("Starts with the word 'this'? {0}", str.StartsWith("this"));
            Console.WriteLine("Starts with the word 'this'? {0}", str.StartsWith("this", StringComparison.OrdinalIgnoreCase));

            Console.Read();
        }
    }
}

The IndexOf method returns the zero-based index of the first occurrence of a character or a string, or -1 if nothing is found. What is the difference between lines 11 and 12 of the snippet? At line 11, we search for a single character. A character search is always an ordinal (character code) search, and in the .NET Framework the overloads that search for a character do not let you specify a StringComparison. A string search, as at line 12, defaults to StringComparison.CurrentCulture, and you are allowed to specify the StringComparison explicitly, as in lines 14 to 16. MSDN recommends using the overloads that take an explicit StringComparison. This makes the code clearer and easier to debug.
The LastIndexOf method at lines 18 and 19 is similar to IndexOf. It returns the zero-based index of the last occurrence of a string or a character. The overload at line 19 searches backwards, starting from index 60.
The StartsWith method tells you whether the given string begins with a particular string. Of lines 21 and 22, the one with the StringComparison parameter is recommended. It makes clear to other programmers whether you are ignoring case or not.
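Because IndexOf accepts a start index, it can be called in a loop to collect every occurrence, not just the first one. Here is a minimal sketch; the class SearchSketch and the helper FindAll are hypothetical names, and the loop relies on IndexOf returning -1 when nothing more is found.

```csharp
using System;
using System.Collections.Generic;

namespace Spells
{
    public static class SearchSketch
    {
        // Hypothetical helper: zero-based positions of every (possibly
        // overlapping) occurrence of value in text, using an ordinal search.
        public static List<int> FindAll(string text, string value)
        {
            var positions = new List<int>();
            if (string.IsNullOrEmpty(value)) return positions;

            int index = text.IndexOf(value, StringComparison.Ordinal);
            while (index >= 0)                      // -1 means "not found"
            {
                positions.Add(index);
                // Continue one character after the previous hit.
                index = text.IndexOf(value, index + 1, StringComparison.Ordinal);
            }
            return positions;
        }

        public static void Main(string[] args)
        {
            string str = "This is a post about strings. It will help you to learn more about strings.";
            Console.WriteLine(string.Join(", ", FindAll(str, "is")));       // 2, 5
            Console.WriteLine(FindAll(str, "xyz").Count);                   // 0
        }
    }
}
```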

Splitting a string: Often you will find yourself in a situation where you want your program to consider each word individually. Splitting comes to the rescue. Conveniently, the Split method returns an array of strings containing the pieces of the original string.

using System;

namespace Spells
{
    class Program
    {
        public static void Main(string[] args)
        {
            string str = "Learn C# @ www.thoughts-n-spells.blogspot.com.";
            string[] broken = str.Split(' ');             // 1
            Display(broken);

            char[] splitters = { ' ', '@', '.' };
            string[] broken2 = str.Split(splitters);      // 2 
            Display(broken2);

            string[] broken3 = str.Split(splitters,StringSplitOptions.None);      // 3 
            Display(broken3);

            string[] broken4 = str.Split(splitters, StringSplitOptions.RemoveEmptyEntries);      // 4 
            Display(broken4);

            Console.Read();
        }
        private static void Display(string[] toDisplay)
        {
            foreach (string item in toDisplay)
            {
                Console.WriteLine("{0}", item);
            }
            Console.WriteLine("============");
        }
    }
}
The first parameter of Split is either a character array or a string array specifying the delimiters to split on. The delimiters themselves are not included in the result. In this case, I have used a character array, so every single character in it acts as a delimiter. If you have a single delimiter character, use the form at line 10 of the snippet. If you have multiple characters to get rid of, use it as in lines 13 and 14.
If two delimiters are adjacent, or a delimiter is found at the beginning or end of this instance, the corresponding array element contains Empty.
The Empty string means a string with length 0. Go ahead and look at the output for lines 14 and 17.
Look at the red marks on the console image. I have marked the places where we have found empty strings in the output. If the calls at lines 14 and 17 both output the same thing, then which one is recommended? Obviously line 17. That way, you let yourself and others know that you are allowing empty strings in the output! If you want to eliminate empty strings from the returned array, go ahead and look at line 20. StringSplitOptions.RemoveEmptyEntries removes the empty strings. As you can see, the last output is a string array that has no empty strings.
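A small sketch of where the difference matters: counting words. With adjacent or leading and trailing spaces, the plain Split inflates the count with empty strings, while RemoveEmptyEntries gives the number you would expect. The class SplitSketch and the helper CountWords are hypothetical names used for illustration.

```csharp
using System;

namespace Spells
{
    public static class SplitSketch
    {
        // Hypothetical helper: counts whitespace-separated words.
        public static int CountWords(string text)
        {
            char[] separators = { ' ', '\t', '\r', '\n' };
            return text.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;
        }

        public static void Main(string[] args)
        {
            string str = "  Learn   C#  today ";

            // Every extra space produces an empty entry in the plain split.
            Console.WriteLine(str.Split(' ').Length);   // 9
            Console.WriteLine(CountWords(str));         // 3
        }
    }
}
```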

The Regex.Split method: Often you will need to find patterns and split a string based on them. The Regex.Split method comes in handy for these tasks. It works much like the Split method, except that Regex.Split takes a regular expression as the delimiter, which gives you more freedom to play with strings.

using System;
using System.Text.RegularExpressions;

namespace Spells
{
    class Program
    {
        public static void Main(string[] args)
        {
            /* split on runs of whitespace (the spaces are removed) */
            string str = "7 + 4 - 2 = 3 * 3";
            string[] separated = Regex.Split(str, @"\s+");
            Display(separated);
            /* split on runs of non-digit characters (only the numbers remain) */
            str = "10 tablets, 25 handsets, 33 laptops";
            string[] numbers = Regex.Split(str, @"\D+");
            Display(numbers);
            /* split on runs of lowercase letters */
            str = "Awer145rwrKcx777sdFew45xcvdfg111";
            string pattern = @"[a-z]+";
            string[] magic = Regex.Split(str, pattern);
            Display(magic);
            /* split on runs of letters, ignoring case */
            string[] moreMagic = Regex.Split(str, pattern, RegexOptions.IgnoreCase);
            Display(moreMagic);

            Console.Read();
        }
        private static void Display(string[] toDisplay)
        {
            foreach (string item in toDisplay)
            {
                Console.WriteLine("{0}", item);
            }
            Console.WriteLine("============");
        }
    }
}
Go ahead and hit F5 and see the output.
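One thing to watch for in that output: like Split, Regex.Split produces an empty string when the input starts or ends with a match, and it has no RemoveEmptyEntries option, so you have to skip the empty pieces yourself. Below is a sketch that sums the numbers in a sentence. The class RegexSketch and the helper SumNumbers are hypothetical names, and I have used the pattern [^0-9]+ instead of \D+ so that only the ASCII digits 0-9 survive the split.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

namespace Spells
{
    public static class RegexSketch
    {
        // Hypothetical helper: adds up every run of digits found in text.
        public static int SumNumbers(string text)
        {
            int total = 0;
            foreach (string piece in Regex.Split(text, @"[^0-9]+"))
            {
                // A match at the start or end of the input leaves an empty piece.
                if (piece.Length == 0) continue;
                total += int.Parse(piece, CultureInfo.InvariantCulture);
            }
            return total;
        }

        public static void Main(string[] args)
        {
            string str = "10 tablets, 25 handsets, 33 laptops";
            Console.WriteLine(Regex.Split(str, @"[^0-9]+").Length);   // 4 (the last piece is empty)
            Console.WriteLine(SumNumbers(str));                       // 68
        }
    }
}
```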

Leading or trailing spaces bothering you? .NET provides three classy methods to remove leading and trailing whitespace: TrimStart, TrimEnd and Trim.

using System;

namespace Spells
{
    class Program
    {
        public static void Main(string[] args)
        {
            string str = "   Spaces at front and spaces at the back.    ";
            Console.WriteLine("<>{0}<>", str);
            Console.WriteLine("<>{0}<>", str.TrimStart());
            Console.WriteLine("<>{0}<>", str.TrimEnd());
            Console.WriteLine("<>{0}<>", str.Trim());

            Console.Read();
        }
    }
}

The Trim methods are useful when you receive data over a network or when you read files. In that way, you can remove unnecessary leading or trailing spaces. 
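The Trim methods also accept the characters to strip, which helps when the unwanted characters are not whitespace. Here is a small sketch; the class TrimSketch and the helper Clean are hypothetical names. Note that strings are immutable, so each call returns a new string and leaves the original untouched.

```csharp
using System;

namespace Spells
{
    public static class TrimSketch
    {
        // Hypothetical helper: strips surrounding whitespace, then surrounding quotes.
        public static string Clean(string raw)
        {
            return raw.Trim().Trim('"');
        }

        public static void Main(string[] args)
        {
            string raw = "  \"thoughts\"  ";
            Console.WriteLine("<>{0}<>", Clean(raw));                 // <>thoughts<>
            Console.WriteLine("<>{0}<>", "000123".TrimStart('0'));    // <>123<>
            Console.WriteLine("<>{0}<>", raw);                        // the original is unchanged
        }
    }
}
```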

I have tried to show some of the fundamental string methods and their usage. I have also tried to show overloaded methods that can hide bugs, because the same call can be interpreted differently in different contexts. Lastly, I have tried to show some recommended practices that make your code clearer and less ambiguous.
