Monday, March 14, 2011

KnowDotNet: String.Intern() and String.IsInterned()

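Interning is a process where the CLR maintains a pool (the intern pool) of references to unique string values. Every string literal in your application is added to this pool automatically, and you can add strings created at runtime yourself. Because equal strings in the pool share a single instance, you can use this feature to avoid keeping many copies of the same string in memory. An interned string is not likely to be released until the CLR has shut down, so this may be something to consider if memory usage is a priority.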
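A minimal sketch of the memory-saving idea: two equal strings built at runtime start out as separate objects, and after passing both through String.Intern() they share one instance. The class and method names (InternDedupDemo, BuildRuntimeString) are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text;

// Illustrative sketch: InternDedupDemo and BuildRuntimeString are hypothetical names.
static class InternDedupDemo
{
    // Builds "dedup-demo-42" at runtime, so it is never a literal in the assembly.
    public static string BuildRuntimeString()
    {
        var sb = new StringBuilder();
        sb.Append("dedup");
        sb.Append("-demo-");
        sb.Append(42);
        return sb.ToString(); // each call allocates a new string object
    }

    // True if two equal runtime strings are distinct objects before interning
    // but share one reference after String.Intern().
    public static bool InternCollapsesDuplicates()
    {
        string first = BuildRuntimeString();
        string second = BuildRuntimeString();

        bool distinctBefore = !Object.ReferenceEquals(first, second);

        first = String.Intern(first);   // adds the value to the pool, or returns the pooled copy
        second = String.Intern(second); // the value is now pooled, so the same reference comes back

        bool sharedAfter = Object.ReferenceEquals(first, second);
        return distinctBefore && sharedAfter;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(InternDedupDemo.InternCollapsesDuplicates()); // prints True
    }
}
```

Only the interned reference needs to be kept; the duplicate copy becomes garbage once nothing refers to it.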

using System;
using System.Text;

namespace String_Intern_Test
{
    class Program
    {
        static void Main( string [] args ) {
            // Compile Time Interning
            var myInternedString = "This string is interned at compile time";
            Console.WriteLine( String.IsInterned( myInternedString ) 
                == null ? "NO" : "YES" ); // shows YES

            // Runtime Interning
            // NOTE: StringBuilder is overkill for joining two short strings in real
            // code. It is used here only to build the string at runtime, so that the
            // compiler cannot emit it as a string literal (literals are interned
            // automatically).
            StringBuilder myStringBuilder = new StringBuilder();
            myStringBuilder.Append( "This is going to be" );
            myStringBuilder.Append( " interned soon!" );

            var myString = myStringBuilder.ToString();

            Console.WriteLine( String.IsInterned( myString ) 
                == null ? "NO": "YES" );  // shows NO
            // Intern returns the pooled reference; keep it rather than discarding it
            myString = String.Intern( myString );
            Console.WriteLine( String.IsInterned( myString ) 
                == null ? "NO" : "YES" );  // shows YES

        }
    }
}


First we assign a string literal to myInternedString. Because the value "This string is interned at compile time" is a literal, the compiler embeds it in the assembly and it is interned automatically, so the first Console.WriteLine() prints YES. Note that each check in this example simply compares the return value of String.IsInterned() against null; no reference comparison between two strings is involved yet.

Next comes a test where we intern a string programmatically. We use StringBuilder to construct the string so that there is no literal (or anything the compiler can work out to be a single string literal, such as a concatenation of two literals) whose value would already be in the pool.
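To illustrate the "something the compiler can figure out will be one whole string literal" case, here is a small sketch. It assumes the standard C# rule that concatenating two string constants is itself a constant expression, which the compiler folds into a single literal. The class name LiteralFoldingDemo and the field head are hypothetical.

```csharp
using System;

// Illustrative sketch: LiteralFoldingDemo and head are hypothetical names.
static class LiteralFoldingDemo
{
    // A mutable static field is not a constant, so the compiler cannot fold it.
    static string head = "joined ";

    // Both operands are constants, so the compiler emits the single literal
    // "folded at compile time", which is interned automatically.
    public static bool FoldedConcatIsInterned()
    {
        string folded = "folded " + "at compile time";
        return String.IsInterned(folded) != null;
    }

    // head is a variable, so this concatenation runs at runtime and produces
    // a new string that nothing has interned.
    public static bool RuntimeConcatIsInterned()
    {
        string joined = head + "at run time";
        return String.IsInterned(joined) != null;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(LiteralFoldingDemo.FoldedConcatIsInterned()
            ? "YES" : "NO"); // shows YES
        Console.WriteLine(LiteralFoldingDemo.RuntimeConcatIsInterned()
            ? "YES" : "NO"); // shows NO
    }
}
```

The second result holds only because no literal with the concatenated value appears anywhere in the program; if one did, IsInterned() would find it.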

myString is eventually created from the StringBuilder contents, but note that this string is NOT interned, so the next check shows "NO". We then intern it using String.Intern() and test again, and this time the check shows YES.

String.Intern() returns the pooled reference if a string with the same value is already interned. If not, it adds the string you passed to the pool and returns that reference.
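A short sketch of that return-value behaviour: when an equal literal is already in the pool, String.Intern() hands back the literal's reference, not the object you passed in. The names InternReturnDemo, Literal and MakeCopy are hypothetical.

```csharp
using System;

// Illustrative sketch: InternReturnDemo, Literal and MakeCopy are hypothetical names.
static class InternReturnDemo
{
    const string Literal = "pooled value"; // a literal, so it is interned automatically

    // Creates a new string object with the same contents as Literal.
    static string MakeCopy()
    {
        return new string(Literal.ToCharArray());
    }

    // True if String.Intern() returns the literal's pooled reference
    // rather than the copy that was passed in.
    public static bool InternReturnsPooledLiteral()
    {
        string copy = MakeCopy();
        string pooled = String.Intern(copy);
        return !Object.ReferenceEquals(copy, pooled)
            && Object.ReferenceEquals(pooled, Literal);
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(InternReturnDemo.InternReturnsPooledLiteral()); // prints True
    }
}
```

This is why it is usual to write myString = String.Intern( myString ); the argument itself only becomes the pooled instance when its value was not in the pool already.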

String.IsInterned() returns the pooled reference if a string with the same value is interned, or null if it is not. Unlike String.Intern(), it never adds anything to the pool.
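A sketch of both points: calling String.IsInterned() does not add a string to the pool, and when an equal string is pooled it returns that pooled reference, which may be a different object from its argument. The names IsInternedDemo and Build are hypothetical.

```csharp
using System;
using System.Text;

// Illustrative sketch: IsInternedDemo and Build are hypothetical names.
static class IsInternedDemo
{
    // Joins the parts at runtime so the result is never a literal.
    static string Build(params string[] parts)
    {
        var sb = new StringBuilder();
        foreach (var p in parts) sb.Append(p);
        return sb.ToString();
    }

    // IsInterned only looks: the second check still returns null.
    public static bool IsInternedDoesNotAdd()
    {
        string s = Build("looked ", "up ", "only");
        bool firstCheck = String.IsInterned(s) == null;
        bool secondCheck = String.IsInterned(s) == null;
        return firstCheck && secondCheck;
    }

    // With an equal literal in the pool, IsInterned returns the literal's
    // reference, not the runtime copy passed in.
    public static bool IsInternedReturnsPooledReference()
    {
        string literal = "already pooled";
        string copy = Build("already ", "pooled");
        string result = String.IsInterned(copy);
        return Object.ReferenceEquals(result, literal)
            && !Object.ReferenceEquals(result, copy);
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(IsInternedDemo.IsInternedDoesNotAdd());             // prints True
        Console.WriteLine(IsInternedDemo.IsInternedReturnsPooledReference()); // prints True
    }
}
```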

Now, when the same string literal appears in two places, it is interned once, and every other use of that literal refers, at runtime, to the same reference (that is, both variables reference the same object). To show this, here is the above with a few alterations.

using System;
using System.Text;

namespace String_Intern_Test
{
    class Program
    {
        static void Main( string [] args ) {
            // Compile Time Interning
            var myInternedString = "This string is interned at compile time";
            Console.WriteLine( String.IsInterned( myInternedString )
                == null ? "NO" : "YES" );

            Test( myInternedString );
        }

        static void Test(string otherString) {
            var myInternedString2 = "This string is interned at compile time";

            Console.WriteLine((object) otherString 
                == (object) myInternedString2);
        }
    }
}

In this example the Console.WriteLine() inside Test() prints True. otherString is actually the myInternedString string passed in from the Main() method. myInternedString is interned automatically because it is a literal, and myInternedString2 references the same string in memory because its literal has exactly the same value. Note that both strings are cast to object: string redefines the == operator to compare the CONTENTS of two strings, whereas object's == operator compares references, which is what we want to test here. Because the two literals are identical, they are loaded from the same entry in the intern pool. In other words, every occurrence of that string literal in the code results in the same string reference being used.
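Here is a sketch of why the cast matters: a runtime copy compares equal to the literal with string's ==, but not once both sides are cast to object, until the copy is interned. The names EqualityDemo, Literal and MakeCopy are hypothetical.

```csharp
using System;

// Illustrative sketch: EqualityDemo, Literal and MakeCopy are hypothetical names.
static class EqualityDemo
{
    const string Literal = "compare me";

    // A new string object with the same contents as Literal.
    static string MakeCopy()
    {
        return new string(Literal.ToCharArray());
    }

    // string's overloaded == compares the characters.
    public static bool ContentsEqual()
    {
        return Literal == MakeCopy();
    }

    // Casting both sides to object selects object's ==, which compares references.
    public static bool ReferencesEqual()
    {
        return (object) Literal == (object) MakeCopy();
    }

    // Interning the copy yields the literal's pooled reference.
    public static bool ReferencesEqualAfterIntern()
    {
        return (object) Literal == (object) String.Intern(MakeCopy());
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(EqualityDemo.ContentsEqual());              // prints True
        Console.WriteLine(EqualityDemo.ReferencesEqual());            // prints False
        Console.WriteLine(EqualityDemo.ReferencesEqualAfterIntern()); // prints True
    }
}
```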
