Dave Heinemann

.NET Value Types and Reference Types Explained

If you're using C# or a similar programming language, you may have heard the terms "value type" and "reference type". What are they? How are they different from one another? Let's find out.

This article is written from a C# perspective, but it applies to all .NET languages (Visual Basic, PowerShell, etc.), and likely also similar languages such as Java.

Defining Value Types and Reference Types

In C# and other .NET languages, data types fall into two categories: value types and reference types. Value types include the built-in numeric types (int, long, double, etc.) and structs such as DateTime. Reference types include interfaces, arrays, and classes such as string and List<T>.
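One way to check which category a type belongs to is the Type.IsValueType property. The snippet below is a minimal sketch of that check; the TypeCategoryDemo class and its Describe helper are names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class TypeCategoryDemo
{
    // Hypothetical helper: labels a type using the built-in Type.IsValueType property.
    public static string Describe(Type type)
    {
        return type.IsValueType ? "value type" : "reference type";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(typeof(int)));         // prints value type
        Console.WriteLine(Describe(typeof(DateTime)));    // prints value type
        Console.WriteLine(Describe(typeof(string)));      // prints reference type
        Console.WriteLine(Describe(typeof(List<int>)));   // prints reference type
        Console.WriteLine(Describe(typeof(int[])));       // prints reference type
        Console.WriteLine(Describe(typeof(IDisposable))); // prints reference type
    }
}
```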

Value types and reference types are defined by how their variables work. A value type variable always contains the value itself. If I set an integer variable to 5, then it contains the value 5. A reference type variable, on the other hand, doesn't contain the value. Instead, it contains a reference to it: in effect, a pointer to its address in memory. If I create a variable for a string, that variable holds something like the string's memory address rather than the characters themselves.

For the most part, C# and similar languages will automatically handle references so that you don't need to deal with them directly. However, it's important to understand how value types and reference types behave, because misunderstanding the nuances can lead to unexpected bugs.

Value Types

As explained, variables of value types always contain the value itself. If I create an integer variable and set it to 5, then it actually contains the number 5.

int foo = 5;

If I copy this variable into a separate variable, the value itself is copied. I can then change the copied variable without affecting the original:

int foo = 5;
int bar = foo;
bar = 10;
Console.WriteLine(foo); // prints 5
Console.WriteLine(bar); // prints 10

Similarly, if I pass this variable to a function, then its value gets copied to that function. The function can modify its copy of the value without affecting the original:

public static void ChangeTo10(int n)
{
    n = 10;
}

public static void Main()
{
    int foo = 5;
    ChangeTo10(foo);
    Console.WriteLine(foo); // prints 5
}

This behaviour is known as "passing by value"—copying a value type variable copies the value itself.
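The same copying applies to structs, not just the built-in numeric types. Here is a small sketch of that, assuming a made-up Point struct and MoveRight function that exist only for this example.

```csharp
using System;

// Point is a hypothetical struct used only for this illustration.
public struct Point
{
    public int X;
    public int Y;
}

public static class StructCopyDemo
{
    // Receives a copy of the struct, so the caller's Point is left untouched.
    public static void MoveRight(Point p)
    {
        p.X += 10;
    }

    public static void Main()
    {
        var a = new Point { X = 1, Y = 2 };
        var b = a;   // copies both fields into a separate Point
        b.X = 100;   // changes only the copy

        MoveRight(a); // the function works on its own copy

        Console.WriteLine(a.X); // prints 1
        Console.WriteLine(b.X); // prints 100
    }
}
```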

Reference Types

Reference types work differently. As explained, variables of reference types contain a reference to the value—not the value itself. Let's create a new class named Person to demonstrate:

public class Person
{
    public string Name { get; set; }

    public Person(string name)
    {
        Name = name;
    }
}

If I define a variable and initialize it to a new instance of the Person class, then this variable contains a reference to the object's location in memory:

public static void Main()
{
    // person contains a reference to the object, not the object itself.
    var person = new Person("John");
}

If I then copy this variable to a second variable, the reference gets copied, but the value does not. This means that both variables now point to the exact same object in memory. This can be tested using Object.ReferenceEquals():

public static void Main()
{
    var person1 = new Person("John");
    var person2 = person1;
    Console.WriteLine(object.ReferenceEquals(person1, person2)); // prints True
}

What does this mean? It means that if two variables point to the same object, modifying one will affect the other:

public static void Main()
{
    var person1 = new Person("John");
    var person2 = person1;
    person2.Name = "Jane";
    Console.WriteLine(person1.Name); // prints "Jane"
}

This also applies when passing a variable as a function argument:

public static void ChangeToJane(Person p)
{
    p.Name = "Jane";
}

public static void Main()
{
    var person1 = new Person("John");
    ChangeToJane(person1);
    Console.WriteLine(person1.Name); // prints "Jane"
}

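This behaviour is often loosely called "passing by reference": copying a reference type variable copies the reference, not the object it points to. Strictly speaking, the reference itself is still copied by value, which matters once the ref keyword enters the picture.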
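Built-in reference types such as List<T> and arrays behave the same way as the Person class. The following is a rough sketch of that; SharedListDemo and AddZero are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;

public static class SharedListDemo
{
    // Receives a copy of the reference, so it modifies the caller's list.
    public static void AddZero(List<int> numbers)
    {
        numbers.Add(0);
    }

    public static void Main()
    {
        var original = new List<int> { 1, 2, 3 };
        var alias = original; // both variables refer to the same list
        AddZero(alias);
        Console.WriteLine(original.Count); // prints 4

        int[] first = { 1, 2, 3 };
        int[] second = first; // arrays are reference types too
        second[0] = 99;
        Console.WriteLine(first[0]); // prints 99
    }
}
```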

ByRef, ref, and [ref]

But if reference types are always passed by reference, then what are the ByRef (VB.NET), ref (C#), and [ref] (PowerShell) keywords for?

These keywords allow you to pass a reference type to a function in such a way that a reference to the original variable is passed rather than the reference to the object. This allows the function to change which object the original variable points to in memory:

public static void ChangeToJaneByReference(ref Person p) // note the ref keyword
{
    p = new Person("Jane");
}

public static void Main()
{
    var person1 = new Person("John");
    // person1 points to the Person John

    ChangeToJaneByReference(ref person1);
    // person1 now points to the Person Jane

    Console.WriteLine(person1.Name); // prints "Jane"
}

This is impossible without the ref keyword:

public static void ChangeToJane(Person p)
{
    p = new Person("Jane");
}

public static void Main()
{
    var person1 = new Person("John");
    // person1 points to the Person John

    ChangeToJane(person1);
    // person1 still points to the Person John because ChangeToJane() cannot
    // change what person1 points to without using the ref keyword

    Console.WriteLine(person1.Name); // prints "John"
}

What about value type parameters? ref works for them too! It allows a function to modify the value contained in a variable passed to the function in the same way that ref works for reference types:

public static void ChangeTo10ByReference(ref int n) // note the ref keyword
{
    // Since n represents the memory address of whatever variable was passed
    // in, modifying it changes the value of that variable.
    n = 10;
}

public static void Main()
{
    int foo = 5;
    ChangeTo10ByReference(ref foo);
    Console.WriteLine(foo); // prints 10
}

Enabling an original variable to be modified in this way is the only purpose of the ref keyword. If your method uses the ref keyword but doesn't change the value or reference of a variable passed to it, then the ref keyword serves no purpose.

Additionally, using ref should generally be avoided unless there's a really good reason why a function cannot simply return a modified object using a return statement. Using the return statement is simpler and easier to understand.
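To make the comparison concrete, here is a minimal sketch of the same operation written both ways. The names AddTen and AddTenByReference are invented for this example.

```csharp
using System;

public static class ReturnInsteadOfRefDemo
{
    // Preferred: the result is returned, and the caller decides what to do with it.
    public static int AddTen(int n)
    {
        return n + 10;
    }

    // Equivalent effect using ref: the caller's variable is modified in place.
    public static void AddTenByReference(ref int n)
    {
        n += 10;
    }

    public static void Main()
    {
        int foo = 5;
        foo = AddTen(foo); // the assignment makes the change visible at the call site
        Console.WriteLine(foo); // prints 15

        int bar = 5;
        AddTenByReference(ref bar); // the change happens inside the function
        Console.WriteLine(bar); // prints 15
    }
}
```

Both versions end with the same value, but in the first one the assignment is visible to anyone reading the calling code.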

Why do Value Types and Reference Types Work this Way?

Hopefully that clears up the differences between value types and reference types, but you may be left wondering why things work this way. Wouldn't it be simpler if all values could simply be passed by value? Maybe. But it isn't practical.

While value types are generally small in size, objects can be massive. If you passed a string representing a very large text file to a function, you probably wouldn't want the contents of that string to be duplicated in memory. If you didn't specifically need the string to be copied, then it would be a waste of both memory and CPU cycles. And the problem would be even worse if that function then passed the string to another function, and so on. It's far better for objects to be passed by reference by default. If a programmer has a specific need to duplicate a string or other object, they can do it themselves.
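As a sketch of what "doing it themselves" can look like, the example below copies a list explicitly using the List<T> constructor that accepts an existing collection. ExplicitCopyDemo and CopyOf are hypothetical names, and this is only one of several ways to duplicate an object.

```csharp
using System;
using System.Collections.Generic;

public static class ExplicitCopyDemo
{
    // Hypothetical helper: builds a new list containing the same elements.
    // This is a shallow copy: the elements themselves are not duplicated.
    public static List<string> CopyOf(List<string> source)
    {
        return new List<string>(source);
    }

    public static void Main()
    {
        var original = new List<string> { "John", "Jane" };
        var copy = CopyOf(original);
        copy.Add("Jim"); // only the copy grows

        Console.WriteLine(original.Count); // prints 2
        Console.WriteLine(copy.Count);     // prints 3
        Console.WriteLine(object.ReferenceEquals(original, copy)); // prints False
    }
}
```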

Tying it All Together

How does this affect your code? It means that value types and reference types need to be handled in fundamentally different ways. For example:

Do you have any thoughts or feedback? Let me know via email!
