If you're using C# or a similar programming language, you may have heard the terms "value type" and "reference type". What are they? How do they differ from one another? Let's find out.
This article is written from a C# perspective, but it applies to all .NET languages (Visual Basic, PowerShell, etc.), and likely also similar languages such as Java.
Defining Value Types and Reference Types
In C# and other .NET languages, data types fall into two categories: value types and reference types. Value types include the built-in numeric types (int, long, double, etc.) and structs such as DateTime. Reference types include strings and objects (instances of classes).
Value types and reference types are defined by how their variables work. A value type variable always contains the value itself: if I set an integer variable to 5, then it contains the value 5. A reference type variable, on the other hand, doesn't contain the value. Instead, it contains a reference to it, which tells the runtime where the value is stored in memory. If I create a variable for a string, that variable actually holds a reference to the string's location in memory, not the characters themselves.
For the most part, C# and similar languages will automatically handle references so that you don't need to deal with them directly. However, it's important to understand how value types and reference types behave, because misunderstanding the nuances can lead to unexpected bugs.
As explained, variables of value types always contain the value itself. If I create an integer variable and set it to 5, then it actually contains the number 5.
If I copy this variable into a separate variable, the value itself is copied. I can then change the copied variable without affecting the original:
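A minimal C# sketch of this, written as a top-level program (the variable names a and b are illustrative):

```csharp
using System;

int a = 5;   // 'a' holds the value 5 directly
int b = a;   // 'b' gets its own copy of that value

b = 10;      // changing the copy...

Console.WriteLine(a); // 5 (...leaves the original untouched)
Console.WriteLine(b); // 10
```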
Similarly, if I pass this variable to a function, then its value gets copied to that function. The function can modify its copy of the value without affecting the original:
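As a sketch, with a hypothetical AddOne helper standing in for "a function":

```csharp
using System;

int number = 5;
AddOne(number);
Console.WriteLine(number); // 5: the caller's variable is unchanged

// 'value' is a copy of whatever the caller passed in.
static void AddOne(int value)
{
    value++;                  // increments the local copy only
    Console.WriteLine(value); // 6
}
```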
This behaviour is known as "passing by value"—copying a value type variable copies the value itself.
Reference types work differently. As explained, variables of reference types contain a reference to the value, not the value itself. Let's create a new class named Person to demonstrate:
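The class body isn't given here, so as an assumption this sketch uses a minimal Person with a single Name property set through the constructor. The reference behaviour being illustrated doesn't depend on which members the class has.

```csharp
using System;

var alice = new Person("Alice");
Console.WriteLine(alice.Name); // Alice

// Hypothetical minimal Person class: the original's members aren't known,
// so a single settable Name property is assumed.
public class Person
{
    public string Name { get; set; }

    public Person(string name)
    {
        Name = name;
    }
}
```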
If I define a variable and initialize it to a new instance of the Person class, then this variable contains a reference to the object's location in memory:
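A short sketch, again assuming a minimal Person class with a Name property. The null assignment is included only to show that a reference type variable can hold a reference that points to no object at all, something a value like 5 cannot do.

```csharp
#nullable enable
using System;

// 'new' creates a Person object on the managed heap; 'person' stores
// only a reference to that object.
Person person = new Person("Alice");
Console.WriteLine(person.Name); // Alice (member access follows the reference)

// A reference can also point to nothing.
Person? nobody = null;
Console.WriteLine(nobody is null); // True

// Hypothetical minimal Person class (assumed single Name property).
public class Person
{
    public string Name { get; set; }
    public Person(string name) => Name = name;
}
```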
If I then copy this variable to a second variable, the reference gets copied, but the object itself does not. This means that both variables now point to the exact same object in memory. This can be tested using
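One standard way to perform this check in C# is object.ReferenceEquals, which returns true only when both arguments refer to the same object. The Person class below is an assumed minimal version:

```csharp
using System;

Person original = new Person("Alice");
Person copy = original; // copies the reference; no second Person is created

// True only when both arguments refer to the same object.
bool sameObject = object.ReferenceEquals(original, copy);
Console.WriteLine(sameObject); // True

// A separately created Person is a different object, even with identical contents.
bool lookalike = object.ReferenceEquals(original, new Person("Alice"));
Console.WriteLine(lookalike); // False

// Hypothetical minimal Person class (assumed single Name property).
public class Person
{
    public string Name { get; set; }
    public Person(string name) => Name = name;
}
```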
What does this mean? It means that if two variables point to the same object, a change made to the object through one variable is visible through the other:
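A sketch of this, assuming the same minimal Person class with a settable Name property:

```csharp
using System;

Person first = new Person("Alice");
Person second = first; // both variables now refer to the same object

second.Name = "Bob";   // modify the object through 'second'...

Console.WriteLine(first.Name); // Bob (...and the change is visible through 'first')

// Hypothetical minimal Person class (assumed single Name property).
public class Person
{
    public string Name { get; set; }
    public Person(string name) => Name = name;
}
```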
This also applies when passing a variable as a function argument:
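In the sketch below, the hypothetical Rename function changes the shared object, and the caller sees the change. The second hypothetical function, Reassign, shows the limit of this sharing: pointing the parameter at a brand-new object only affects the function's own copy of the reference.

```csharp
using System;

Person person = new Person("Alice");

Rename(person, "Bob");
Console.WriteLine(person.Name); // Bob: the function changed the shared object

Reassign(person);
Console.WriteLine(person.Name); // Bob: reassigning the parameter did not affect 'person'

// The parameter receives a copy of the caller's reference, so it points to the same object.
static void Rename(Person target, string newName)
{
    target.Name = newName;
}

// Assigning a new object to the parameter only changes the local copy of the reference.
static void Reassign(Person target)
{
    target = new Person("Carol");
}

// Hypothetical minimal Person class (assumed single Name property).
public class Person
{
    public string Name { get; set; }
    public Person(string name) => Name = name;
}
```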
This behaviour is often loosely called "passing by reference", but strictly speaking the reference itself is passed by value: copying a reference type variable copies the reference, not the object it points to.
ByRef, ref, and [ref]
These keywords (ref in C#, ByRef in Visual Basic, and [ref] in PowerShell) let you pass a reference type to a function in such a way that a reference to the original variable is passed, rather than a copy of the reference to the object. This allows the function to change which object the original variable points to in memory:
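A C# sketch (the Visual Basic and PowerShell equivalents are not shown; Person is again an assumed minimal class). In C#, ref must appear both in the parameter declaration and at the call site:

```csharp
using System;

Person person = new Person("Alice");

Replace(ref person); // pass the variable itself, not a copy of its reference
Console.WriteLine(person.Name); // Carol: 'person' now refers to a different object

// With 'ref', 'target' is an alias for the caller's variable, so assigning to it
// changes which object the caller's variable points to.
static void Replace(ref Person target)
{
    target = new Person("Carol");
}

// Hypothetical minimal Person class (assumed single Name property).
public class Person
{
    public string Name { get; set; }
    public Person(string name) => Name = name;
}
```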
This is impossible without the ref keyword (or its ByRef and [ref] equivalents).
What about value type parameters?
ref works for them too! It allows a function to modify the value contained in a variable passed to the function, in the same way that ref works for reference types:
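A minimal sketch using a hypothetical AddOne helper. Compared with passing an int normally, the only change is the ref keyword at the declaration and the call site:

```csharp
using System;

int number = 5;
AddOne(ref number);
Console.WriteLine(number); // 6: the function wrote to the caller's variable

// 'value' is an alias for the caller's variable rather than a copy of it.
static void AddOne(ref int value)
{
    value++;
}
```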
Enabling the original variable to be modified in this way is the main purpose of the ref keyword. If your method uses the ref keyword but never changes the value or reference held by a variable passed to it, then the ref keyword typically serves no purpose.
ref should generally be avoided unless there's a really good reason why a function cannot simply return a modified value using a return statement. Using the return statement is simpler and easier to understand.
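For instance, a hypothetical AddOne helper can hand its result back instead of taking a ref parameter. The caller then decides explicitly whether to overwrite its own variable:

```csharp
using System;

int number = 5;
number = AddOne(number);   // the caller chooses to overwrite its variable
Console.WriteLine(number); // 6

// No 'ref' needed: the result is returned instead of written through a reference.
static int AddOne(int value) => value + 1;
```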
Why do Value Types and Reference Types Work this Way?
Hopefully that clears up the differences between value types and reference types, but you may be left wondering why things work this way. Wouldn't it be simpler if everything were passed by value? Maybe, but it isn't practical.
While value types are generally small, objects can be massive. If you passed a string containing a very large text file to a function, you probably wouldn't want the contents of that string to be duplicated in memory. Unless you specifically needed a copy, duplicating it would waste both memory and CPU cycles, and the problem would get worse if that function then passed the string on to another function, and so on. It's far better for objects to be shared through references by default. If a programmer has a specific need to duplicate a string or other object, they can do it themselves.
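A rough sketch of both points, assuming .NET Core 2.1 or later (for the string constructor that takes a ReadOnlySpan&lt;char&gt;). PassThrough is a hypothetical helper that just returns the reference it receives:

```csharp
using System;

// Roughly 20 MB of character data (10 million UTF-16 chars).
string bigText = new string('x', 10_000_000);

// Passing the string copies only the reference, so the function sees the same object.
string received = PassThrough(bigText);
Console.WriteLine(object.ReferenceEquals(bigText, received)); // True

// A real copy has to be requested explicitly; this allocates a second string.
string duplicate = new string(bigText.AsSpan());
Console.WriteLine(object.ReferenceEquals(bigText, duplicate)); // False
Console.WriteLine(bigText == duplicate);                       // True: same contents

// Hypothetical helper that simply hands back the reference it was given.
static string PassThrough(string text) => text;
```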
Tying it All Together
How does this affect your code? It means that value types and reference types need to be handled in fundamentally different ways. For example:
- If you want a function to modify a value type variable, that function needs to return the modified value (or it could use the ref keyword, but this should be avoided).
- Be aware that if you write a function that modifies a reference type parameter, it will modify the original object that the calling function passed in.
- If your function modifies a reference type parameter, then it isn't necessary for it to also return that parameter (however, this can still be useful in some cases, such as when building a fluent interface).
- If your function doesn't change a parameter's original variable to a different value or object, then it doesn't need to use the ref keyword.
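To illustrate the fluent-interface case from the list above, here is a sketch built on hypothetical SetName and SetAge extension methods and an assumed Person class with Name and Age properties. Each method modifies its reference type parameter, which the caller would see anyway. Returning that same reference is optional, but it is what allows the calls to be chained.

```csharp
using System;

var person = new Person("Alice");

// Each call mutates the same object and returns it, so calls can be chained.
Person result = person.SetName("Bob").SetAge(30);

Console.WriteLine($"{person.Name}, {person.Age}");         // Bob, 30
Console.WriteLine(object.ReferenceEquals(result, person)); // True

// Hypothetical Person class with an assumed Age property.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }

    public Person(string name) => Name = name;
}

// Hypothetical extension methods: each modifies its parameter and returns the same reference.
public static class PersonExtensions
{
    public static Person SetName(this Person person, string name)
    {
        person.Name = name;
        return person;
    }

    public static Person SetAge(this Person person, int age)
    {
        person.Age = age;
        return person;
    }
}
```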