Photo by Rubaitul Azad on Unsplash
Working with MongoDB in .NET (Part 2): Retrieving Documents with Filter Clause
In the previous part we went through some of the driver basics and how to insert documents to a collection. In this part of the series, we'll learn how to retrieve documents from the database.
Any document belongs to a collection, therefore all CRUD operations have the scope of a single collection. To retrieve documents from a collection, we can use the Find
, FindSync
, and FindAsync
functions.
FindSync & FindAsync
FindSync
and FindAsync
both have two overloads with three parameters. Both FindSync
and FindAsync
are somewhat similar except that FindSync
is synchronous and blocks until it's call is complete. FindSync
returns an IAsyncCursor
while FindAsync
returns a task of IAsyncCursor
.
What is IAsyncCursor
MongoDB returns a query result in batches, and the batch size will not exceed the maximum size of a BSON document. As of version 3.2, the maximum size of a BSON document is 16 megabytes. The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. This constraint also applies when adding documents to a collection, but in order to store larger documents, MongoDB has made GridFS API as a provision. For most queries, the first batch returns 101 documents or just enough document to exceed 1 megabyte, and subsequent batches will be 4MB. It's possible to override the default batch size, and we can do this from the driver by setting BatchSize
property of the FindOptions
, which is passed in as a second parameter to any of the find methods. So basically, a cursor is a pointer to the result set of a query.
By default, the server will automatically close the cursor after 10 minutes of inactivity or if the client has exhausted the cursor. To override this behavior, you can specify the noTimeout
flag in your query using NoCursorTimeout
property of the FindOptions
class. However, you should either close the cursor manually or exhaust the cursor.
The IAsyncCursor
from the driver represents an asynchronous cursor. To access the documents, we need to manually iterate the cursor.
Retrieve documents
Let's build our first read query to give us all the documents in the students
collection in our database. Update the MainAsync
method with the following:
static async Task MainAsync()
{
var client = new MongoClient();
IMongoDatabase db = client.GetDatabase("school");
var collection = db.GetCollection("students");
using (IAsyncCursor cursor = await collection.FindAsync(new BsonDocument()))
{
while (await cursor.MoveNextAsync())
{
IEnumerable batch = cursor.Current;
foreach (BsonDocument document in batch)
{
Console.WriteLine(document);
Console.WriteLine();
}
}
}
}
The first overload to any of the find method takes 3 parameters; a FilterDefinition
which is used to define the filter for the query, an optional FindOptions
for specifying options for the query (e.g cursor timeout, batch size, etc), and an optional cancellationToken
.
In the code above, we specified an empty filter definition by passing an empty BsonDocument
to the method. Another way of writing this would be to use FilterDefinition<BsonDocument>.Empty
to indicate an empty filter. With an empty filter, we're basically telling it to give us all the documents in a collection. Afterward, we iterate over the cursor to get documents in batches (MoveNextAsync
in the while loop), and call cursor.Current
to get documents in the current batch, which then gets printed out.
Running the code above should give us all the documents we already have in that collection
{ "_id" : ObjectId("58469c732adc9f5370e50c9c"), "FirstName" : "Gregor", "LastName" : "Felix", "Class" : "JSS 3", "Age" : 23, "Subjects" : ["English", "Mathematics", "Physics", "Biology"] }
{ "_id" : ObjectId("58469c732adc9f5370e50c9d"), "FirstName" : "Machiko", "LastName" : "Elkberg", "Class" : "JSS 3", "Age" : 23, "Subjects" : ["English", "Mathematics", "Spanish"] }
{ "_id" : ObjectId("58469c732adc9f5370e50c9e"), "FirstName" : "Julie", "LastName" : "Sandal", "Class" : "JSS 1", "Age" : 25, "Subjects" : ["English", "Mathematics", "Physics", "Chemistry"] }
{ "_id" : ObjectId("58469c732adc9f5370e50c9f"), "FirstName" : "Peter", "LastName" : "Cyborg", "Class" : "JSS 1", "Age" : 39, "Subjects" : ["English", "Mathematics", "Physics", "Chemistry"] }
{ "_id" : ObjectId("58469c732adc9f5370e50ca0"), "FirstName" : "James", "LastName" : "Cyborg", "Class" : "JSS 1", "Age" : 39, "Subjects" : ["English", "Mathematics", "Physics", "Chemistry"] }
We can see a resemblance of the documents we added in the previous post but with a new property _id
. All collections have a unique primary index on this field, and if you don't supply one when creating documents, MongoDB supplies one by default. It's of type ObjectId
, and is defined in the Bson spec.
To demonstrate FindOptions
, I'll add an option that restricts the batch size to 2 which will then display what batch we're looping through in the console. Update your code with the following
FilterDefinition filter = FilterDefinition.Empty;
FindOptions options = new FindOptions
{
BatchSize = 2,
NoCursorTimeout = false
};
using (IAsyncCursor cursor = await collection.FindAsync(filter, options))
{
var batch = 0;
while (await cursor.MoveNextAsync())
{
IEnumerable documents = cursor.Current;
batch++;
Console.WriteLine($"Batch: {batch}");
foreach (BsonDocument document in documents)
{
Console.WriteLine(document);
Console.WriteLine();
}
}
Console.WriteLine($"Total Batch: { batch}");
}
And run it to get the following result:
Batch: 1
{ "_id" : ObjectId("58469c732adc9f5370e50c9c"), "FirstName" : "Gregor", "LastName" : "Felix", "Class" : "JSS 3", "Age" : 23, "Subjects" : ["English", "Mathematics", "Physics", "Biology"] }
{ "_id" : ObjectId("58469c732adc9f5370e50c9d"), "FirstName" : "Machiko", "LastName" : "Elkberg", "Class" : "JSS 3", "Age" : 23, "Subjects" : ["English", "Mathematics", "Spanish"] }
Batch: 2
{ "_id" : ObjectId("58469c732adc9f5370e50c9e"), "FirstName" : "Julie", "LastName" : "Sandal", "Class" : "JSS 1", "Age" : 25, "Subjects" : ["English", "Mathematics", "Physics", "Chemistry"] }
{ "_id" : ObjectId("58469c732adc9f5370e50c9f"), "FirstName" : "Peter", "LastName" : "Cyborg", "Class" : "JSS 1", "Age" : 39, "Subjects" : ["English", "Mathematics", "Physics", "Chemistry"] }
Batch: 3
{ "_id" : ObjectId("58469c732adc9f5370e50ca0"), "FirstName" : "James", "LastName" : "Cyborg", "Class" : "JSS 1", "Age" : 39, "Subjects" : ["English", "Mathematics", "Physics", "Chemistry"] }
Total Batch: 3
We can also write this in a shorter, cleaner way by calling ToListAsync
or ForEachAsync
to get all the documents from the cursor and put them in memory. There are extension methods on the IAsyncCursor
available to do this. Here are a few snippets:
collection.FindSync(filter).ToList();
await collection.FindSync(filter).ToListAsync();
await collection.FindSync(filter).ForEachAsync(doc => Console.WriteLine());
collection.FindSync(filter).FirstOrDefault();
collection.FindSync(filter).FirstOrDefault();
await collection.FindSync(filter).FirstOrDefaultAsync();
This looks neat and shorter, code-wise, but what it does is force all the documents to live in-memory. This might not be ideal in certain scenarios, and the cursor is helpful when the query result is huge and we can move the cursor by calling MoveNextAsync
or MoveNext
.
Find
This method is similar to its counterpart except that it returns an IFindFluent
interface. This is a fluent interface that gives us simple syntax to things like Count , Skip , Sort , and Limit (more on this ahead). From the IFindFluent
we can also return a cursor (by calling ToCursor
or ToCursorAsync
on it) or a list (by calling ToList
or ToListAsync
). With the code below, we can get all the documents and print them to the console using the Find
method
await collection.Find(FilterDefinition.Empty)
.ForEachAsync(doc => Console.WriteLine(doc));
Result
{ "_id" : ObjectId("58469c732adc9f5370e50c9c"), "FirstName" : "Gregor", "LastName" : "Felix", "Class" : "JSS 3", "Age" : 23, "Subjects" : ["English", "Mathematics", "Physics", "Biology"] }
{ "_id" : ObjectId("58469c732adc9f5370e50c9d"), "FirstName" : "Machiko", "LastName" : "Elkberg", "Class" : "JSS 3", "Age" : 23, "Subjects" : ["English", "Mathematics", "Spanish"] }
{ "_id" : ObjectId("58469c732adc9f5370e50c9e"), "FirstName" : "Julie", "LastName" : "Sandal", "Class" : "JSS 1", "Age" : 25, "Subjects" : ["English", "Mathematics", "Physics", "Chemistry"] }
{ "_id" : ObjectId("58469c732adc9f5370e50c9f"), "FirstName" : "Peter", "LastName" : "Cyborg", "Class" : "JSS 1", "Age" : 39, "Subjects" : ["English", "Mathematics", "Physics", "Chemistry"] }
{ "_id" : ObjectId("58469c732adc9f5370e50ca0"), "FirstName" : "James", "LastName" : "Cyborg", "Class" : "JSS 1", "Age" : 39, "Subjects" : ["English", "Mathematics", "Physics", "Chemistry"] }
Finding Specific documents
Most of the time we don't want to retrieve all documents, but rather specify a filter that returns documents matching that particular filter. Let's now look at ways of specifying filter to the query.
Using BsonDocument or string
We can define a BsonDocument as a filter and the query will find documents matching the fields defined in the document. Add the following code to your method and run it to retrieve student whose name is "Peter"
var filter = new BsonDocument("FirstName", "Peter");
await collection.Find(filter)
.ForEachAsync(document => Console.WriteLine(document));
And we get just one document
{ "_id" : ObjectId("58469c732adc9f5370e50c9f"), "FirstName" : "Peter", "LastName" : "Cyborg", "Class" : "JSS 1", "Age" : 39, "Subjects" : ["English", "Mathematics", "Physics", "Chemistry"] }
You could get a bit confused because these methods accept a FilterDefinition
but we're giving it a BsonDocument and it doesn't complain. This happens because it gets implicitly converted and we can also do this from a string. To use a string, we need to define a valid JSON string specifying the filter. We can do the same thing above with a string using the code below and we would still get the same result:
var filter = "{ FirstName: 'Peter'}";
await collection.Find(filter)
.ForEachAsync(document => Console.WriteLine(document));
We can also specify comparison or logical operator. An example would be getting students whose ages are 23. We can build the query as follows:
var filter = new BsonDocument("Age", new BsonDocument("$eq", 23));
or using a string
var filter = "{ Age: {'$eq': 23}}";
What we did was to add an identifier for an operator, which in our case was $eq
. Check this page for a list of operators and what they do. Let's run our code with one of the code above and see that it will give us students whose ages are 23.
{ "_id" : ObjectId("58469c732adc9f5370e50c9c"), "FirstName" : "Gregor", "LastName" : "Felix", "Class" : "JSS 3", "Age" : 23, "Subjects" : ["English", "Mathematics", "Physics", "Biology"] }
{ "_id" : ObjectId("58469c732adc9f5370e50c9d"), "FirstName" : "Machiko", "LastName" : "Elkberg", "Class" : "JSS 3", "Age" : 23, "Subjects" : ["English", "Mathematics", "Spanish"] }
Using the FilterDefinitionBuilder
You can use the FilterDefinitionBuilder
, which is a builder for FilterDefinition
. It provides a suite of methods to build up queries, and Lt
, being one of them, specifies a less than comparison. So we can define a filter using the FilterDefinitionBuilder
as follows:
var filter = new FilterDefinitionBuilder().Lt("age", 25);
Or use the overloaded method which accepts a LINQ expression:
var filter = new FilterDefinitionBuilder().Lt( student => student.Age, 25);
Also, you can use a static Builders
class to build a filter definition, and this class also has static helper methods for building other things, like projection definition, sort definition, and a few others.
var filter = Builders.Filter.Lt("age", 25);
var filter = Builders.Filter.Lt(student => student.Age, 25);
The driver also overloaded 3 operators for the filter definition. The and (&)
, or (|)
and not (!)
operators. For example, we want to get students whose ages are less than 25 and the first name is Peter, we can build up such query using the builder helper and the &
overloaded operator as follows
var builder = Builders.Filter;
var filter = builder.Lt("Age", 40) & builder.Eq("FirstName", "Peter");
thus running and getting a single document
{ "_id" : ObjectId("58469c732adc9f5370e50c9f"), "FirstName" : "Peter", "LastName" : "Cyborg", "Class" : "JSS 1", "Age" : 39, "Subjects" : ["English", "Mathematics", "Physics", "Chemistry"] }
LINQ Expression
The last part we haven't looked at is the overload of these methods that takes a LINQ expression and when we have a strongly typed object, we can build a filter query using LINQ expression. So let's say we want to get students whose ages are less than 25 and the first name is not Peter and print out their first and last names. We do that with the following code:
var collection = db.GetCollection("students");
await collection.Find(student => student.Age < 25 && student.FirstName != "Peter") .ForEachAsync(student => Console.WriteLine(student.FirstName + " " + student.LastName));
We change the collection type to Student
and running the following code, we get an error on the console:
![format exception error](cdn.filestackcontent.com/jPjZiygNSpS8CE4PqvHG "format exception error")
The error description says Element '_id' does not match any field or property of type Student and this happens because it couldn't map the _id
field from the database to any property of the Student type. This was automatically added when we created documented. Let's update the Student
type by adding a property of type ObjectId
defined in the MongoDB.Bson package.
class Student
{
public ObjectId Id { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
public string Class { get; set; }
public int Age { get; set; }
public IEnumerable Subjects { get; set; }
}
Modify the code to also print out the ID and run it:
await collection.Find(student => student.Age < 25 && student.FirstName != "Peter") .ForEachAsync(student => Console.WriteLine($"Id: {student.Id}, FirstName: {student.FirstName}, LastName: {student.LastName}"));
Result:
Id: 58469c732adc9f5370e50c9c, FirstName: Gregor, LastName: Felix
Id: 58469c732adc9f5370e50c9d, FirstName: Machiko, LastName: Elkberg
And it just works. So most often, you would want to use expression tree syntax to build your queries. And on occasions you want more granularity, you can use the other ways of doing it.
In the next tutorial, we'll see how to do projections, sort, skip, limit and sort.