Crawling as fast as possible

More and more we are becoming a generation of searchers. And that is not meant in a philosophical, running-your-fingers-through-your-beard kind of way, but in a very practical, everyday kind of way. When was the last time an argument among friends was not settled by searching the internet for the right answer? And this trend is not just influencing your personal life, but your work life as well. A clear shift is happening within corporations towards flatter organizational structures, self-organizing teams and increased cross-functional interaction. We are trading in hierarchies for communities. Heck, there are even companies that let their employees pick their own job titles[1].

In this world of less structure one thing becomes more and more important to still be able to do your work: search! However, making sure stuff is available to be found within SharePoint Online is not always straightforward.

Our colleagues Marijke Ursem and Martin Offringa wrote a blog about the workings of search in SharePoint and how to make sure the search results are shown just how you like them. So we will not cover any of that here. Instead, we will dive into our bag of tricks for making sure that content becomes searchable as quickly as possible.

The Index

For those of you who are new to the subject of search in SharePoint, let us quickly cover some of the basics.

The search results you see in the content search web part or the search results web part do not come directly from your lists and libraries, but from the search index. The index can be thought of as one big bucket holding all the searchable content, and only stuff that is in the index can be found through search.

Based on an automated schedule the index is filled with the latest changes that occurred in your tenant. This is done by the crawl, and in SharePoint Online there are two variants of the crawl: 1) the continuous crawl that runs every 15 minutes and picks up new and changed documents or items and 2) the incremental crawl that runs every 4 hours and picks up changes in the search configuration.

Schematic showing how the content of a tenant, the crawl and the index relate to each other.
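
For those who prefer code over buckets, the relation between content, crawl and index can be sketched in a few lines of Python. This is a toy model only: the class and method names are made up for illustration and have nothing to do with the real SharePoint API.

```python
class Tenant:
    """Toy model: content lives in the tenant, search only reads the index."""

    def __init__(self):
        self.content = {}     # what is actually stored in lists and libraries
        self.index = {}       # what search is able to see
        self.changed = set()  # items waiting for the next crawl

    def save(self, name, text):
        # Saving updates the content, but NOT the index.
        self.content[name] = text
        self.changed.add(name)

    def crawl(self):
        # The crawl copies pending changes from the content into the index.
        for name in self.changed:
            self.index[name] = self.content[name]
        self.changed.clear()

    def search(self, word):
        # Search never looks at the content itself, only at the index.
        return sorted(n for n, text in self.index.items() if word in text)


tenant = Tenant()
tenant.save("plan.docx", "project plan for search")
before = tenant.search("search")  # the crawl has not passed by yet
tenant.crawl()
after = tenant.search("search")   # now the document is in the index
print(before, after)
```

The point of the sketch is the gap between `save` and `crawl`: until the crawl has run, a document exists in the tenant but cannot be found.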

Lack of Control

One of the most heard complaints about search in SharePoint Online is that even with the highest permission level on your tenant, you are still not fully in control of the crawl. This is because, in contrast with an on-premises situation, the automated schedule cannot be changed. In SharePoint Online, it is Microsoft who runs the show.

But there is no use in complaining, because at the moment there is no option to speed up the crawls. So if you can’t beat them, join them: there are some tricks that help you go as fast as possible when it comes to having your changes crawled in SharePoint Online.

The Basics

First, a document that has no published version will not be crawled. So when you are working inside a document library that has minor (draft) and major (published) versioning activated, make sure to publish your documents.

Second, when you add a column to a list or library, it will not be crawled if there is no item that has a value for that column. So make sure that at least one item contains a value for this new column, even if that means adding a temporary test item.

The Simple Tricks

Maybe the best analogy for the scheduled crawls is to view them as an old-fashioned postman, who is doing his rounds on a fixed schedule. On his round he comes by a series of classic postboxes with the little red flags on them. The classic postbox works by raising the flag when there is something in it and leaving it down when it is empty. And let’s decide that in this analogy a raised flag is the signal for the postman to empty the postbox.

Furthermore, it is important to know that the crawl acts on value changes. So in our analogy, a value change raises the flag automatically and tells our postman to pick up the change. If you have a document and you change, for example, the person mentioned in the “owner” field, then this change will automatically be picked up by our postman. However, when you change the way the owner is presented from “account name” to “display name with presence”, this change will not automatically raise the flag, since no value change occurred. Only a setting was changed.

To make sure your change is picked up anyway, you can raise the flag yourself via the library settings, as described in a Microsoft support article[2].
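
The postbox logic can be sketched as a small Python toy. It assumes nothing about how SharePoint works internally: the `Library` class, its methods and the file name are all hypothetical and only mirror the analogy.

```python
class Library:
    """Toy postbox: a raised flag means 'pick me up on the next round'."""

    def __init__(self):
        self.items = {}                        # item name -> field values
        self.owner_display = "account name"    # a setting, not a value
        self.flagged = set()                   # items with the flag raised

    def set_value(self, item, field, value):
        # Writing a value raises the flag for that item automatically.
        self.items.setdefault(item, {})[field] = value
        self.flagged.add(item)

    def change_owner_display(self, setting):
        # A setting change touches no item value, so no flag is raised.
        self.owner_display = setting

    def reindex(self):
        # Raising the flag by hand: every item is marked as changed.
        self.flagged.update(self.items)

    def crawl(self):
        # The postman empties every postbox with a raised flag.
        picked_up = sorted(self.flagged)
        self.flagged.clear()
        return picked_up


lib = Library()
lib.set_value("report.docx", "owner", "Marijke")
first_round = lib.crawl()    # the value change is picked up

lib.change_owner_display("display name with presence")
second_round = lib.crawl()   # nothing to pick up

lib.reindex()
third_round = lib.crawl()    # the manual flag gets the item picked up again
print(first_round, second_round, third_round)
```

The second round is the one that tends to surprise people: the setting did change, but the postman walks right past.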

The same article also describes how to raise the flag for a whole site and since Microsoft already did an excellent job of explaining how it is done, we have nothing to add to their story.

When we leave the libraries and lists behind us and start getting our hands dirty within the search center of SharePoint Online, there are also some tricks we can pull out of our top hat. Within the Search Schema we can have a ball setting up managed properties and mapping all sorts of crawled properties to them.

For those of you who are new to the Search Schema, crawled properties and managed properties and want to learn more about the topic, we recommend giving the support article “Manage the search schema in SharePoint Online” a good read.

While you can do a lot of nice and necessary work inside the Search Schema, you will have to do something extra to make sure your changes take effect. The reasoning behind this is that “…Because your changes are made in the search schema, and not to the actual site, the crawler will not automatically re-index the site”[3]. What you need to do is re-index the site that uses the crawled property you mapped to your managed property, and then “…site content will be re-crawled and re-indexed so that you can start using the managed properties in queries, query rules and display templates”. Or, if the crawled property is only attached to a certain library or list, you can re-index just that library or list, which has the effect that “…all of the content in that library or list is marked as changed, and the content is picked up during the next scheduled crawl and re-indexed”.

So for sites, lists and libraries we have the power to raise the flag and our postman (a.k.a. crawl) will pick up our changes and update them in the index so they are seen in search results.

The Advanced Tricks

At this point your question will undoubtedly be what else you can do to give the crawl a kick, because going into every site that you want to raise the flag for one by one is just too much of a hassle.

Well, unfortunately, this raising-the-flag thing is the only instrument we have in the web interface. As said, there is no way to influence the schedule, only ways to influence what is picked up during the next round of our postman. But rest assured, we are not suggesting that you actually go into every site that you have and click a button. We are suggesting that you put others to work for you.

The first option you have is to put Microsoft to work for you. Via the Admin Center you can raise a ticket with Microsoft Technical Support and ask them to re-index a bunch of sites, a site collection or even all your site collections. It is also possible to request a re-index of all the user profiles. What Microsoft Technical Support will then do is raise the flag for all your content, so that everything gets picked up during the next round of the postman. The upside is that Microsoft can do this much more efficiently; the downside is that you still have to wait for the next incremental crawl. And of course, there is waiting involved between raising the ticket and getting a response from Microsoft.

So, where do we have to turn to get even faster results? This is really not a question of who, but a question of what. Because the answer lies in PowerShell. For those of you who want to learn more about Windows PowerShell, this TechNet article is a nice place to start.

With PowerShell we can fire off commands to our SharePoint tenant and, just to name an example, raise the flag on a bunch of sites. This puts you back in control and relieves you from waiting on Tech Support to pick up your ticket. Plus, you won’t have to do much scripting, because others have already done it for you. Two scripts that are particularly handy come from Mikael Svenson (https://twitter.com/mikaelsvenson).

The first script enables an admin to trigger a re-index of a site collection and all its sub sites[4]. The script raises the flag by changing the search version property of the site or site collection, which ensures that the site will be picked up for re-indexing on the next incremental crawl. This is a major time saver, in the sense that you do not have to manually trigger re-indexing on every single site.
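
To give a feel for the idea behind that script without reproducing it, here is a rough Python sketch. It is not Mikael's code and it does not talk to SharePoint: sites are plain dictionaries, and the property name `vti_searchversion` is our assumption about what the search version property is called.

```python
# Assumption: the name of the search version property in a site's property bag.
SEARCH_VERSION_KEY = "vti_searchversion"


def raise_flag(site):
    """Bump the search version of one site so the crawl sees it as changed."""
    props = site["properties"]
    props[SEARCH_VERSION_KEY] = int(props.get(SEARCH_VERSION_KEY, 0)) + 1


def reindex_site_collection(site):
    """Raise the flag on a site and, recursively, on all of its sub sites."""
    raise_flag(site)
    flagged = [site["url"]]
    for sub in site.get("subsites", []):
        flagged.extend(reindex_site_collection(sub))
    return flagged


# A hypothetical site collection with two sub sites.
root = {
    "url": "/sites/hr",
    "properties": {},
    "subsites": [
        {"url": "/sites/hr/policies",
         "properties": {SEARCH_VERSION_KEY: 3},
         "subsites": []},
        {"url": "/sites/hr/forms", "properties": {}, "subsites": []},
    ],
}

flagged = reindex_site_collection(root)
print(flagged)
```

One call walks the whole tree, which is exactly the clicking you no longer have to do by hand.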

The second script allows you to raise the flag for all the user profiles in your tenant[5]. A user profile is just another content record, and for it to be picked up by the crawl it needs a value change. So when you change something about the user profile properties, it takes a value change on each profile before your change is picked up for that profile. And since users do not necessarily change their profiles very often, it might take a while before your change has reached all users. There is also no way to raise the flag manually on a profile other than applying a value change to that profile. So this script is a major help in activating your change for all profiles in your tenant, and applying a value change is in fact exactly what Mikael’s script does. On every profile it overwrites a property value with the same value, which in the eyes of SharePoint is a value change, and thus all the profiles are picked up by the next incremental crawl.
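
The overwrite-with-the-same-value trick looks roughly like this in Python. Again a sketch under assumptions, not the actual script: profiles are plain dictionaries, and the `changed` marker, the account names and the `Department` property are invented for the example.

```python
def touch_profiles(profiles, prop):
    """Write one property back unchanged on every profile that has it."""
    touched = []
    for profile in profiles:
        values = profile["properties"]
        if prop not in values:
            continue  # nothing to write back on this profile
        values[prop] = values[prop]  # same value, but still a write
        profile["changed"] = True    # in this toy, the write raises the flag
        touched.append(profile["account"])
    return touched


# Two hypothetical profiles that no user has edited recently.
profiles = [
    {"account": "alice", "properties": {"Department": "Consulting"}, "changed": False},
    {"account": "bob", "properties": {"Department": "Support"}, "changed": False},
]

snapshot = [dict(p["properties"]) for p in profiles]
touched = touch_profiles(profiles, "Department")
print(touched)
```

Nothing a user would notice has changed, yet every profile now has its flag up.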

Summary

When working with search in SharePoint Online you have to deal with the fact that you cannot influence the crawl schedule. Just put it out of your mind and try to accept it. What you can do is make sure that all the changes you made are picked up as soon as possible by the continuous and incremental crawls that pass by your tenant. Or, to put it in terms of our analogy, make sure that the postman picks up your message on his very next round.

Disclaimer

A lot of the items discussed in this blog were first created, communicated or distributed by others. We certainly want to give credit where credit is due, so we did our absolute best to always show the source or inventor of a trick wherever this was possible.

Also, the scripts mentioned in this blog should only be used and deployed by people who understand what they are doing. Never let code loose on your tenant that you do not understand yourself. This warning has nothing to do with PowerShell or these scripts in particular, but is just part of good sensible ownership for any admin.

If needed, Rapid Circle can help you understand and safely deploy these scripts on your tenant, and help you save time in configuring search for your SharePoint Online environment.

[1] http://www.fastcodesign.com/3034987/evidence/the-case-for-letting-employees-choose-their-own-job-titles

[2] https://support.office.com/en-us/article/Manually-request-crawling-and-re-indexing-of-a-site-a-library-or-a-list-9afa977d-39de-4321-b4ca-8c7c7e6d264e?ui=en-US&rs=en-US&ad=US

[3] https://support.office.com/en-us/article/Manually-request-crawling-and-re-indexing-of-a-site-a-library-or-a-list-9afa977d-39de-4321-b4ca-8c7c7e6d264e?ui=en-US&rs=en-US&ad=US

[4] http://www.techmikael.com/2014/02/how-to-trigger-full-re-index-in.html

[5] http://www.techmikael.com/2014/12/how-to-trigger-re-indexing-of-user.html