
December 28, 2016

Search cannot see some metadata-tagged documents in SharePoint

Greetings to fellow SharePoint developers and admins. Here is one more issue with crawling managed metadata terms, and how to solve it.


Background:
The objective was to have a custom document search by terms. In SharePoint 2013, we have a document library with a couple of managed metadata fields (some are single-value while others are multi-value). Based on some logic, a custom code solution assigns (tags) term values to documents. Tagged documents are ultimately to be found using a custom search page by passing 2 term names and their GUIDs. Example:

/Pages/customsearch.aspx?Sites=5915df2d-f1e7-4247-90ff-9b55a391afc1&SiteCategory=3edf1303-7fd3-497d-a23e-bceb01015f8c

The search page contains Content Search Web Parts. Each one contains a custom query that uses query strings to retrieve all items tagged with multiple values of either or both of the 2 terms used. Example:

{|owstaxIdSites:{QueryString.Sites}*} {|owstaxIdSiteCategory:{QueryString.SiteCategory}*}


The Issue:
The Content Search Web Part (CSWP) query shows some documents, but does NOT show other documents even if they were tagged with exactly the same terms.

Here is the list of symptoms:
- The document item is tagged with valid terms (we can see/edit the terms with no issue in the SharePoint UI, in the DispForm/EditForm views respectively, on all fields).
- The document is crawled successfully and can be found on the default search page when searching by title.
- The crawl log shows no issue for the affected documents or their metadata terms.
- If I just edit the document and Save, then run a crawl, it might fix the issue for some documents!
- Also, if I call SystemUpdate() on the item from code (without making any other changes), then crawl, it is fixed as well!
- I noticed that for affected documents, none of the terms from any of the metadata fields show up under any query.


Failed attempts:
Tried the following methods, but none of them worked:
- Tried re-running an incremental crawl.
- Tried a partial re-index of the target site or content source (to force search to consider documents as 'dirty', as if the property 'vti_searchversion' was updated), then ran an incremental crawl.
- Tried a full crawl.

Hmm, so it's not about the search crawl, not the document, not the terms, but something about how the terms are stored in the metadata columns. But from the UI, everything seems normal.


Root Cause:

So I paid another visit to the custom code part. After careful checking, I found a bug where, in some cases, the code inserted multiple copies of the same term into one of the fields.

Example: in the browser the metadata column shows one term, 'POLICIES', but when read from code it holds multiple duplicates:

POLICIES|93aef6f0-5558-4c55-9fb2-fbe622a59e8c;
POLICIES|93aef6f0-5558-4c55-9fb2-fbe622a59e8c;
POLICIES|93aef6f0-5558-4c55-9fb2-fbe622a59e8c

Apparently, when code pushes multiple copies of the same term into a managed metadata column, it amounts to a column validation issue. The SharePoint UI will not complain, but the search crawler somehow has a problem with it and drops managed metadata crawling for all managed metadata columns of that item altogether. The document item itself, along with its other non-managed columns, will still be crawled.
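
To spot affected items, the raw field value can be checked for repeated term GUIDs. Below is a minimal sketch in Python, used here only to illustrate the logic (the actual solution would live in your SharePoint code). It assumes the value arrives as a 'Label|Guid' list separated by ';', as in the example above; the function name find_duplicate_terms is hypothetical.

```python
from collections import Counter


def find_duplicate_terms(raw_value):
    """Return {guid: count} for every term GUID that appears more than once.

    Assumes raw_value looks like 'Label|Guid;Label|Guid;...'.
    """
    # Split into entries, ignoring empty pieces and stray whitespace/newlines.
    entries = [e.strip() for e in raw_value.split(";") if e.strip()]
    # The GUID is the part after the last '|'; compare case-insensitively.
    guids = [e.rsplit("|", 1)[-1].lower() for e in entries]
    return {g: n for g, n in Counter(guids).items() if n > 1}


raw = (
    "POLICIES|93aef6f0-5558-4c55-9fb2-fbe622a59e8c;"
    "POLICIES|93aef6f0-5558-4c55-9fb2-fbe622a59e8c;"
    "POLICIES|93aef6f0-5558-4c55-9fb2-fbe622a59e8c"
)
print(find_duplicate_terms(raw))
# {'93aef6f0-5558-4c55-9fb2-fbe622a59e8c': 3}
```

An empty result means the field holds no repeated terms; anything else points to an item the crawler is likely to mishandle.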


On the other hand, I can conclude that the default SharePoint metadata controls offer some validation/fix. When you edit the document from the UI and just click Save without making any changes, the duplicate terms might get removed.


Solution:
Check your code to see where it tries to update managed metadata fields with duplicate copies of the same term. No exception will be thrown, so you have to check your logic carefully.

FYI, this is a related post where merging terms in the term store led to the same issue.
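
One way to guard against this is to deduplicate the terms by GUID right before the field is written. The following is a rough sketch of that step in Python, assuming the same 'Label|Guid;...' string form shown earlier in this post; dedupe_terms is a hypothetical helper, not a SharePoint API, and the same idea would need to be ported to whatever language your tagging code uses.

```python
def dedupe_terms(raw_value):
    """Keep only the first occurrence of each term GUID, preserving order.

    Assumes raw_value looks like 'Label|Guid;Label|Guid;...'.
    """
    seen = set()
    kept = []
    for entry in raw_value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip empty pieces, e.g. from a trailing ';'
        # The GUID is the part after the last '|'; compare case-insensitively.
        guid = entry.rsplit("|", 1)[-1].lower()
        if guid not in seen:
            seen.add(guid)
            kept.append(entry)
    return ";".join(kept)


value = (
    "POLICIES|93aef6f0-5558-4c55-9fb2-fbe622a59e8c;"
    "POLICIES|93aef6f0-5558-4c55-9fb2-fbe622a59e8c;"
    "POLICIES|93aef6f0-5558-4c55-9fb2-fbe622a59e8c"
)
print(dedupe_terms(value))
# POLICIES|93aef6f0-5558-4c55-9fb2-fbe622a59e8c
```

Deduplicating on the GUID rather than the label is a deliberate choice here: two different terms can share a label, while the GUID identifies the term itself.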

