January 28, 2013

Bulk file download from SharePoint using DocExtractor


This is a tool that I developed a while ago and decided to share online.

A real scenario
Imagine that you are a site collection admin in a small company, and your boss asks you to store on DVDs all the documents that belong to an employee who left his position several months ago.
Now, to add a little spice to the requirements, the manager might say something like:
"..oh, and we need all the Excel reports that were uploaded by Mr. X to the Operations site between September 2012 and January 2013".
You get the picture. You can imagine other scenarios, such as needing to archive the contents of each department before closing down a big project.


Using Search?
SharePoint has powerful search functionality (especially when equipped with FAST Search). You can specify some criteria, but search does not behave the way we need in this context. There are only a few criteria options to play with, and you can download only one file at a time; there is no way to download all files that match the criteria in one bulk operation. Also, you can't get list attachments.


My bulk download solution: DocExtractor
I designed and developed an easy-to-use, criteria-based file downloader for SharePoint.


Originally, a user only needed Site Collection administration rights. I then added the ability to browse the local web applications and pick the desired site collection from the login form, so the user must now have Farm administration rights as well (in small companies the same person often holds both roles).


The user can either download all files in the site collection without exception, or quickly specify the desired criteria. Criteria can be any combination of the following:
  • Site(s)
  • User(s) and their role (Author/Modifier)
  • File type (groups of formats: MS Office, PDF, Images, Videos, Audio, Web and custom)
  • File size (in B/KB/MB)
  • File creation/modification date period
  • Text in title (a 'Contains' filter)

All lists and libraries (including hidden ones) are searched, and the tool downloads the last checked-in version of each file as well as list attachments. Although the tool has access to all files, the download operation itself is still subject to the blocked file types restriction.
You can choose whether to download the files with or without their folder hierarchy. If you download them without the hierarchy, the tool automatically renames duplicate file names in the output directory.
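
When all files land in a single folder, two documents with the same name from different libraries would overwrite each other. The sketch below shows one way such a rename could work. The helper name GetUniquePath and the " (n)" suffix pattern are assumptions for illustration; the tool's real naming scheme is not described here.

```csharp
using System;
using System.IO;

public static class FlatNames
{
    // Hypothetical helper: returns a path inside targetDir that does not
    // exist yet, appending " (1)", " (2)", ... before the extension if needed.
    public static string GetUniquePath(string targetDir, string fileName)
    {
        string candidate = Path.Combine(targetDir, fileName);
        string stem = Path.GetFileNameWithoutExtension(fileName);
        string ext = Path.GetExtension(fileName);

        for (int n = 1; File.Exists(candidate); n++)
        {
            candidate = Path.Combine(targetDir, stem + " (" + n + ")" + ext);
        }
        return candidate;
    }
}

public static class FlatNamesDemo
{
    public static void Main()
    {
        // Work in a throwaway directory so the demo does not touch real files.
        string dir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
        Directory.CreateDirectory(dir);

        // Simulate two downloads that share the same file name.
        string first = FlatNames.GetUniquePath(dir, "report.xlsx");
        File.WriteAllText(first, "");
        string second = FlatNames.GetUniquePath(dir, "report.xlsx");

        Console.WriteLine(Path.GetFileName(first));
        Console.WriteLine(Path.GetFileName(second));

        Directory.Delete(dir, true);
    }
}
```

The check-then-write approach is enough for a single download thread; several threads writing to the same folder would need a lock around it.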

While downloading, the user can cancel the operation at any time and sees colored notifications about the download status. After the download completes, a log file is created that lists all the download details and any errors.



By the way, the download is really fast. On my development farm machine, it downloaded more than 2,800 files (about 2.3 GB) in less than 2 minutes.



The development part

The utility is a Windows Forms application consisting of GUI forms and worker classes that use the SharePoint object model. Queries are created dynamically by turning the user's criteria into CAML conditions and inserting these conditions into one big query. This query is used to retrieve the URLs of matching files from a site or site collection using SPSiteDataQuery objects. List attachments are also examined to get their matching URLs. The combined URLs are then downloaded in a background thread: the tool iterates through the URLs, gets the corresponding SPFile objects, and copies their byte streams to the target folder.

For example, when the user specifies one or more file types, conditions are added to the query like this:
qBuilder.addCondition(@"
                            
                            doc
                        ");
qBuilder.addCondition(@"
                            
                            docx
                        ", "OR");

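CAML's And and Or elements each take exactly two operands, so combining more than two conditions means nesting them. Below is a minimal sketch of how a builder with this call shape might fold conditions into one Where clause. The class CamlQueryBuilder, its method names, and the File_x0020_Type field with a Text value type are assumptions for illustration, not the actual DocExtractor code.

```csharp
using System;

// Hypothetical builder: folds conditions left to right so that every
// <And>/<Or> element wraps exactly two operands.
public class CamlQueryBuilder
{
    private string _where = "";

    // op is "AND" or "OR"; it is ignored for the very first condition.
    public void AddCondition(string condition, string op = "AND")
    {
        condition = condition.Trim();
        if (_where.Length == 0)
        {
            _where = condition;
            return;
        }
        string tag = op.Equals("OR", StringComparison.OrdinalIgnoreCase) ? "Or" : "And";
        _where = "<" + tag + ">" + _where + condition + "</" + tag + ">";
    }

    // Returns the complete <Where> clause, or an empty string if no
    // condition was added (meaning "match everything").
    public string Build()
    {
        return _where.Length == 0 ? "" : "<Where>" + _where + "</Where>";
    }
}

public static class CamlDemo
{
    // Assumed field name and value type for matching a file extension.
    private static string FileTypeEq(string ext)
    {
        return "<Eq><FieldRef Name='File_x0020_Type' /><Value Type='Text'>"
               + ext + "</Value></Eq>";
    }

    public static void Main()
    {
        var qBuilder = new CamlQueryBuilder();
        qBuilder.AddCondition(FileTypeEq("doc"));
        qBuilder.AddCondition(FileTypeEq("docx"), "OR");
        qBuilder.AddCondition(FileTypeEq("xlsx"), "OR");
        Console.WriteLine(qBuilder.Build());
    }
}
```

Here the three conditions nest as Or(Or(doc, docx), xlsx). Folding left to right keeps the builder simple, but it applies operators strictly in call order, so mixing AND and OR needs the calls arranged with that in mind.
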
And this is the method that retrieves the list of file URLs from a site collection, including the qualifying list attachments:
// Specific files, according to user criteria, in all sites,
// including site-specific attachments
public List<string> _getSpecificDocsInSiteCollection()
{
    List<string> FileRefs = new List<string>();
    SPSiteDataQuery DocFilesQuery;

    using (SPSite site = new SPSite(SiteUrl))
    {
        //using (SPWeb web = site.OpenWeb("/"))
        using (SPWeb web = site.OpenWeb())
        {
            user = null;
            if (UserLoginName != "")
                user = web.EnsureUser(UserLoginName);// only when specific user

            DocFilesQuery = new SPSiteDataQuery();

            DocFilesQuery.Lists = "

The project is on GitHub; you can download the executable and the project from here.
The utility was tested on two SharePoint Foundation 2010 farms. Of course, many changes and features could be added later, such as more criteria or the ability to zip the resulting output folder automatically.
Happy SP coding and administration :)
