January 28, 2013

Bulk file download from SharePoint using DocExtractor


This is a tool that I developed a while ago and decided to share online.

A real scenario
Imagine that you are a site collection admin in a small company, and your boss asks you to store on DVDs all the documents that belong to an employee who left his position several months ago.
Now, to add a little spice to the requirements, the manager might say something like:
"...oh, and we need all the Excel reports that were uploaded by Mr. X to the Operations site between September 2012 and January 2013."
You get the picture. You can imagine other possible scenarios, such as needing to archive the contents of each department before closing down a big project.


Using Search?
SharePoint has powerful search functionality (especially when equipped with FAST Search). You can specify some criteria, but search does not behave the way we want in this context. There are only a few criteria options to play with, and you can download only one file at a time; there is no way to download all the files that match the criteria in a single bulk operation. Also, you can't get list attachments.


My bulk download solution: DocExtractor
I designed and developed an easy-to-use, criteria-based file downloader for SharePoint.


The user must have site collection administration rights. I also added the ability to navigate through the local web applications and select the desired site collection from the login form, so the user must have farm administration rights as well (both roles can be played by the same user in small companies).


The user can either download all files in the site collection without exception, or quickly specify the desired criteria. Criteria can be any combination of the following:
  • Site(s)
  • User(s) and their role (Author/Modifier)
  • File type (groups of formats: MS Office, PDF, Images, Videos, Audio, Web and custom)
  • File size (in B/KB/MB)
  • File creation/modification date period
  • Text in title (a 'Contains' filter)

All lists and libraries (including hidden ones) are searched, and the tool downloads the last checked-in version of each file as well as list attachments. Although the tool has access to all files, the download operation itself is still subject to the blocked file types constraint.
You can choose whether to download the files with or without their folder hierarchy. If you choose to download without it, the tool automatically renames file copies in the output directory.
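
To illustrate the renaming idea when folders are flattened, here is a minimal sketch. It assumes the goal is to keep files with the same name from overwriting each other; the class name `FlatNameAllocator` and the "name (1).ext" pattern are my own illustration, not necessarily what DocExtractor does.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: hands out unique file names for a flat output folder.
public class FlatNameAllocator
{
    // Names already handed out, compared case-insensitively like Windows file names.
    private readonly HashSet<string> used =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Returns fileName unchanged if it is still free,
    // otherwise "name (1).ext", "name (2).ext", and so on.
    public string Allocate(string fileName)
    {
        string stem = Path.GetFileNameWithoutExtension(fileName);
        string ext = Path.GetExtension(fileName);
        string candidate = fileName;
        int counter = 1;

        // HashSet.Add returns false when the name is already taken.
        while (!used.Add(candidate))
        {
            candidate = string.Format("{0} ({1}){2}", stem, counter, ext);
            counter++;
        }
        return candidate;
    }
}
```

Calling `Allocate("Report.xlsx")` twice on the same instance gives "Report.xlsx" the first time and "Report (1).xlsx" the second.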

While downloading, the user can cancel the operation at any time and sees colored notifications about the download status. After the download operation completes, a log file is created that shows all the download details and any errors.



By the way, the download is really fast. I tested it on my development farm machine, and it downloaded more than 2,800 files (about 2.3 GB in total) in less than 2 minutes.



The development part

The utility is a Windows Forms application consisting of GUI forms and worker classes that use the SharePoint object model. Queries are created dynamically by turning the user's criteria into CAML conditions and then inserting these conditions into one big query. This query is used to retrieve the URLs of matching files from a site or site collection using SPSiteDataQuery objects. List attachments are also examined to get their matching URLs. The combined URLs are then downloaded in a background thread by iterating through them, getting the corresponding SPFile objects, and copying their byte streams to the target folder.

For example, when the user specifies file type(s), conditions are added to the query like this:
qBuilder.addCondition(@"
                            
                            doc
                        ");
qBuilder.addCondition(@"
                            
                            docx
                        ", "OR");

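As a rough sketch of how conditions like these can be merged into one CAML query: the class `CamlWhereBuilder` below is a hypothetical stand-in for qBuilder, and the `File_x0020_Type` field with the `Eq` operator is my assumption about what a file-type condition looks like, not the tool's actual markup. CAML's And/Or elements take exactly two operands, so each new condition is nested with what is already there.

```csharp
using System;
using System.Security;

// Hypothetical stand-in for the post's qBuilder; not the author's actual class.
public class CamlWhereBuilder
{
    private string current = "";

    // Adds a condition, joining it to the existing ones with <And> (default) or <Or>.
    // <And>/<Or> accept exactly two operands, so conditions are nested pairwise.
    public void AddCondition(string condition, string joinWith = "AND")
    {
        string trimmed = condition.Trim();
        if (current.Length == 0)
        {
            current = trimmed;
            return;
        }
        string tag = joinWith.Equals("OR", StringComparison.OrdinalIgnoreCase) ? "Or" : "And";
        current = "<" + tag + ">" + current + trimmed + "</" + tag + ">";
    }

    // Wraps the nested conditions in <Where>, the form a CAML Query property expects.
    public string Build()
    {
        return current.Length == 0 ? "" : "<Where>" + current + "</Where>";
    }

    // Assumed shape of a file-type condition (field name and operator are assumptions).
    public static string FileTypeEquals(string extension)
    {
        return "<Eq><FieldRef Name='File_x0020_Type' /><Value Type='Text'>"
             + SecurityElement.Escape(extension) + "</Value></Eq>";
    }
}
```

Adding `FileTypeEquals("doc")` and then `FileTypeEquals("docx")` with "OR" produces a single `<Where><Or>...</Or></Where>` string that could be assigned to a query's Query property.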
And this is a method that retrieves the list of file URLs from a site collection, including the qualifying list attachments:
// site specific attachments
// Specific files according to user criteria in all sites
public List<string> _getSpecificDocsInSiteCollection()
{
    List<string> FileRefs = new List<string>();
    SPSiteDataQuery DocFilesQuery;

    using (SPSite site = new SPSite(SiteUrl))
    {
        //using (SPWeb web = site.OpenWeb("/"))
        using (SPWeb web = site.OpenWeb())
        {
            user = null;
            if (UserLoginName != "")
                user = web.EnsureUser(UserLoginName);// only when specific user

            DocFilesQuery = new SPSiteDataQuery();

            DocFilesQuery.Lists = "

The project is on GitHub; you can download the executable and the project source from here.
The utility was tested on two SharePoint 2010 Foundation farms. Of course, many changes and features can be added later, such as more criteria or the ability to zip the resulting output folder automatically.
Happy SP coding and administration :)

January 23, 2013

Another MCTS certificate in my arsenal


Yesterday I took the exam TS: SharePoint 2010, Application Development (C#). The exam was tough, but I passed with a high mark; I had to wait until today to download my MCTS digital certificate from my MCP site.
Now I am eager to see when Microsoft will announce its certification path for SharePoint 2013 development, as the dev exams are currently available only for SharePoint 2010. Click here to see the exams and certifications currently available.

January 17, 2013

Happy SharePoint-Iraq 4th Anniversary


Happy 4th anniversary to our SharePoint-Iraq team. 

This is the story of our study team and SharePoint. It was a joint effort between me and my friend Eng. Ahmed Alzubaidy to establish our IT study team. We named the team SharePoint-Iraq because we wanted to start learning Microsoft SharePoint.
We built our small IT lab in January 2009 and started studying SharePoint 2007. We spent 6 months 'taming' it. We realized that in order to really understand SharePoint, we needed to study the entire Microsoft stack that SharePoint relies on. We studied Windows Server, AD, DNS, SQL Server, IIS, authentication and many other protocols and ideas. That was a huge task by any measure, and we are still at it. Our study process has evolved too: we used a whiteboard with stickers, and we started recording training videos on the things we learn. As time went on, we learned and worked on SharePoint 2010 Foundation/Server and SharePoint Online, along with SharePoint Designer and InfoPath Designer. We have come a long way, and this is how SharePoint literally became our 'child'.
In parallel, I started a programming track for SharePoint so that administration and development go side by side. BTW, we named our SP knowledge base "SharePoint Empire". Sweet, huh!

In 2009

In 2012

Digging into the "Could not find this item" error in SharePoint


I worked on this error a while ago when I was trying to replicate what was happening in a client's environment. When you try to upload EXE and DLL files to a document library in SharePoint using Explorer View, the operation fails with the following annoying error message:

"Could not find this item
This is no longer located in \\sitename\DavWWWRoot. Verify the item's location and try again."

Of course, uploading these file types to SharePoint is strongly discouraged for security reasons, which is why they are on the blocked file types list in Central Administration, but for some companies it is a requirement.
You can find the complete error description and solution here, thanks to Stanislaw Delost!

But I wanted to see for myself which 'item' is not being found, as indicated by the error message.
So I built a small Windows program that imitates what Explorer View does: copying the wanted files from a source folder into the WebDAV folder of the target document library.


And the error stack trace:

System.IO.FileNotFoundException was unhandled

  Message=Could not find file '\\dev\DavWWWroot\test\AERotatorWebpart.dll'.
  Source=mscorlib
  FileName=\\dev\DavWWWroot\test\AERotatorWebpart.dll
  StackTrace:
       at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
       at System.IO.File.InternalCopy(String sourceFileName, String destFileName, Boolean overwrite)
       at System.IO.File.Copy(String sourceFileName, String destFileName)


The code is simple. I used the program to copy ordinary documents and images, and it worked just fine. When I used it to copy EXE and DLL files, the program crashed. As you can see, the exception message in .NET is "Could not find file '\\ServerName\DavWWWroot\DocLibName\FileName.DLL'",
and the exception is of type FileNotFoundException.
It looks as if it couldn't find the file that we are trying to copy in the first place. Weird!
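
For reference, a copy program of this kind can be sketched roughly as follows. This is not the original program (which let the exception go unhandled); the class name `WebDavCopyProbe` is hypothetical, and the sketch catches the FileNotFoundException so that each file's outcome is reported instead of crashing.

```csharp
using System;
using System.IO;

// Hypothetical sketch of a small copy tool; names are illustrative only.
public static class WebDavCopyProbe
{
    // Copies every file in sourceDir into destDir using File.Copy,
    // the same call that appears in the stack trace above.
    // Returns the number of files that were copied successfully.
    public static int CopyAll(string sourceDir, string destDir)
    {
        int copied = 0;
        foreach (string sourcePath in Directory.GetFiles(sourceDir))
        {
            string destPath = Path.Combine(destDir, Path.GetFileName(sourcePath));
            try
            {
                File.Copy(sourcePath, destPath);
                copied++;
                Console.WriteLine("Copied: " + destPath);
            }
            catch (FileNotFoundException ex)
            {
                // The failure observed for EXE and DLL files on the WebDAV share.
                Console.WriteLine("Failed: " + ex.Message);
            }
        }
        return copied;
    }
}
```

It could be called as `WebDavCopyProbe.CopyAll(@"C:\temp\upload", @"\\dev\DavWWWRoot\test")`, where the source folder is a made-up example and the destination is the document library path from the stack trace.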

Explorer View uses the WebDAV protocol and agents that get installed with SharePoint.
When WebDAV detects a file of type .EXE or .DLL, it runs one of two handlers in IIS: CGI-exe (for EXE files) or ISAPI-dll (for DLL files). My guess is that these handlers try to access the file in the document library folder (the destination) before the copy operation has completed, so the file is not found and the error occurs.

As explained in detail in Delost's blog, the solution is to prevent the CGI-exe and ISAPI-dll handlers from running in IIS by removing the Execute permission in Handler Mappings in IIS.