Text search on docx and pptx

Paul R1
14 discussion posts
Hi,

I'm trying to do some text searches over a lot of common format files. I've managed to sort out most of my issues, which were due to either finger trouble (not using the settings properly) or ifilter issues (PDF doesn't work with Adobe's version 11 ifilter, but it does with version 9).

However, I'm stumped trying to get a simple text search to work on either docx or pptx files. The files do get opened, because I'm not getting any errors: if I remove the offfiltx.dll filter, I can force an error to be reported because the file cannot be opened, but when the filter is there, neither an error nor a match is reported.

I've tried various versions of office ifilter with the same result:

MS Filter Pack 1 (2006.1200.6212.1000)
MS Filter Pack 2 (2010.1400.4746.1000)
The ifilter that installs with Office Professional Plus 2010 (2010.1400.7015.1000)
the ifilter update KB2810071 (2010.1400.7104.5000)

In each case I verified that FileSeek was reporting the correct version.

What am I missing? This must be a really basic problem as searching docx files has to be one of the most common requirements and nobody else seems to have a problem.

FYI I have Office Professional Plus 2010 installed. I've tried it on both a Server 2008 and a Windows 7 PC with the same result. To test this I've just created a small word document and saved it in multiple formats and/or copied the same text into other applications and everything, including xlsx (which uses the same offfiltx.dll file) works except the docx and pptx.

Any suggestions would be most appreciated.
Oct 21, 2014  • #1
Keith Lammers (BFS)'s profile on WallpaperFusion.com
Could you attach a screenshot of the Search tab after running the search?
Oct 21, 2014  • #2
Paul R1
14 discussion posts
Actually I created a different set of even simpler test files. All of them consist of the phrase "the quick brown fox jumps over the lazy dog" and I then searched for "quick"
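One way to confirm, independently of any ifilter, that a docx, pptx or xlsx test file really contains the search word is to read it directly: these formats are zip archives of XML parts. The following is only a rough sketch under that assumption; the function name `ooxml_contains` is made up for illustration and has nothing to do with how FileSeek or offfiltx.dll work internally.

```python
import re
import zipfile


def ooxml_contains(path, needle):
    """Return True if `needle` occurs in the text of any XML part of the file.

    Hypothetical helper for sanity-checking test files; it strips XML tags
    crudely, so it suits single-word checks rather than exact phrase matching.
    """
    needle = needle.lower()
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if not name.endswith(".xml"):
                continue
            xml = archive.read(name).decode("utf-8", errors="ignore")
            text = re.sub(r"<[^>]+>", "", xml)  # drop tags, keep character data
            if needle in text.lower():
                return True
    return False
```

If this reports a match for "quick" in a file where the search tool reports nothing, the file itself is fine and the problem lies in the search or filter layer.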

When I run a search on these files, everything works except pptx and xlsx, i.e. the docx worked but the xlsx doesn't. However, in this case I do get errors on both files (see attached).

My original set of larger documents still behaves as before, i.e. docx doesn't work and neither does pptx, and I get no errors (see attachment 2).

The only difference here was a different set of files so a different match word and different folder as well. All other settings remain the same. I created 11 file types for the new simple test compared to 10 for the original set of documents which is why there are different numbers of results i.e. screenshot 1 shows all 11 results, 2 of which are errors whilst screenshot 2 shows 8 results with two missing (no error and no match)
• Attachment [protected]: Screenshot 2.jpg [338,946 bytes]
• Attachment [protected]: Screenshot.jpg [371,174 bytes]
Oct 21, 2014  • #3
Keith Lammers (BFS)'s profile on WallpaperFusion.com
Ok, that's strange! If you install the SP2 for the Office 2010 Filter Pack (https://support.microsoft.com/kb/2687447), does that help at all?
Oct 21, 2014  • #4
Paul R1
14 discussion posts
It gets weirder!!

I just copied my original set of files into the same folder as the simpler set and ran the queries there. docx works in the new location and I still get errors for the pptx and xlsx files.

Now I moved all of the files back to the original folder. docx does not work there, even for the simpler search on the word "quick".

I think I need to think about this a bit more, because it seems to be location dependent.

Does FileSeek require the index to have been done?
Oct 21, 2014  • #5
Paul R1
14 discussion posts
Keith,

I've narrowed it down a bit more and it's still weird; however, I'd be grateful if you could also give it a try.

Attached is a zip file. It contains two separate folders, each of which contains the same set of test files (quick brown fox). In addition, the filtering1 folder has an extra blank spreadsheet (.xlsx).

When I run a search on each of these folders for the word "quick" I get different results, as follows:

In the filtering folder (no extra xlsx file) I get 9 results and no errors, but there were 11 files. The two missing are the docx and pptx files. This implies no match in those two files.

In the filtering1 folder I get 11 results, two of which are errors. The errors are the pptx and xlsx files. There were 12 files here, but only 11 should match as the extra file was empty, so I got a result for every matching file.

The error apparently arises because I have "show an error for files with no file handler" ticked, i.e. unticking this hides the error. This implies that in the first folder all files had a file handler, but in the second folder the pptx and xlsx files didn't. Note that the blank file (also an xlsx file) did not show up as an error.

I was trying to find out why I got different errors in one folder rather than another and it turns out that in the one where the docx does not work, I had an extra xlsx file and this seems to be the trigger.

I'm going to try some different combinations but it would be really good to know if you get the same result at your end
• Attachment [protected]: Filtering.zip [402,355 bytes]
Oct 21, 2014  • #6
Paul R1
14 discussion posts
I should add that you need to unzip the two folders i.e. I searched each of them individually and directly when unzipped
Oct 21, 2014  • #7
Paul R1
14 discussion posts
Hi Keith,

I've narrowed it down a bit more and it seems to be related to the numbers of docx, xlsx & pptx files in a single directory, the common link being that all of these file types use the same ifilter. Different versions of office ifilter seem to make no difference to the outcome.

Where a directory contains one or more of a single type of these files then all is OK but as soon as you add one of the other filetypes to the same directory then strange things happen.

In the main, what you see is that one of the file types will match OK and an error (no handler found) is produced for the other file types. In a few scenarios, however, only one file type is matched and no other output is reported, i.e. FileSeek is reporting no match (and no error) when in fact it should have matched the other file type(s).

I ran some tests on a single folder with combinations of docx, xlsx & pptx file types. Each file contained the same quick brown fox text and the search was simply for the word "quick", which should match all files. The results are in the attached spreadsheet. Each column represents a test, with the numbers being the number of each file type in the folder. Note that all I was doing was moving different numbers of copies of the test files in and out of the test folder, so I was not making any changes to the FileSeek settings in any way.
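For anyone wanting to repeat this kind of test matrix without moving files by hand, the folder set-up could be scripted roughly as below. This is only a sketch: `count_matrix`, `populate` and the `test1.docx`-style file names are invented for illustration, and it assumes you already have one known-good template file per type containing the quick brown fox text.

```python
import itertools
import shutil
from pathlib import Path

EXTENSIONS = ("docx", "xlsx", "pptx")


def count_matrix(max_copies=2):
    """Every combination of copy counts per file type, skipping the empty folder."""
    combos = itertools.product(range(max_copies + 1), repeat=len(EXTENSIONS))
    return [dict(zip(EXTENSIONS, combo)) for combo in combos if any(combo)]


def populate(folder, templates, counts):
    """Copy counts[ext] copies of templates[ext] into `folder`; return the new paths."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    created = []
    for ext, copies in counts.items():
        for i in range(copies):
            target = folder / f"test{i + 1}.{ext}"
            shutil.copyfile(templates[ext], target)
            created.append(target)
    return created
```

Each entry from `count_matrix()` corresponds to one column of the kind of table described above; the search itself still has to be run against each populated folder.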

So is there something strange with my particular set-up, or is there something amiss in FileSeek? You should be able to quickly verify the same thing at your end to see if you get the same results.
• Attachment [protected]: Results.xlsx [10,966 bytes]
Oct 22, 2014  • #8
Paul R1
14 discussion posts
Any comment?

Currently my only workaround is to run searches for docx only, pptx only and xlsx only, and then to search for * in the file match, excluding docx, pptx & xlsx, i.e. I have to run each search 4 times. Given that I'm searching through terabytes of data, this is less than ideal.
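To make the four passes concrete, the grouping amounts to the partition sketched below. This does not drive FileSeek in any way (the searches are still run from its own interface); `plan_passes` is a hypothetical helper that only shows which files end up in which pass.

```python
from pathlib import PurePath

OFFICE_EXTENSIONS = ("docx", "pptx", "xlsx")


def plan_passes(paths):
    """Group paths into one pass per Office type plus an 'other' pass."""
    passes = {ext: [] for ext in OFFICE_EXTENSIONS}
    passes["other"] = []
    for path in paths:
        ext = PurePath(path).suffix.lower().lstrip(".")
        key = ext if ext in OFFICE_EXTENSIONS else "other"
        passes[key].append(path)
    return passes
```

The point of the split is that no single pass ever mixes two of the file types that share the same ifilter.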
Oct 23, 2014  • #9
Keith Lammers (BFS)'s profile on WallpaperFusion.com
Sorry for the delay, I'm going to test this out here tomorrow to see if we can reproduce the same results and get it fixed up :)

Thanks!
Oct 23, 2014  • #10
Keith Lammers (BFS)'s profile on WallpaperFusion.com
Quick update, I was able to reproduce this issue here as well. It's really strange, but we'll get it fixed up as soon as possible :)

Thanks!
Oct 24, 2014  • #11
Paul R1
14 discussion posts
Ok thanks - glad to know I wasn't just doing something incredibly dumb!
Oct 24, 2014  • #12
Keith Lammers (BFS)'s profile on WallpaperFusion.com
We've just released a new FileSeek version (http://www.fileseek.ca/Download/) and this issue should be all fixed up. Please let us know if you run into any trouble after updating.

Thanks!
Jan 16, 2015  • #13