Matt Cameron
11 discussion posts
FileSeek is a wonderful program, enough so that I convinced the powers that be to run a site license here. I am using it primarily to locate credit card information within our network.
Using the iFilters and Filter Packs is, in theory, the way I need to go to scrub out erroneous data that I am not looking for, such as metadata in Office files and PDFs. I have installed the Adobe and Office filters from the links provided; however, it does not look like they are working.
I read that for PDF filters on a 32-bit OS you might need to install Adobe Reader, which I have done.
http://www.documentlocator.com/support/get-ifilters.htm
Either way, between the Adobe filter listed in the FAQ and Adobe Reader, I would have expected the files to be parsed properly. If I look up File Handlers in FileSeek, I see that the handler for .pdf is query.dll, when it should be AcroRdIF.dll. (Even though the Microsoft Filter Pack is installed, .xlsx is also pointing to query.dll.)
It's not the program's fault, but do you know a way that I can manually change the handler? I think this information is derived from HKLM\Software\Classes\{extension}\PersistentHandler, but I'm not sure.
Also, how often does the handler list attempt to refresh itself? I have noticed that some extensions seem to disappear and reappear.
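On the PersistentHandler question, here is a minimal read-only sketch in Python of how the registered filter DLL for an extension could be looked up. It assumes the usual iFilter registration layout (extension, then PersistentHandler GUID, then the filter CLSID registered under IID_IFilter, then InprocServer32) and says nothing about how FileSeek itself builds its list. The function names are hypothetical, it only reads HKLM\Software\Classes, and it does not change any handler.

```python
import sys

# IID_IFilter: the interface ID under which filter DLLs register themselves.
IID_IFILTER = "{89BCB740-6119-101A-BCB7-00DD010655AF}"


def persistent_handler_key(extension):
    """Registry subkey (under HKLM) holding the PersistentHandler GUID."""
    ext = extension if extension.startswith(".") else "." + extension
    return "\\".join(["Software", "Classes", ext, "PersistentHandler"])


def filter_clsid_key(handler_guid):
    """Subkey whose default value is the CLSID of the filter implementation."""
    return "\\".join(["Software", "Classes", "CLSID", handler_guid,
                      "PersistentAddinsRegistered", IID_IFILTER])


def filter_dll_key(filter_clsid):
    """Subkey whose default value is the path of the filter DLL."""
    return "\\".join(["Software", "Classes", "CLSID", filter_clsid,
                      "InprocServer32"])


def read_default(subkey):
    """Read the default value of an HKLM subkey (Windows only)."""
    import winreg  # imported here so the helpers above work on any platform
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
        value, _value_type = winreg.QueryValueEx(key, "")
        return value


def resolve_filter_dll(extension):
    """Follow extension -> handler GUID -> filter CLSID -> DLL path."""
    handler_guid = read_default(persistent_handler_key(extension))
    filter_clsid = read_default(filter_clsid_key(handler_guid))
    return read_default(filter_dll_key(filter_clsid))


if __name__ == "__main__" and sys.platform == "win32":
    for ext in (".pdf", ".xlsx"):
        try:
            print(ext, "->", resolve_filter_dll(ext))
        except OSError as exc:
            print(ext, "-> lookup failed:", exc)
```

If the last step prints query.dll for .pdf, the registration itself points at the wrong DLL. Note that a 32-bit Python on a 64-bit OS sees the redirected 32-bit view of the registry, so results can differ between the two.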
My list of handlers is attached for your viewing pleasure.
• Attachment [protected]: export.txt [37,872 bytes]
Jun 5, 2013 (modified Jun 5, 2013)
#1
Awesome, glad to hear you like it!
There's actually a bug in the current version that messes up the File Handler list in the Settings, but that will be fixed in the next beta.
However, the file handlers should still be working if you have them installed. Could you check the Advanced tab for your search to make sure the "Process file contents using File Handlers" option is enabled?
Ok, strange! We've just posted FileSeek 3.1. Could you give it a try?
Matt Cameron
11 discussion posts
Forgive the hiatus, as I was away from work for a while. I am still having issues getting FileSeek to read PDFs properly and ignore metadata such as font configuration.
I have done this so far:
Updated FileSeek to 3.1.1.
Uninstalled Adobe Reader X and installed XI. I had read that X had issues with programs using the filters, so I updated.
Ran a search with Reader installed and again with it uninstalled, and I got the same result both times.
Now that the handler bug is gone, I can confirm that .pdf is pointing to the correct DLL inside the Adobe Reader XI installation.
In the end, while doing searches I get results like the one below, which I am trying to get rid of. I can
/Widths [ 250 250 371 250 500 840 778 208 333 333 389 250 250 333 250 606 500
The above hit came from running this against the PDF file I have attached as an example. It is just an income tax guide, nothing secret. This should have been skipped over. The regex query I ran was:
\b(4(?:\d[ -]*?){12}(?:(?:\d[ -]*?){3})?|5[1-5](?:\d[ -]*?){14}|3[47](?:\d[ -]*?){13})\b
Not sure where the fault lies. Hoping to see if you can recreate the issue.
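For anyone trying to recreate this outside FileSeek, here is a minimal Python sketch that runs the same regex against the raw hit quoted above. Python's `re` engine is an assumption here, since FileSeek may use a different regex flavour. The `luhn_valid` helper is a hypothetical extra that is not part of FileSeek: it applies the standard Luhn checksum as one possible way to weed out digit runs that only look like card numbers.

```python
import re

# The regex from the post, split across lines for readability only.
CARD_PATTERN = re.compile(
    r"\b(4(?:\d[ -]*?){12}(?:(?:\d[ -]*?){3})?"
    r"|5[1-5](?:\d[ -]*?){14}"
    r"|3[47](?:\d[ -]*?){13})\b"
)

# The raw PDF font-width line reported as a hit.
SAMPLE_HIT = ("/Widths [ 250 250 371 250 500 840 778 208 333 333 389 "
              "250 250 333 250 606 500")


def find_candidates(text):
    """Return every substring the card regex matches in the text."""
    return [match.group(1) for match in CARD_PATTERN.finditer(text)]


def luhn_valid(candidate):
    """Luhn checksum over the digits of a candidate (separators ignored)."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:      # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return bool(digits) and total % 10 == 0


if __name__ == "__main__":
    for candidate in find_candidates(SAMPLE_HIT):
        print(repr(candidate), "passes Luhn:", luhn_valid(candidate))
```

In this sketch the regex picks up `371 250 500 840 778`, because it starts with 37 and `[ -]*?` allows spaces between digits, so a run of font widths looks like a 15-digit number. That candidate fails the Luhn check, which suggests a checksum pass could filter this kind of hit even when the raw PDF stream is searched.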
• Attachment [protected]: t2 guide.pdf [595,100 bytes]
No worries, thanks for the update! I'll test this out next week and keep you posted on what I find out.
Thanks!