
Search string character encoding

Avatar from Gravatar.com
8 discussion posts
Just found FileSeek (FS) and it looks like a promising solution for my need.
I would like to use FS to scan image files (mostly JPGs) for text strings in metadata.
Some of these strings are plain ASCII, others are Unicode, UTF-16, UTF-8, DBCS or possibly other character set encodings. (Not all metadata editors follow the specs.)
I have read through (most of) the other discussions on this topic and I understand that I can specify an encoding in the advanced settings, but that still involves guessing at the exact spelling, and even a correctly spelled encoding may not be available.
Could FS supply a list of the supported encodings, for example by making the "Custom file Encoding" setting a drop-down of built-in encodings?
If not, could FS report a misspelled or unsupported encoding as soon as I close 'Settings', rather than waiting until I try to use it?
When I try a bad encoding, the error message points me to the 'documentation' and provides a link, but the link lands me at the top of a web page, where I am lost because I am now expected to run a new search for the topic. What is worse, the error message text cannot be highlighted and copied to the clipboard for easier searching.
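
To illustrate the kind of up-front check I mean, here is a minimal sketch in Python. It assumes the setting holds a comma-separated list of encoding names, and it checks them against Python's codec registry, which is not the same list FileSeek uses, so treat the function name and behaviour as hypothetical.

```python
import codecs


def unknown_encodings(setting: str) -> list[str]:
    """Return the names in a comma-separated setting that Python cannot resolve."""
    bad = []
    for name in (part.strip() for part in setting.split(",")):
        if not name:
            continue  # ignore empty entries such as a trailing comma
        try:
            codecs.lookup(name)  # raises LookupError for an unknown codec name
        except LookupError:
            bad.append(name)
    return bad
```

A settings dialog could run a check like this when it closes and show the rejected names right away.
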
TIA for any help.
22 days ago  • #1
Keith Lammers (BFS)'s profile on WallpaperFusion.com
I'm not sure that FileSeek can search the metadata. How do you normally view the metadata for one of these images? Are you using another utility?
21 days ago  • #2
Avatar from Gravatar.com
8 discussion posts
I use a number of utilities to view the data.
Unfortunately, I have found that not all programs, free & otherwise, which claim to be usable to view/edit/display JPG metadata stick to the specifications closely enough. Some are very selective about what they show or handle. (My main interest in all of this is genealogical metadata.)
Hence most of my work looking for data has been with hex editors, but they are also usually limited in their search options and typically allow searching only within a single file.
That is where utilities such as FileSeek look promising. Searching for JPG metadata normally involves only a (relatively small) number of special character encodings, if one sticks to the original spec. But I have found many metadata editors that seem oblivious to the specs. This comes partly from a fault with the developers, but also from misunderstood specs/expectations and usage by users.
Being able to select the character encoding with reasonable ease would make the job a lot easier. Support for at least the more common formats, such as UTF-8, UTF-16 and M$'s wide character sets in addition to plain ASCII, would be a big help.
I quite understand that supporting arbitrary encodings might be a challenge and I don't really expect that.
Still, from various forums, I have found that others are struggling with similar issues, and a lot of their problems revolve around the numerous encodings used to handle the many European languages.
21 days ago  • #3
Keith Lammers (BFS)'s profile on WallpaperFusion.com
Ok, thanks for clarifying! I will check in with our developers to see if there's a list somewhere of what encoding values can be used in that advanced setting.
19 days ago  • #4
Keith Lammers (BFS)'s profile on WallpaperFusion.com
Ok, you can actually set multiple encoding values in that setting, separated by a comma. The full list is here; make sure to use the value from the "Name" column in the advanced setting: https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding?view=netframework-4.6.2#list-of-encodings

When you specify multiple encodings, FileSeek will search the file once for each encoding, so if you specify 3 encodings, it will search the file 3 separate times.
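
As a rough illustration of what one pass per encoding means, here is a minimal Python sketch. It is not FileSeek's actual code: the function name is hypothetical and the encoding names are Python's, not the .NET names from the list above. The query is encoded once per encoding and each resulting byte pattern is looked up in the raw file bytes.

```python
def encodings_matching(data: bytes, query: str, encodings: list[str]) -> list[str]:
    """Return the encodings under which `query` occurs in the raw bytes `data`."""
    hits = []
    for name in encodings:
        try:
            needle = query.encode(name)  # byte pattern for this encoding
        except (LookupError, UnicodeEncodeError):
            continue  # unknown encoding, or query not representable in it
        if needle in data:  # one separate scan of the data per encoding
            hits.append(name)
    return hits
```

Note that Python's "utf-16" codec prepends a byte order mark when encoding, so a sketch like this would use "utf-16-le" or "utf-16-be" to get a pattern that can match in the middle of a file.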

Hope that helps!
14 days ago  • #5
Avatar from Gravatar.com
8 discussion posts
Hi Keith,
thank you for the information and my apologies for being slow to reply.
The encoding names look useful, but I still find FS rather confusing. Assuming I specify a file pattern in the 'Include files' box, when will it search for the query string in the file name and when will it look for text inside the files?
Somehow, I have lost the option to search inside files, which seemed to work the first time I tried it.

Also, FWIW, if FS finds a JPG file which contains the query string, it will show the corresponding image in the box at the lower right, but it will not clear that field the next time I start a new query.
12 days ago  • #6
Keith Lammers (BFS)'s profile on WallpaperFusion.com
The file contents will always be searched with the text in the Query box. On the Advanced tab, there are options to enable/disable also searching the file/folder name with the text from the Query box.

Thanks for the heads up on the image preview not clearing when a new search is started. I've added that to our list to fix up.

6 days ago  • #7
Avatar from Gravatar.com
8 discussion posts
Good to know that the contents are always searched.
Still, FS does not seem to find things that the old AgentRansack can find for me. I know nothing of the search algorithm of either app, nor what the built-in character set option is set to in AR, but for my needs AR seems more likely to give me the results I expect.
6 days ago  • #8
Keith Lammers (BFS)'s profile on WallpaperFusion.com
FileSeek just does a straight plaintext search on the file contents unless there's a file handler for the type (e.g. .docx or .pdf). What kind of info are you finding with AgentRansack that you're not able to locate with FileSeek?
5 days ago  • #9
Avatar from Gravatar.com
8 discussion posts
Well, I have no idea how it searches, but I am attaching some screenshots and a test file for you.
As you can see, FS does not find anything in the one test file.
If I have missed an option, which would fix this, please let me know.
FWIW, these strings are part of the JPG 'guts' and are in ASCII, for the most part, though some other strings of interest are in either UTF-8 or multi-byte MBCS, and possibly even UTF-16, though I have not been able to fully investigate that part.
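
For anyone who wants to check which encoding a given string uses inside such a file, here is a minimal Python sketch of the kind of dump I do by hand in a hex editor. The function name and the minimum run length of 4 are my own assumptions; it only looks for printable ASCII-range characters stored as single bytes or as UTF-16LE, so MBCS text and non-ASCII UTF-8 would need more work.

```python
import re

# Runs of 4+ printable ASCII bytes (also matches the ASCII subset of UTF-8).
ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# Runs of 4+ printable ASCII characters stored as UTF-16LE (each followed by 0x00).
UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")


def printable_strings(data: bytes) -> dict[str, list[str]]:
    """Collect printable runs of 4+ characters, as single-byte and as UTF-16LE text."""
    return {
        "ascii": [m.decode("ascii") for m in ASCII_RUN.findall(data)],
        "utf-16-le": [m.decode("utf-16-le") for m in UTF16LE_RUN.findall(data)],
    }
```

Reading a file with `open(path, "rb").read()` and passing the bytes in shows which list a string of interest lands in, which in turn suggests which encoding to try in the search tool.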
• Attachment [protected]: FS_AR_Ducky-Screenshot - 2020-10-16 , 2_37_59 PM.png [51,263 bytes]
• Attachment [protected]: FS_FS_Ducky-1-Screenshot - 2020-10-16 , 2_37_59 PM.png [40,879 bytes]
• Attachment [protected]: FS_FS_Ducky-2-Screenshot - 2020-10-16 , 2_37_59 PM.png [36,493 bytes]
• Attachment [protected]: smiley1.jpg [2,909 bytes]
5 days ago  • #10
Keith Lammers (BFS)'s profile on WallpaperFusion.com
I'm able to find that string in the smiley1.jpg that you attached if I disable the "Process File Contents Using File Handlers" option on the Advanced tab. Does that work for you?
1 day ago  • #11
Avatar from Gravatar.com
8 discussion posts
Afraid not.
My 'real' test case has a number of similar files in a directory tree, including that smiley file in several branches, all containing that string, with one of the sub-directories also named Ducky.
FS only finds the sub-directory on my system; if I rename the sub-directory, it does not even find that.
As a test, I also cleared out the Custom encoding field, but it made no difference. It had been "UTF-8;us-ascii", I believe.
17 hours ago  • #12