Digital Asset Management using digiKam and Exiftool: cleaning up messy metadata
This two-part article on digital asset management was prompted by the realization that the metadata in my collection of digital image files was in a sad state. Part 1 (this page) discusses how image metadata can get really messy and how to use Exiftool to clean up the mess. Part 2 covers my revised ingestion process.
Written around 2012. Minor update in March 2015.
How image metadata gets messed up
Here's how I managed to mess up the metadata in my photographs:
- Over the years I've used several different DAM softwares, each one of which left behind assorted bits of unwanted, misplaced, or out-of-date metadata.
- I have changed my mind several times regarding what metadata I really want in my image files.
- My copyright and contact information has changed since I started using DAM software.
- DAM metadata standards have evolved rapidly ("exploded" might be a better term) and some metadata that was valid even five years ago is now deprecated.
- I have used digiKam to manage my photographs since around 2010. Because of a digiKam bug that prevented image keywords and database keywords from staying synchronized with one another, my hierarchical keyword information was a mess. As an important aside, this bug was fixed in August 2013, so the fix is probably in digiKam 3.5 and certainly in digiKam 4.2. But fixing the bug doesn't automatically fix whatever mess your metadata was left in before the bug was fixed.
Your approach to digital asset management ("DAM") undoubtedly differs from my own. However, if your DAM software has made a hopeless mess of your image metadata, you might find useful some of the individual steps I took to get my own image metadata back into acceptable order. If you allow more than one DAM software to handle your image files, you can pretty much count on each one leaving its own cruft behind.
This page should be interpreted as describing an approach to locating and cleaning up unwanted metadata, whether or not you use digiKam. Please don't assume that digiKam still writes all the odd tags that it wrote in 2012, although in fact it might.
A word about Exiftool: Exiftool is a mature and reliable metadata reading and writing application. Although this article was written in 2012, Exiftool and the Exiftool syntax seldom change except to add new tags and new capabilities. This doesn't mean that all the exiftool command examples on this page still work exactly the same as they did when I first posted the article. But they probably do.
Exiftool and metadata: preliminary information
I used Exiftool to clean up my "metadata mess" because I've used Exiftool in the past; I trust that, when used properly, Exiftool won't mess up my metadata or destroy my images; and I know the Exiftool syntax pretty well. Probably Exiv2 would work as well, but I'm not familiar with the Exiv2 command line syntax.
Metadata tags and digiKam tags
Unfortunately, Exiftool uses the word "tag" to refer to all the individual metadata items, whereas digiKam uses "tag" to refer only to lists of hierarchically arranged keywords. Neither program is right or wrong. They just use different terminology.
For the sake of clarity, in this article "fields" and "tags" both refer to "individual metadata items" (the vast majority of which are not hierarchically arranged keyword lists). I will try to always use the phrase "hierarchical keywords" when referring to digiKam "tags". Also, if you are familiar with Exiv2, Exiv2 and Exiftool do use slightly different names for the various metadata tags.
Exiftool tag names and tag groups
Exiftool can list image metadata using tag descriptions (e.g. "Owner Name") and also using tag names (e.g. OwnerName). When using Exiftool to write (rather than list) metadata, the tag name is used.
You probably already know that metadata can be grouped into different kinds or "groups" of metadata: exif, iptc, and xmp being the groups of most importance for the task at hand. Exiftool can list tag groups in several different ways, the most important of which for the task at hand are "information type" (the familiar EXIF, IPTC, XMP, and so on) and "Specific Location" (the probably less familiar IFD0, ExifIFD, XMP-tiff, and so on).
Tags with the same tag name but different groups and group "specific locations"
Some metadata information is stored in more than one metadata group, sometimes under the same name and sometimes under different names. Some metadata information is stored in more than one "specific location" inside the same metadata group. For example, "orientation" information (which way the image should be rotated to be properly displayed on the screen) can be written in any or all of three different fields.
[IFD0] Orientation
[IFD1] Orientation
[XMP-tiff] Orientation
IFD0 and IFD1 (and also IFD2 and ExifIFD) are "specific locations" of the EXIF "information type" group. Using the tag's specific location plus the tag name avoids any ambiguity in which tag should be modified.
There is no law (except the law of common sense) that says the same information must be stored in all the tags that ostensibly do hold the same information. It's up to the DAM software to decide which tags to display, what information to write where, and how to reconcile any differences in information held in more than one place.
For example, in cases of conflicting orientation information, digiKam gives precedence to the IFD0 orientation tag, then to the IFD1 tag, then to the XMP-tiff tag. So if digiKam writes the XMP-tiff tag to a sidecar XMP file (and isn't set up to also write to the image metadata), the next time the image metadata is read the orientation flag will be reset to match what is in the IFD0/IFD1 metadata fields.
To avoid unpleasant surprises, the best course of action is to make sure that all metadata tags that ostensibly should hold the same information, really do hold the same information.
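One way to check whether the copies of a tag agree is to list just that tag in every location where it occurs. The following is a minimal sketch; the function name "show_orientation" and the file name are only illustrations.

```shell
# List every copy of the Orientation tag together with its specific
# location (-G1), so that disagreements between IFD0, IFD1 and XMP-tiff
# are easy to spot. "-n" prints the stored number instead of the
# descriptive text.
show_orientation() {
    exiftool -a -s -G1 -n -Orientation "$@"
}

# Usage (hypothetical file name):
#   show_orientation testimage.jpg
```

If the listed values differ, the copies need to be reconciled before the DAM software reads the file again.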
Listing image metadata with Exiftool
To list all image metadata by specific location and tag name, use the following Exiftool command line options:
exiftool -a -s -G1 filename.ext
#"-a" list all the metadata
#"-s" use the tag name rather than the tag description
#"-G1" provide the tag group "specific location"
#If you want the tag group "information type", use "-G0" instead of "-G1"
Before starting the actual metadata cleanup process, I made copies of a handful of representative images, listed each image's metadata using the above command, and copied the resulting terminal output to a spreadsheet.
It was something of an eye-opener to see all the bits and pieces of outdated and garbage metadata that had been written to my image files by the various DAM softwares that I've used in the past.
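Instead of copying terminal output by hand, the listing can be written straight to a file that a spreadsheet can open. This is a sketch only: the function name "dump_metadata" and the ".metadata.tsv" suffix are made up for the example, and "-t" is the Exiftool option for tab-delimited output.

```shell
# Write a tab-delimited metadata listing next to each image, one
# "<image>.metadata.tsv" file per image, ready to open in a spreadsheet.
dump_metadata() {
    for image in "$@"; do
        exiftool -a -s -G1 -t "$image" > "$image.metadata.tsv"
    done
}

# Usage (hypothetical file names):
#   dump_metadata copy1.jpg copy2.cr2
```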
Words of caution
Before beginning such a drastic measure as removing huge chunks of your image metadata, please take the sensible precaution of first making a complete backup of your images and your digiKam database and putting that backup on an external drive, disconnected from your computer. That way, in case you really mess up, you have a backup copy of all your images.
Exiftool is very powerful and, like all command line tools, if used incorrectly can wreak utter havoc upon your image files. Test, test, test on a folder of test images, and don't discount the possibility that I might have made a typo somewhere in an example command. In other words, use the information on this page at your own risk. When in doubt, consult the Exiftool documentation and ask questions in the Exiftool forums.
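As a rough sketch of such a backup on Linux, the following copies a folder to a dated subfolder on the backup drive. The function name "backup_folder" and all paths are hypothetical; any backup tool you already trust will do just as well.

```shell
# Copy a folder (the image folder or the folder holding the digiKam
# database) to a dated subfolder under the backup location.
# "cp -a" copies recursively and preserves timestamps and permissions.
backup_folder() {
    source_dir=$1
    backup_root=$2
    target="$backup_root/$(basename "$source_dir")-$(date +%Y%m%d)"
    # Refuse to copy into a backup that already exists.
    if [ -e "$target" ]; then
        echo "backup already exists: $target" >&2
        return 1
    fi
    mkdir -p "$backup_root" && cp -a "$source_dir" "$target"
}

# Usage (hypothetical paths):
#   backup_folder /home/me/Pictures /media/external-drive/backups
```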
All the Exiftool example commands on this page assume you are using Linux. Windows commands are sometimes slightly different because (among other differences) Windows requires different escape characters. The possible differences are beyond the scope of this article (but well presented in the Exiftool documentation and frequently discussed in the Exiftool forums).
What metadata tags does digiKam write?
Below I list several different categories of metadata tags that digiKam can write (remember, by "tags" I mean metadata fields in general rather than specifically a list of hierarchical keywords). Then I show you what actually gets written to your image metadata when you ask digiKam to write various bits of metadata information.
Caption and "title" information
The digiKam right panel "Captions/Tags, Description tab" provides a place to enter the image caption and "title". If you allow digiKam to write to your image files, digiKam writes the image caption to all of the following metadata locations:
[ExifIFD] UserComment : This field of sunflowers covers several acres.
[File] Comment : This field of sunflowers covers several acres.
[IFD0] ImageDescription : This field of sunflowers covers several acres.
[IPTC] Caption-Abstract : This field of sunflowers covers several acres.
[XMP-dc] Description : This field of sunflowers covers several acres.
[XMP-exif] UserComment : This field of sunflowers covers several acres.
[XMP-tiff] ImageDescription : This field of sunflowers covers several acres.
If you allow digiKam to write to your image files, digiKam writes the image "title" to the following metadata locations:
[IPTC] ObjectName : Sunflowers
[XMP-dc] Title : Sunflowers
Why the scare quotes around the word "title"? Clearly the digiKam "title" is where digiKam users are supposed to enter a short synopsis of the image content. However, the proper metadata fields for a short synopsis of the image content are the IPTC Headline and the XMP-photoshop Headline fields.
So the reason I put the word "title" in scare quotes is that the XMP-dc Title and the IPTC Object Name (also referred to as the IPTC Title) are supposed to hold identifying information; for digital photographs, that is typically the image file name. For what it's worth, many DAM softwares other than digiKam also make this mistake, but it's still wrong and creates an incompatibility with DAM software that expects to find the actual Title (identifying, not describing) information in the Title field.
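For readers who want the synopsis in the Headline fields as well, Exiftool can copy one tag to another within the same file using the "-DSTTAG<SRCTAG" syntax. The sketch below is an illustration, not a step from the cleanup described on this page; the function name "copy_title_to_headline" is hypothetical, and the arguments are quoted so that the shell does not treat "<" as a redirection.

```shell
# Copy the digiKam "title" (really a synopsis) into the two Headline
# fields of the same file. The Title fields themselves are left untouched.
copy_title_to_headline() {
    exiftool \
        '-IPTC:Headline<IPTC:ObjectName' \
        '-XMP-photoshop:Headline<XMP-dc:Title' \
        "$@"
}

# Usage (hypothetical file name):
#   copy_title_to_headline testimage.jpg
```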
Star rating, color label, and pick label
If you allow digiKam to write to your image files, digiKam writes star rating, color label, and pick label information to the following metadata locations:
[IFD0] Rating : 2
[XMP-xmp] Rating : 2
[IFD0] RatingPercent : 25
[XMP-microsoft] RatingPercent : 25
[XMP-digiKam] PickLabel : 2
[XMP-digiKam] ColorLabel : 3
Hierarchical keywords and single keywords
If you allow digiKam to write to your image files, digiKam writes hierarchical keywords and single keywords to the following metadata locations:
Hierarchical keywords:
[XMP-digiKam] TagsList : Intake/Wrong-rotation, Attributes/pinhole, Intake/new image
[XMP-lr] HierarchicalSubject : Intake|Wrong-rotation, Attributes|pinhole, Intake|new image
[XMP-microsoft] LastKeywordXMP : Intake/Wrong-rotation, Attributes/pinhole, Intake/new image
Single keywords:
[IPTC] Keywords : Wrong-rotation, pinhole, new image
[XMP-dc] Subject : Wrong-rotation, pinhole, new image
Mystery metadata you might not have asked for and/or didn't know you were getting
digiKam also writes metadata that you don't specifically ask for. I didn't ask digiKam to write any of the metadata items listed below. In particular, in the digiKam metadata settings I did not check the option to write the Caption Author Names or Date Time Stamps.
[IFD0] ProcessingSoftware : digiKam-2.9.0
[IFD0] Software : digiKam-2.9.0
[IPTC] OriginatingProgram : digiKam
[IPTC] ProgramVersion : 2.9.0
[XMP-tiff] Software : digiKam-2.9.0
[XMP-xmp] CreatorTool : digiKam-2.9.0
[XMP-x] XMPToolkit : XMP Core 4.4.0-Exiv2
[XMP-digiKam] CaptionsAuthorNames :
[XMP-digiKam] CaptionsDateTimeStamps : 2012-11-08T11:16:33
Some of these metadata tags are of questionable value and/or hold incorrect information. For example, according to the Exiv2 tag documentation:
- IFD0 Processing Software: The name and version of the software used to post-process the picture. I use digiKam as DAM software, not for post-processing.
- IFD0 Software: This tag records the name and version of the software or firmware of the camera or image input device used to generate the image, which clearly isn't digiKam.
- XMP-tiff Software: Software or firmware used to generate image. Note: This property is stored in XMP as xmp:CreatorTool, again, clearly not digiKam.
For what it's worth, every DAM software I've ever used (not just digiKam) liberally sprinkles its name over every metadata field that can conceivably be construed as "software-related" (never mind that the software the metadata field refers to has nothing whatsoever to do with DAM software); some softwares go so far as to put the software name in various image comment/description fields.
Copyright, credit, and contact information
The digiKam right panel "Captions/Tags, Information tab" provides a place to enter Rights and Contact information. I use Exiftool rather than digiKam to write copyright, credit, and contact information, so I don't know which tags digiKam itself would have written. However, digiKam correctly reads and displays the information entered with Exiftool. Here is a list of the tags I write to my image metadata, showing sample information for each tag. I don't write location-related contact information (city, country, etc), so the following list of tags is not complete:
[IFD0] Artist : Elle Stone
[IPTC] By-line : Elle Stone
[XMP-dc] Creator : Elle Stone
[IPTC] By-lineTitle : Photographer
[XMP-photoshop] AuthorsPosition : Photographer
[XMP-photoshop] Credit : Elle Stone, Nine Degrees Below Photography
[IFD0] Copyright : Copyright © 2012 Elle Stone, all rights reserved.
[IPTC] CopyrightNotice : Copyright © 2012 Elle Stone, all rights reserved.
[XMP-dc] Rights : Copyright © 2012 Elle Stone, all rights reserved.
[IPTC] Contact : email: whatever@example.com; website: http://example.com
[XMP-iptcCore] CreatorWorkEmail : whatever@example.com
[XMP-iptcCore] CreatorWorkURL : http://example.com
As you can see, there are quite a lot of copyright, credit, and contact information metadata tags that hold the same or overlapping information. As different softwares read and give priority to different metadata tags, and as "orphaned images" is an on-going issue for photographers, it's a good idea to make sure you have relevant copyright, credit, and contact information in all possible appropriate metadata locations.
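Writing the same value to several locations in one command looks roughly like the sketch below. It covers only the creator and copyright tags from the list above; the function name "write_rights" is hypothetical, and the remaining tags (credit, contact, and so on) would be added in the same way.

```shell
# Write one creator name and one copyright notice to every tag that
# holds that information, so that all the copies agree.
write_rights() {
    creator=$1
    notice=$2
    shift 2
    exiftool \
        -IFD0:Artist="$creator" \
        -IPTC:By-line="$creator" \
        -XMP-dc:Creator="$creator" \
        -IFD0:Copyright="$notice" \
        -IPTC:CopyrightNotice="$notice" \
        -XMP-dc:Rights="$notice" \
        "$@"
}

# Usage (sample values from the list above, hypothetical file name):
#   write_rights "Elle Stone" "Copyright © 2012 Elle Stone, all rights reserved." testimage.jpg
```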
digiKam templates
Via various metadata templates, digiKam can be used to enter a lot more metadata than what I've listed above. I don't use digiKam metadata templates, so I don't know what metadata fields they write to.
I did notice that digiKam location information is not always stored in the XMP metadata in the fields I would have expected. So just as with headline information which digiKam incorrectly writes to "title" metadata fields, how digiKam reconciles the location fields it writes with the location information it reads from an image file becomes a question. Proceed with caution.
If you are trying to locate and remove other metadata fields than what I've already listed, Exiftool, of course, can list all the metadata information written in your image files, from which listing you can determine the proper tag name and group to be used when removing any unwanted metadata.
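When the full listing is long, filtering it by a word makes suspect tags easier to find. A minimal sketch, assuming a single image is examined at a time; the function name "find_tags" is hypothetical.

```shell
# Print only the lines of the metadata listing that mention a given
# word (case-insensitive), whether in the group name, the tag name,
# or the value.
find_tags() {
    pattern=$1
    shift
    exiftool -a -s -G1 "$@" | grep -i -- "$pattern"
}

# Usage (hypothetical file name):
#   find_tags digikam testimage.jpg
```

Each matching line shows the group in square brackets followed by the tag name, which is exactly what is needed to build the corresponding "-GROUP:TagName=" argument.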
Using Exiftool to remove unwanted metadata
The basic process is you just set the contents of the unwanted metadata information equal to nothing, using the appropriate Exiftool metadata group and tag name.
Example Exiftool command lines for removing selected metadata
As I showed you above, digiKam writes lists of hierarchical keywords to three different metadata locations. To remove all the hierarchical keyword tags for a single image, say, "testimage.jpg", the Exiftool command is:
exiftool -XMP-digiKam:TagsList= -XMP-lr:HierarchicalSubject= -XMP-microsoft:LastKeywordXMP= testimage.jpg
digiKam writes lists of single keywords to two different metadata locations, and also writes "title" information to two different metadata locations. If you want to remove all the lists of single keywords and also all the title tags from a single image, the Exiftool command is:
exiftool -IPTC:Keywords= -XMP-dc:Subject= -IPTC:ObjectName= -XMP-dc:Title= testimage.jpg
By default, before writing metadata to an image, Exiftool first creates a backup image with the same name as the original image but with "_original" added to the image extension. If you don't want Exiftool to create a backup image, add "-overwrite_original" to the command line options. For example, digiKam writes the image caption information to 7 different metadata locations. If you want to completely remove caption information from an image, and not create a backup of the image, the Exiftool command is:
exiftool -ExifIFD:UserComment= -File:Comment= -IFD0:ImageDescription= -IPTC:Caption-Abstract= -XMP-dc:Description= -XMP-exif:UserComment= -XMP-tiff:ImageDescription= -overwrite_original testimage.jpg
Exiftool tags for digiKam-written tags, ready for copy-pasting
Below is a list of all the digiKam-written metadata items discussed above, ready to be copy-pasted into a command line:
-ExifIFD:UserComment= -File:Comment= -IFD0:ImageDescription= -IPTC:Caption-Abstract= -XMP-dc:Description= -XMP-exif:UserComment= -XMP-tiff:ImageDescription=
-IPTC:ObjectName= -XMP-dc:Title= -IPTC:Headline= -XMP-photoshop:Headline=
-IFD0:Rating= -XMP-xmp:Rating= -IFD0:RatingPercent= -XMP-microsoft:RatingPercent= -XMP-digiKam:PickLabel= -XMP-digiKam:ColorLabel=
-XMP-digiKam:TagsList= -XMP-lr:HierarchicalSubject= -XMP-microsoft:LastKeywordXMP= -IPTC:Keywords= -XMP-dc:Subject=
-XMP-digiKam:CaptionsAuthorNames= -XMP-digiKam:CaptionsDateTimeStamps=
-IFD0:ProcessingSoftware= -IFD0:Software= -IPTC:OriginatingProgram= -IPTC:ProgramVersion= -XMP-tiff:Software= -XMP-xmp:CreatorTool= -XMP-x:XMPToolkit=
-IFD0:Artist= -IPTC:By-line= -XMP-dc:Creator= -IPTC:By-lineTitle= -XMP-photoshop:AuthorsPosition= -XMP-photoshop:Credit= -IFD0:Copyright= -IPTC:CopyrightNotice= -XMP-dc:Rights= -IPTC:Contact= -XMP-iptcCore:CreatorWorkEmail= -XMP-iptcCore:CreatorWorkURL=
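As an alternative to pasting long runs of arguments, Exiftool can read its arguments from a file with the "-@" option, one argument per line. The sketch below stores only the seven caption tags; the function names and the file name "captions.args" are hypothetical.

```shell
# Create an argument file holding the caption-removal arguments,
# one per line. Lines starting with "#" are comments.
write_caption_args() {
    cat > "$1" <<'EOF'
# Caption tags written by digiKam
-ExifIFD:UserComment=
-File:Comment=
-IFD0:ImageDescription=
-IPTC:Caption-Abstract=
-XMP-dc:Description=
-XMP-exif:UserComment=
-XMP-tiff:ImageDescription=
EOF
}

# Apply the argument file to one or more images.
remove_captions() {
    args_file=$1
    shift
    exiftool -@ "$args_file" "$@"
}

# Usage (hypothetical file names):
#   write_caption_args captions.args
#   remove_captions captions.args testimage.jpg
```

Keeping the arguments in a file means the same reviewed list is reused every time, which reduces the chance of a typo in a long command line.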
Removing entire metadata groups all at once
Because DAM software I've used in the past had left so much garbage information in my image metadata, I actually went so far as to remove all IPTC and XMP metadata, and then put back in a very limited amount of metadata, mostly copyright/credit/contact information. Most people probably won't want to go quite that far. But if you do feel the need for such drastic measures, the exiftool command is:
exiftool -IPTC= -XMP= testimage.jpg
If you decide to remove all existing XMP information from your image files, be aware that some (newer) cameras write a limited amount of XMP metadata information. Removing camera-written metadata can have serious consequences, so proceed with caution (you do keep backup copies of your completely unmodified, "from the camera" image files in case you inadvertently corrupt, delete, or otherwise mangle an image, yes?).
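In addition to a full backup, the XMP packet can be saved to a separate file before it is removed. This sketch uses the Exiftool "-xmp -b" options, which extract the complete XMP record; the function name "save_xmp_packet" and the ".camera-xmp" suffix are hypothetical, and the suffix was chosen so the saved file is unlikely to be mistaken for a sidecar file.

```shell
# Save the complete XMP packet of each image to "<image>.camera-xmp".
# An image without any XMP produces an empty file.
save_xmp_packet() {
    for image in "$@"; do
        exiftool -xmp -b "$image" > "$image.camera-xmp"
    done
}

# Usage (hypothetical file name):
#   save_xmp_packet testimage.cr2
```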
Removing metadata from whole folders of images
Removing unwanted metadata from a single image is all very well and good. But what if you have thousands of images of different file types? Exiftool is capable of modifying whole folders and subfolders of images, all at once.
Let's say you want to remove all the rating information tags from a whole bunch of images arranged in a hierarchical folder structure. As shown above, digiKam writes rating information to 4 different metadata locations. The following command will recursively remove the unwanted tags from all images in the top folder and all subfolders that have the extension ".jpg" or ".cr2":
exiftool -IFD0:Rating= -XMP-xmp:Rating= -IFD0:RatingPercent= -XMP-microsoft:RatingPercent= -r -ext jpg -ext cr2 /path/to/folder/with/test-images
A backup of each image file will be created unless you add the "-overwrite_original" option:
exiftool -IFD0:Rating= -XMP-xmp:Rating= -IFD0:RatingPercent= -XMP-microsoft:RatingPercent= -r -overwrite_original -ext jpg -ext cr2 /path/to/folder/with/test-images
Exiftool commands are mostly not case-sensitive. However, in some instances, such as when listing file extensions, the command is case-sensitive. So the above command would need to be suitably modified to remove rating information from a file with the extension "CR2" or "JPG".
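After a batch run without "-overwrite_original", it helps to check the result and then either restore or discard the "_original" backups. The following is a sketch with hypothetical function names; "-restore_original" and "-delete_original" are Exiftool options, and "-delete_original" asks for confirmation before deleting anything.

```shell
# List any rating tags that are still present in the folder tree.
# Files that are clean show up as a file header with no tags under it.
check_rating_removed() {
    exiftool -a -s -G1 -Rating -RatingPercent -r -ext jpg -ext cr2 "$@"
}

# Put the "_original" backups back in place of the modified files.
restore_backups() {
    exiftool -restore_original -r -ext jpg -ext cr2 "$@"
}

# Remove the "_original" backups once the result has been verified.
delete_backups() {
    exiftool -delete_original -r -ext jpg -ext cr2 "$@"
}

# Usage (path as in the examples above):
#   check_rating_removed /path/to/folder/with/test-images
#   delete_backups /path/to/folder/with/test-images
```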
Starting over after cleaning up the metadata
Once your image metadata is all cleaned up and all the hierarchical keyword tags and single keyword tags have been removed, along with conflicting comment information and so forth, what next?
The good news is that all your digiKam-applied metadata is still in the digiKam database (unless you made the mistake of reading from your image files, in which case I don't know what will happen, but you did make a backup of your digiKam database, didn't you?).
So if you set digiKam up (temporarily, of course) to only write to the database (and not to the images or to XMP sidecar files — don't forget about the orientation flag on the second tab of the digiKam metadata settings dialog), then you can rearrange and delete tags on the digiKam tag tree without worrying about the database and images getting out of synchronization.
Where to keep image metadata
Once you finish rearranging the digiKam tag tree, what next?
There are three places image metadata can be kept: in a database, in the image itself, and in a sidecar file.
The advantage of writing to a database is that tagging and rating goes along much, much faster if the DAM software is not constantly reading from and writing to image files.
Anyone who's ever lost their metadata database or changed DAM software and not been able to transfer years of work to the new software, realizes the importance of also keeping the metadata in the image, or at least in a sidecar file, rather than only inside a database.
Anyone who's ever had DAM software erase wanted metadata, or write unwanted metadata, or corrupt individual tags, will understand my reluctance to have DAM software write directly to my image files.
If digiKam writes to the image files
One option is to change the digiKam settings and write out all the nicely rearranged tag-tree information to the images. But until digiKam Bug 268688, which keeps the image metadata and digiKam database from staying synchronized, is fixed, things are going to get messy again as soon as you start deleting tags and/or rearranging the tag tree (Update: this bug was fixed in August 2013, so if your version of digiKam is newer than somewhere around digiKam 3.5, then you don't need to worry about this bug anymore).
If digiKam writes to XMP sidecars
Update: The information given below was current in 2012. One of the relevant digiKam bugs, Bug 309058 - Database can't be synchronized with XMP sidecars, might be fixed in digiKam 4.7 (I was asked to check and I haven't yet done so, but it's on my "to do" list). The other bug, Bug 227814 - HUB : new option to synchronize immediately files metadata or when application is closed, is still under review. So some of the information below is almost certainly outdated if your digiKam installation is recent enough.
If you ask digiKam to write to XMP sidecars instead of directly to the image files, only the metadata items in the above lists that start with "XMP-" are written to the sidecar files. So, for example, instead of the image caption being written to 7 different metadata locations, it is only written to 3 metadata locations.
When writing only to XMP sidecar files, the possibilities for confusion are obvious, as digiKam tries to reconcile the metadata in the image with the metadata in the sidecar file and the database. The only way I have found to avoid the confusion caused by conflicting metadata in the image, the XMP sidecar, and the digiKam database is as follows:
For every single bit of metadata that you ask digiKam to store in the digiKam database, remove that corresponding bit of metadata from the image file, from all places that digiKam would read from and write to the image file, if it were writing to the image file instead of the XMP sidecar file.
The only exception is if you are reasonably certain a particular bit of metadata (such as copyright or location information) won't ever change once it's been written to the image.
For example, I use digiKam to apply to my images the following (and only the following) metadata information: captions and "titles", hierarchical tags, color labels, and pick labels. So before I started writing only to XMP sidecar files, I first removed all of the following metadata from my image files:
[ExifIFD] UserComment
[File] Comment
[IFD0] ImageDescription
[IPTC] Caption-Abstract
[XMP-dc] Description
[XMP-exif] UserComment
[XMP-tiff] ImageDescription
[IPTC] ObjectName
[XMP-dc] Title
(I don't know what digiKam does with "synopsis" image metadata that is written to the correct metadata field, but to be on the safe side, I removed that, too, after first creating a backup of the image file)
[IPTC] Headline
[XMP-photoshop] Headline
[IFD0] Rating
[XMP-xmp] Rating
[IFD0] RatingPercent
[XMP-microsoft] RatingPercent
[XMP-digiKam] PickLabel
[XMP-digiKam] ColorLabel
[XMP-digiKam] TagsList
[XMP-lr] HierarchicalSubject
[XMP-microsoft] LastKeywordXMP
[IPTC] Keywords
[XMP-dc] Subject
[XMP-digiKam] CaptionsAuthorNames
[XMP-digiKam] CaptionsDateTimeStamps
At the same time I removed a lot of other metadata from my image files, metadata that had accumulated over time and that serves no useful purpose as far as I can tell, including:
[IFD0] ProcessingSoftware
[IFD0] Software
[IPTC] OriginatingProgram
[IPTC] ProgramVersion
[XMP-tiff] Software
[XMP-xmp] CreatorTool
[XMP-x] XMPToolkit
After the metadata is all cleaned up — synchronizing XMP sidecars and the digiKam database
I want the speed and convenience of writing only to the database, and the safety of writing to a sidecar file. So during any given tagging session I set digiKam to write only to the database. At the end of each tagging session I change the settings and write everything out to sidecar files. Setting and resetting the digiKam metadata settings (Bug 227814) gets a bit tedious, but not as tedious as waiting for digiKam to write out a tag to a whole bunch of files every time I make a change on the tag tree.
And that would be the end of my digiKam DAM workflow, except for the fact that digiKam has a bug that prevents it from keeping the digiKam database in synchronization with the digiKam XMP sidecar files (Bug 309058). I use Exiftool to work around this problem. Update: Bug 309058 - Database can't be synchronized with XMP sidecars might be fixed in digiKam 4.7.
Exiftool can create a special MIE sidecar file to hold image metadata. digiKam doesn't recognize or read MIE sidecar files. So at the end of every tagging session, after writing out the digiKam XMP sidecar files, I use Exiftool to examine the digiKam XMP files. If the information in the digiKam sidecar files looks good (I'm not checking for digiKam failures so much as for my own failure to properly follow the steps in my workflow), I use Exiftool to transfer the desired metadata from the digiKam XMP sidecars to the Exiftool MIE sidecars. Then I delete the digiKam XMP sidecar files.
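The examination step can be done with the same listing options used throughout this page, restricted to files with the "xmp" extension. This is a sketch of that one step only (the transfer to MIE files is covered in Part 2); the function name "inspect_sidecars" and the path are hypothetical.

```shell
# List the contents of every XMP sidecar file under a folder, showing
# each tag with its specific location so stray or missing tags stand out.
inspect_sidecars() {
    exiftool -a -s -G1 -r -ext xmp "$@"
}

# Usage (hypothetical path):
#   inspect_sidecars /path/to/folder/with/test-images
```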
So when I start digiKam up for the next tagging session, there is no conflicting metadata in the XMP sidecars (because I deleted them) or in the image files (because I deleted from the images any metadata that I apply using digiKam). So the only image information that digiKam sees, that I might ask digiKam to modify, is in the digiKam database. In other words, digiKam can read the image metadata as many times as it wants, and will never see any information in the image that conflicts with or needs to be reconciled with what's in the digiKam database.
My current workflow (outlined in Part 2, Ingestion) is a bit more cumbersome than it will be once digiKam can keep digiKam sidecar files in synchronization with the digiKam database, thus eliminating the need to use MIE files as an intermediary (again, Bug 309058 - Database can't be synchronized with XMP sidecars might have been fixed in digiKam 4.7). But most of the work is done at the command line and I've documented the steps, so all I have to do is copy-paste.