I recently started to document my family’s history. This genealogy expedition became urgent because my grandparent might not live much longer. I encountered several practical and conceptual problems that are far from trivial, and I would like to invite you to work on them with me, possibly in the form of a (student) project.
The first challenge was to set up a collaboration with other family members who had already started collecting information. Since we do not live close to each other, we chose the online service Ancestry.com. We made good progress and were able to quickly gather information about more than one hundred people. I can also download this information in the GEDCOM format so that I can process the data offline on my Mac using Reunion, currently the best genealogy program for the Mac. This software allows me to create nice family trees. Ancestry.com also markets its own software, called Family Tree Maker, but it is only available for Windows. But then the real trouble started. I also have thousands of family pictures. They date back to the beginning of photography. Ideally, I want to document who the people in each photograph are. I encountered practical problems as well as conceptual ones, and I would like to share my insights with you.
Let’s first talk about some conceptual problems. Most genealogy programs allow you to attach pictures to the people listed in their database. But most of my pictures show several people. This is a classic many-to-many relationship: one person can appear in multiple photographs, and one photograph can contain multiple people. This means that you cannot just create a folder for each person and drop their pictures into it, because the same picture would have to be in multiple folders. You could work around this by creating shortcuts or symbolic links, but a file system is not made for maintaining this kind of data structure. This is a typical task for a database. Most genealogy programs use a database to maintain their information, but as of today, almost none of them has adequately solved this many-to-many relationship for photographs.
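To make the many-to-many relationship concrete, here is a minimal sketch of how a database could model it, using Python's built-in sqlite3 module. The table names, column names and sample people are my own assumptions for illustration; no genealogy program is known to use this exact schema.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE photo  (id INTEGER PRIMARY KEY, filename TEXT NOT NULL);
-- Junction table: one row for each appearance of a person in a photo.
CREATE TABLE appearance (
    person_id INTEGER NOT NULL REFERENCES person(id),
    photo_id  INTEGER NOT NULL REFERENCES photo(id),
    PRIMARY KEY (person_id, photo_id)
);
""")

# Made-up sample data: Anna is in both photos, Karl only in the first.
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Anna"), (2, "Karl")])
conn.executemany("INSERT INTO photo VALUES (?, ?)",
                 [(1, "wedding.jpg"), (2, "picnic.jpg")])
conn.executemany("INSERT INTO appearance VALUES (?, ?)", [(1, 1), (2, 1), (1, 2)])


def people_in(filename):
    """Return the names of everyone recorded in the given photo."""
    rows = conn.execute(
        "SELECT p.name FROM person p "
        "JOIN appearance a ON a.person_id = p.id "
        "JOIN photo ph ON ph.id = a.photo_id "
        "WHERE ph.filename = ? ORDER BY p.name",
        (filename,),
    )
    return [row[0] for row in rows]


def photos_of(name):
    """Return the filenames of every photo the given person appears in."""
    rows = conn.execute(
        "SELECT ph.filename FROM photo ph "
        "JOIN appearance a ON a.photo_id = ph.id "
        "JOIN person p ON p.id = a.person_id "
        "WHERE p.name = ? ORDER BY ph.filename",
        (name,),
    )
    return [row[0] for row in rows]
```

The photo file exists only once, and the junction table answers both questions: who is in this photo, and which photos show this person. A folder per person cannot do that without duplicating files.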
You could of course create your own database for your photographs, but it does not seem to make sense to maintain the information about the people in one program and the information about your photographs in another. On the other hand, if you use an existing program or web service, you make yourself dependent on it. Any information you put into such a system that goes beyond what the GEDCOM data format supports will be difficult to transfer to another program, because each product uses its own format. Moreover, companies, software, operating systems and computers all come and go. What happens to your data when the company whose product you use suddenly declares bankruptcy and stops updating the software? For some years you will be okay, but as soon as you move to a newer operating system, you might no longer be able to run the software. What matters most is to keep your data in a format that can be carried along through the ever-changing world of modern information technology. Using standardized formats, such as GEDCOM, greatly helps.
However, GEDCOM does not (yet) support storing information about photographs of the people in your family. Assuming that you do not want to go through the trouble of building and maintaining your own database, where can you store this information in a standardized way? One solution is to store the information about who is in the photograph in the metadata of the photograph itself. The IPTC standard allows you to store text information about the picture within the file itself. Not every file format supports this standard, but two of the most commonly used formats, JPEG and TIFF, allow you to store metadata. So far so good. Many photo management programs allow you to add tags to each photo and store them in the photo file. The names of your family members will of course become the tags, but you can also add information about when and where the photo was taken.
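Since the names of family members become the tags, it helps to take them straight from the GEDCOM export so that the spelling stays consistent. The following is only a rough sketch under some assumptions: the sample file is made up and heavily simplified, the function name is hypothetical, and the parser handles only the basic "level, optional ID, tag, value" line layout of GEDCOM. Writing the resulting names into the IPTC fields of a photo would need a separate metadata tool, which is not shown here.

```python
import re

# Made-up, heavily simplified GEDCOM sample; a real export contains far more.
SAMPLE_GEDCOM = """\
0 HEAD
1 GEDC
2 VERS 5.5.1
0 @I1@ INDI
1 NAME Anna /Example/
1 SEX F
0 @I2@ INDI
1 NAME Karl /Example/
0 @F1@ FAM
1 HUSB @I2@
1 WIFE @I1@
0 TRLR
"""

# A GEDCOM line: level number, optional @ID@, tag, optional value.
LINE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def names_by_id(gedcom_text):
    """Map each individual's ID to a plain name usable as a photo tag."""
    names = {}
    current = None  # ID of the INDI record currently being read, if any
    for raw in gedcom_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        level = int(match.group(1))
        xref, tag, value = match.group(2), match.group(3), match.group(4) or ""
        if level == 0:
            # A new top-level record starts; only individuals are of interest.
            current = xref if tag == "INDI" else None
        elif level == 1 and tag == "NAME" and current and current not in names:
            # GEDCOM wraps the surname in slashes: "Anna /Example/".
            names[current] = " ".join(value.replace("/", " ").split())
    return names


print(sorted(names_by_id(SAMPLE_GEDCOM).values()))
```

Keeping the GEDCOM ID next to each name is a deliberate choice in this sketch: two relatives with the same name would otherwise end up with the same tag.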
Let’s summarize the main lessons so far before diving into more advanced topics. It is important to store your genealogical information, including information about the people in your family photographs, in a standardized format, so that it will remain readable in the quickly changing world of modern IT. GEDCOM is the standard for genealogical data, but it does not help you store information about who is who in all your family pictures. For this, you can use one of the many photo management programs (such as iPhoto or Picasa) to add keywords to your pictures, which are then stored as IPTC-compatible metadata.
Two problems remain. First, it would be desirable to know not only who is in a certain photo, but also where in the photo that person is. Imagine a large group photo. Just knowing all the people in it is only the first step. Knowing exactly who the person in the second row, third from the left, is: that is the key. Marking the people in a photograph with a simple geometric shape, such as a rectangle, would be the solution. But IPTC does not allow you to store this information in a structured way. So again, we want to store information that goes beyond the current standard for information storage. Again, you could extend a custom-made database with the coordinates of a rectangle that defines the position of a person, but then we are back at the problem discussed earlier. Building and maintaining your own database is a considerable task that may go beyond your skills or your available time. So far I have found only one genealogy program that allows you to mark the people in the pictures of your collection: Family Tree Builder. Overall, it is a good program, but it is still a bit immature and it only runs on Windows.
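To illustrate what "marking a person with a rectangle" could mean in data terms, here is a small sketch. Everything in it is an assumption on my part: the class, the field names, the JSON layout, and the choice to store coordinates as fractions of the image width and height (so that they survive resizing). It is not a standardized format and not how Family Tree Builder stores its marks.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FaceRegion:
    """Hypothetical record: a named rectangle in relative image coordinates."""
    person: str
    x: float  # left edge, as a fraction of the image width (0..1)
    y: float  # top edge, as a fraction of the image height (0..1)
    w: float  # width, as a fraction of the image width
    h: float  # height, as a fraction of the image height

    def contains(self, px, py):
        """True if the relative point (px, py) lies inside the rectangle."""
        return (self.x <= px <= self.x + self.w
                and self.y <= py <= self.y + self.h)


def to_json(filename, regions):
    """Serialize the regions of one photo to a plain-text JSON document."""
    return json.dumps(
        {"photo": filename, "regions": [asdict(r) for r in regions]}, indent=2
    )


def from_json(text):
    """Read back the filename and the regions written by to_json."""
    data = json.loads(text)
    return data["photo"], [FaceRegion(**r) for r in data["regions"]]


def who_is_at(regions, px, py):
    """Return the names of all people whose rectangle covers the point."""
    return [r.person for r in regions if r.contains(px, py)]


# Made-up example: two people marked in a group photo.
regions = [
    FaceRegion("Anna Example", 0.10, 0.20, 0.20, 0.25),
    FaceRegion("Karl Example", 0.55, 0.15, 0.20, 0.25),
]
text = to_json("group.jpg", regions)
filename, restored = from_json(text)
print(who_is_at(restored, 0.15, 0.25))
```

A plain-text file like this is at least readable without any particular program, but it lives outside the photo and can get separated from it, which is exactly the dependency problem described above.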
I came across this software while searching for a solution to the second main problem. I have thousands of pictures, and most of them contain several people. That means that I would have to assign a very, very large number of tags. Today’s face recognition algorithms are good enough to detect the faces in a picture, and they can even identify the people in it. So, why not unleash their power on my collection of family pictures? Family Tree Builder can find faces in a photograph, but it does not (yet) seem able to actually identify them. Its makers claim that it can, but even after a thorough search, I could not find this functionality. Adobe Photoshop Elements is also able to find faces in a photograph, but it does not automatically identify and tag them. There are some web services, such as Riya, Fotonotes and Picporta, that are making considerable progress at identifying people in photographs, but they are not focused on family history.
Academia has started to address this problem (paper1, paper2), but it appears to me that no commercial or open-source program has yet solved these open issues: create genealogy software that not only allows you to maintain factual information about people, but also about photographs. This software should automatically find and identify the people in the photographs and store this information in a standardized format. If you think you are up to this challenge, maybe in the form of a student project, then please contact me. I would be happy to host such a project.