Recognizing and Identifying People in Family Pictures

I recently started to document my family’s history. This genealogy expedition became urgent since my grandparent might not live much longer. I encountered several practical and conceptual problems that are far from trivial, and I would like to invite you to work on them with me, possibly in the format of a (student) project.

The first challenge was to set up a collaboration with other family members who had already started collecting information. Since we do not live close to each other, we chose the online service Ancestry.com. We made good progress and were able to quickly gather information about more than one hundred people. I can also download this information in the GEDCOM format and process the data offline on my Mac using Reunion, currently the best genealogy program for the Mac. This software allows me to create nice family trees. Ancestry.com also markets its own software, called Family Tree Maker, but it is only available for Windows. But then the real trouble started. I also have thousands of family pictures. They date back to the beginning of photography. Ideally, I want to document who the people in each photograph are. I encountered practical problems, but also conceptual ones, and I would like to share my insights with you.

Let’s first talk about some conceptual problems. Most genealogy programs allow you to add pictures to the people listed in their database. But most of my pictures show several people. This is a classic many-to-many relationship: one person can be in multiple photographs, and one photograph can contain multiple people. This means that you cannot just create a folder for each person and drop their pictures in it, because the same picture would have to be in multiple folders. You could work around this by creating shortcuts or symbolic links, but a file system is not made for maintaining this kind of data structure. This is a typical task for a database. Most genealogy programs use databases to maintain their information, but as of today, almost none has adequately solved this many-to-many relationship issue for photographs.
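To make the many-to-many relationship concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table names (person, photo, appearance), the people and the file names are all invented for illustration; they are not taken from any genealogy program. The point is only that a third "junction" table lets the same photo belong to several people and the same person to several photos, which folders cannot do.

```python
import sqlite3

# Hypothetical schema: all table, column and file names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE photo  (id INTEGER PRIMARY KEY, filename TEXT NOT NULL);
-- Junction table: one row per appearance of a person in a photo.
CREATE TABLE appearance (
    person_id INTEGER NOT NULL REFERENCES person(id),
    photo_id  INTEGER NOT NULL REFERENCES photo(id),
    PRIMARY KEY (person_id, photo_id)
);
""")

conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Anna"), (2, "Karl")])
conn.executemany("INSERT INTO photo VALUES (?, ?)",
                 [(1, "wedding_1921.jpg"), (2, "picnic_1935.jpg")])
# Anna is in both photos; the wedding photo shows both people.
conn.executemany("INSERT INTO appearance VALUES (?, ?)",
                 [(1, 1), (2, 1), (1, 2)])


def people_in(filename):
    """Return the names of everyone tagged in the given photo."""
    rows = conn.execute(
        "SELECT person.name FROM person "
        "JOIN appearance ON appearance.person_id = person.id "
        "JOIN photo ON photo.id = appearance.photo_id "
        "WHERE photo.filename = ? ORDER BY person.name",
        (filename,),
    )
    return [row[0] for row in rows]


def photos_of(name):
    """Return the file names of every photo the given person appears in."""
    rows = conn.execute(
        "SELECT photo.filename FROM photo "
        "JOIN appearance ON appearance.photo_id = photo.id "
        "JOIN person ON person.id = appearance.person_id "
        "WHERE person.name = ? ORDER BY photo.filename",
        (name,),
    )
    return [row[0] for row in rows]


print(people_in("wedding_1921.jpg"))
print(photos_of("Anna"))
```

Each photo file exists only once; who appears in it is recorded purely as rows in the junction table, so both questions ("who is in this photo?" and "which photos show this person?") are simple queries.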

You could of course create your own database for your photographs, but it does not seem to make sense to maintain the information about the people in one program and the information about your photographs in another. On the other hand, if you use existing software or a web service, you make yourself dependent on it. Any information you put into such a system that goes beyond what the GEDCOM data format supports will be difficult to transfer to other software, because each product uses its own format. Moreover, companies, software, operating systems and computers all come and go. What happens to your data when the company whose product you use suddenly declares bankruptcy and stops updating the software? For some years you will be okay, but as soon as you start using a newer operating system, you might no longer be able to run the software. What is most important is to keep your data in a format that can be carried along through the ever-changing world of modern information technology. Using standardized formats, such as GEDCOM, greatly helps.

However, GEDCOM does not (yet) support storing information about photographs of the people in your family. Assuming that you do not want to go through the trouble of building and maintaining your own database, where can you store this information in a standardized way? One solution is to store the information about who is in the photograph in the metadata of the photograph itself. The IPTC standard allows you to store text information about the picture within the file itself. Not every file format supports this standard, but two of the most commonly used formats, JPEG and TIFF, allow you to store metadata. So far so good. Many photo management programs allow you to add tags to each photo and store them in the photo file. The names of your family members will of course become the tags, but you can also add information about when and where the photo was taken.

Let’s summarize the main lessons so far before diving into more advanced topics. It is important to store your genealogical information, including information about the people in your family photographs, in a standardized format, so that it will continue to be readable in the quickly changing world of modern IT. GEDCOM is the standard for genealogical data, but it does not help you store information about who is who in all your family pictures. For this, you can use one of the many photo management programs (such as iPhoto or Picasa) to add keywords to the pictures, which are then stored as IPTC-compatible metadata.

Two problems remain. First, it would be desirable to know not only who is in a certain photo, but also where in the photo that person is. Imagine a large group photo. Just knowing all the people in it is only the first step. The key is knowing exactly who the person in the second row, third from the left, is. Marking the people in a photograph with a simple geometrical shape, such as a rectangle, would be the solution. But IPTC does not allow you to store this information in a structured way. So again, we want to store information that goes beyond the current standard for information storage. Again, you could extend a custom-made database with the coordinates of a rectangle that defines the position of a person, but then we are back at the problem discussed earlier: building and maintaining your own database is a considerable task that may go beyond your skills or your available time. So far I have found only one genealogy program that allows you to mark the people in the pictures of your collection: Family Tree Builder. Overall, it is a good program, but it is still a bit immature and it only runs on Windows.
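As an illustration of what "the coordinates of a rectangle" could look like, here is a small Python sketch. It is only one possible design, not how Family Tree Builder or any other program works: the class FaceRegion, its fields and the helper after_crop are hypothetical names. The sketch assumes that coordinates are stored as fractions of the image width and height rather than as pixels, so that a rectangle stays valid when the picture is resized, and it shows how a rectangle could be recalculated when the picture is cropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceRegion:
    """A named rectangle, stored as fractions (0.0 to 1.0) of the image size."""
    name: str
    x: float  # left edge, fraction of image width
    y: float  # top edge, fraction of image height
    w: float  # width, fraction of image width
    h: float  # height, fraction of image height

    def to_pixels(self, image_width, image_height):
        """Convert to (left, top, width, height) in pixels for a given image size."""
        return (round(self.x * image_width), round(self.y * image_height),
                round(self.w * image_width), round(self.h * image_height))


def after_crop(region, crop_x, crop_y, crop_w, crop_h):
    """Re-express a region relative to a crop of the original image.

    The crop is also given in fractions of the original image. Returns None
    if the region lies entirely outside the crop; a partly cut-off region
    is clipped to the crop boundary.
    """
    left = max(region.x, crop_x)
    top = max(region.y, crop_y)
    right = min(region.x + region.w, crop_x + crop_w)
    bottom = min(region.y + region.h, crop_y + crop_h)
    if right <= left or bottom <= top:
        return None
    return FaceRegion(region.name,
                      (left - crop_x) / crop_w, (top - crop_y) / crop_h,
                      (right - left) / crop_w, (bottom - top) / crop_h)


anna = FaceRegion("Anna", x=0.25, y=0.5, w=0.25, h=0.25)
print(anna.to_pixels(800, 600))
# Keep only the lower-left quarter of the picture.
print(after_crop(anna, crop_x=0.0, crop_y=0.5, crop_w=0.5, crop_h=0.5))
```

Where such rectangles should be stored is exactly the open question of this post; the sketch only shows that the data itself is small and simple.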

I came across this software while searching for a solution to the second main problem. I have thousands of pictures, and most of them contain several people. That means that I would have to assign a very large number of tags. Today’s face recognition algorithms are good enough to detect the faces in a picture, and they can even identify the people in it. So, why not unleash their power on my collection of family pictures? Family Tree Builder can find faces in a photograph, but it does not (yet) seem able to actually identify them. Its makers claim it can, but even after a thorough search, I could not find this functionality. Adobe Photoshop Elements is also able to find faces in a photograph, but it does not automatically identify and tag them. There are some web services, such as Riya, Fotonotes and Picporta, that are making considerable progress at identifying people in photographs, but they are not focused on family history.

Academia has started to address this problem (paper1, paper2), but it appears to me that no commercial or open-source program has yet solved these open issues: create genealogy software that allows you to maintain factual information not only about people, but also about photographs. This software should automatically find and identify the people in the photographs and store this information in a standardized format. If you think you are up to this challenge, maybe in the form of a student project, then please contact me. I would be happy to host such a project.

14 thoughts on “Recognizing and Identifying People in Family Pictures”

  1. Update: Google released Picasa Web Albums 3 which contains face recognition. It works pretty well, but so far it is not possible to export the name tags. The information is therefore locked into the Google Universe. (http://picasaweb.google.com/home)

    Another program that will hopefully be released soon is iLovePhotos (http://ilovephotos.com/). It is also supposed to feature face recognition.

  2. I am a beta user of Picporta and have also played with Picasa. I must say Picporta’s face recognition is quite accurate, and the way they have integrated it into Picporta is quite amazing.

    Also, whenever I upload my pictures to Picporta now, they automatically suggest tags based on the content. Amazing, isn’t it?

    waiting for other players as well in this domain :)

  3. Boy, this is a timely post, I’m struggling with this exact issue! I just had a bunch of photos scanned using ScanCafe and I want to catalog all of this info and tie it to a family tree. In addition to what you have mentioned, I would really like to have group editing of this information (like a wiki), as I don’t know everyone in the photos but others do, and a timeline tool. (I came across a startup called Dipity that has a neat timeline feature (http://www.dipity.com) that pulls from Picasa and Flickr, but they currently only go back 30 days or so; it’s based on RSS feeds.)

    I also came across the face recognition in Picasa and was encouraged, but disappointed that only the owner can really make use of it. Flickr has the “community” stuff down pretty well, and you can add “notes”, which allow you to draw boxes on the picture to identify things (like faces), but that just isn’t practical for large volumes of pictures.

    I sure hope something comes along on this front and hope you can let me know if you come across anything.

    Also, in regards to ancestry.com, how happy are you with their fees and what you get with it? To me, their subscription price seems high. I have gone in with the free piece and would like to do the subscription, maybe for a concentrated period of time just to get some of the value, then cancel the subscription because once I use their research tools I should be done. But I’m not sure if once you cancel, do you lose that data you collected and maybe some other features. In any case, just wondering what you think of it in general since you are using it. I’ll check out your other postings and see if you already answered this!

  4. Hello Christoph,

    Your post about recognizing and identifying people in family pictures is very interesting and I thank you for considering us in trying to solve this problem.

    We’ve released version 1.0 and I was wondering if you had a chance of downloading our software (available FREE at http://www.ilovephotos.com), because we would love to hear your feedback. Currently we are using face detection and we will move in the future to facial recognition.

    Good luck in improving the process of recognizing and identifying people in family pictures!

    Cheers,

    Damian
    damian@bluelavatech.com
    ilovephotos.com

  5. Hello,

    I experienced a similar problem in my attempt to “reconstruct” my family history using narrations and a set of about 2000 pictures spanning the time from 1880 until now.

    The main difficulty was to find genealogical software able to support intelligent linking and display of people in pictures.

    I use Family Tree Maker, though it does not support picture annotation (i.e. drawing a box over each face in a picture and linking it to the appropriate individual).

    I also considered storing the local coordinates of each individual in, e.g., IPTC. The software would then also have to handle the case where a picture is cropped afterward, since the origin of the coordinates must be recalculated (e.g. by recording in IPTC how the picture was cropped).

    http://www.myheritage.com provides a good, imho, free genealogical software which includes picture annotation and linking.

    The feature I miss is face identification. I can accept that face recognition could be done manually; the software could be assisted by the user in several respects.

    Talking about face identification, a common case is that I have a picture of a known individual R1 at a later age and a set of pictures (some of them group pictures) taken at different dates (say 2-4 decades earlier). Some pictures in this set show groups in which the person R1 is supposed to be depicted, though not with certainty (e.g. it could be not him but a brother).

    What I miss is an algorithm and software based on anthropological measures which are relatively constant throughout life. This software could be assisted by the user, i.e. it would not be a problem if the user had to point out parts of the face (e.g. eyes, mouth, nose etc.) and generally provide the necessary information, letting the software calculate an “individual profile” which could be used to recognize the individual at any stage of life.

    What I mean is that it should be possible to identify some relative parameters P1, P2 etc. (e.g. eye distance / eye-to-nose distance, let’s call it P1) which are relatively specific to a physiognomy and relatively constant, within a limited variation v1max, v2max etc., throughout life. P1, P2 etc. would in this case be a kind of personal profile or spectrum.

    Identifying whether individual Rx, about 65 years old, is the same as individual R1 (but not the same as R12) would mean that Rx has a set (P1+v1, P2+v2 …) which is clearly within the P+vmax set for R1 but outside the P+vmax set for R12.

    Imho, a tool for genealogical face identification does not need to be very fast or highly automated. Instead, the software should perform functions on the scanned picture, e.g. normalize it (rotate, change perspective and distance), ask the user to point out important face elements, calculate and store the specific profile, and match it with existing stored ones. The approach and final decision could be probabilistic, e.g. “the person Rx is probably 90% the same as R1 but only 67% the same as Ry”. After that, the operator should decide, by applying other cross-checking analysis of the person, whether the supposition is correct.

    As for an appropriate format to store the genealogical and visual information and links, an XML-based GEDCOM, which permits further, unlimited extensions, would probably be useful.
    http://www.familysearch.org/GEDCOM/GedXML60.pdf

    Sorry for my English – I hope I succeed to explain what I mean.

    It will be exciting to follow your progress. Good luck!
    George

  6. iPhoto 09 now has the ability to recognize people in photos, similar to Picasa Web. This is good news, but apparently it is not possible to export this meta information unless you create smart lists and add the names of the people manually to the comments. The comments can be exported using AppleScript (http://highearthorbit.com/289/).

    If you go via Aperture you can then export this information to Adobe’s XMP (http://www.adobe.com/devnet/xmp/) using Lightbox XMP http://www.lightboxsoftware.com/lightboxXMP.html

  7. Oh my gosh! Someone else has my problem!!! I came across this blog entry while trying to research possible software to find solutions for my quandary. As background, I do subscribe to ancestry.com and have recently found that the “unofficial family historian” has heaps and heaps and heaps of photos of various branches of our family. Many people are identified and some are not.

    Problem #1: Seems simple enough: why can’t ancestry.com use the same technology as facebook to identify people? facebook’s social networking site is made up of tons of individuals (like those in my GEDCOM) and you can upload pictures and movies. For pictures you get the ubiquitous rectangle and tag a person. For movies, you simply tag that person as being in the movie. That would solve a bit of the many-to-many issue. This also solves another huge obstacle of ancestry, which is that, sure, I can link the photo to various people, but unless I can mouse over the face to get the tag, it is almost useless. Sure, I know Sally, John, Emma and Richard are in the picture, but which one is which if I am not familiar enough to identify them?

    Problem #2: Storing data about the photo itself. Off the top of my head I would like to inventory: name of the photographer and location (if available), estimated year, possible significance (like you identified an event) which might tie to the GEDCOM. I am actually OK not tying to the actual event if I could overcome Problem #1 and #2.

    These are the conceptual problems I have that are similar to what you identified. Second I have a few practical questions, but I think it is also a function of me being a newbie to the details of photography and digital photography.

    Question #4: What is the correct dpi to scan at (because this will feed into my next question/problem)? I want to scan all of our old photos at an appropriate dpi and format to allow for high quality reproductions. I want the originals properly stored and backups made in case anything should happen. However, the size soon becomes unwieldy…

    Question #5: For uploading to ancestry my file must be smaller than 10MB. Clearly this size can support a high quality reproduction of the photo. However, the time it will take me to scan all these photos and THEN upload giant files is so intimidating I don’t even know where to start. Do you maintain one copy of the picture for reproductions and another for easy uploading to sharing sites?

    Question #6: This then brings me to why I want to share… duh. We have tons of very old family pictures, which means some families might have NONE of their ancestors. I WANT them to have them. I want them to see them. I am so passionate about genealogy, I want others to have the opportunity. Keeping the originals is important too, because giving the pictures to the various families disperses the collection to the four winds and means that you risk them being lost if some generations do not care about them. Plus, opening up what you have allows you to collaborate with others to identify previously UNKNOWN people. (That is why Problem #1 is so key: it isn’t just about tracking who you KNOW but who you don’t!)

    Question #7: So OK, question #6 addresses a manual and emotional reason to identify previously unknown people in pictures, but it is very imperfect. So I would like facial recognition like what I saw advertised in the Mac vs PC commercials. Scanning through thousands of photos, it should be able to identify those we missed (while scanning, and scanning, and scanning).

    Question #8: How many places do I need to maintain this information? Ack. So, say in my lifetime I scan all these high quality photos and upload them to ancestry (although pictures also contain people and info that we may NOT maintain in our GEDCOM, like friends, neighbors, the family dog, a local, an heirloom). I have to maintain my pricey ancestry membership or lose my entire effort (though the thought of uploading these pics to some other site makes me want to cry). Second, I still want to maintain my original scans (which I back up using carbonite). BUT….

    IF I NOW FIND THE SOLUTION TO MY PHOTO INVENTORY AND FACIAL RECOGNITION NEEDS DO I NOW NEED TO SPEND MY LIFE IMPORTING THE FILES INTO A NEW SOFTWARE/WEBSITE!!!!!!??????!!!!!

    *sigh* I would hope such a software would either allow you to actually scan INTO the software (like Adobe does), OR the software should be based on linking to the original file location on a hard drive noting when new files are added or deleted… thus saving the ordeal of having yet another instance of the pictures to maintain. Preferably PC based so that I don’t have to upload to a website.

    Part of the answer should be that ANCESTRY.COM SHOULD HAVE THIS CAPABILITY!!!!!!!! For what we pay for it, it should perform facial recognition of all photos we upload to other members trees if they could just solve problem#1. It would be nice to receive discounts based on your contributions as an incentive to preserve family heirloom photos.

    However, the other part is that I am a firm believer in open-source free ware too. Ancestry already has us hostage for other reasons… it is the BEST out there (despite my complaining). But the people developing these programs are not usually family historians or genealogists. Any solution should integrate with at least a GEDCOM file.

    Sorry so long, but I am so disheartened that there is no solution out there yet. I would be willing to help, assist, test for anyone out there developing such a solution.

    In the meantime, I will give the Family Tree Maker solution a try.

  8. You exposed some very interesting issues with pictures. I have another potential problem: picture corruption. It’s fairly easy to get your pictures or files corrupt, especially when doing housekeeping by moving files around disks. It’s virtually impossible to check the 1000s of pictures if they are not corrupt after some time. You can use some CRC SFV tool but it is limited to the whole file. If you edit pictures, adding metadata, they have a different CRC, but graphical data is intact. What I need is a CRC check of only graphical data of JPG stored as metadata and a program to create and validate such CRCs. That way no matter how you add keywords, GPS coordinates, etc, you can always check the integrity of the picture.
    (JPG integrity check)

  9. I have just received a newsletter from myheritage.com. They claim to have included face detection and recognition in their pictures.

  10. ImageMagick has a command to create a CRC of the image data (thereby ignoring EXIF stuff that might change)

    identify -format "%#" myimage.jpg

    Personally, I have been storing this hash in the comment section of my photos, as such:
    orig-hash:#################################
    so that even if the file gets manipulated, there is a chance I will be able to tell what original it came from.

  11. I have the same problem and it is going to get worse. Out of my 4000+ hi-resolution scans there are many that are of marginal value to me but would be very valuable to others. Sharing on Ancestry would work, but it is cumbersome to work with, does not accept video or audio files (you can record on the site but not produce off line) and there is no face recognition so far. You can make small cropped faces from your group photos and use those, but this is a lot of work. Picasa is great but not exportable so far, though I have seen comments in the forums that indicate that a future version may support export of facial recognition info. myHeritage sounds good but looks a little rough as yet, and if you collaborate on line and make changes these cannot be downloaded. The changes exist only in the cloud. A good website based program called TNG solves many of the problems but does not yet have facial recognition.

    I hope someone takes your comments to heart and develops a program to your excellent specifications. But realize it will not be trivial. The facial recognition engine must be licensed from the author. The GEDCOM format is fairly robust and standard, but extremely clunky and old-fashioned. Most real coders insist on using XML or a SQL database instead, but then it must be translated to GEDCOM for export. Losses and errors in translation are a knotty problem.

    To Carolyn E. Re: Q4, there is no “right” dpi. Given that photos fade, get damaged or lost with time, we need a photo archival copy that retains all the original information. This archival copy really should be a TIF with no compression. Someday a lossless compression scheme will become so popular that it becomes a cross-platform standard that will survive the evolution of operating systems and platforms. It hasn’t happened yet. High resolution TIFs are huge files. So the only real archival solution so far is hard drives that are maintained by upgrading every few years to a new hard drive, with multiple copies in multiple locations. Since most of my valuable photos are black and white, I use 16 bit greyscale TIFs. I crank up the resolution until it hurts, i.e. until the zoomed-in results look like film grain and not pixels. A typical 5×7 print usually comes out to 48 Mbytes this way. Resolutions from 800 to 1600 dpi are not uncommon. Slides need 6400 dpi. If your scanner can do multi-pass scanning, this helps a lot. Of course you can’t upload these files and you don’t need to. Use Picasa to make lower resolution copies and organize them at the same time, with tags and facial recognition. Picasa can read the TIFs (but don’t edit them in Picasa! You will lose resolution) and you can group them as needed for export. Then you can batch export the whole album in lower resolution with one command, or upload for sharing etc. Hope this helps.

  12. bp Tech: Thanks for your feedback!

    This blog entry has stuck in my head for the last 6 months or so as I trudge through the extensive collections family members have and figure out how to sort it all out. Some things are heirloom pictures and some things are heirlooms themselves!

    The first thing I found is that you really need to define a goal, because the solutions depend on it. Is it 1) to preserve the photos and the identities of the people in the pictures (AND attempt to identify the ones you DON’T know)? 2) Do you have a larger interest in genealogy, so that you are interested in the RELATIONSHIPS among the people in the pictures and among the pictures? 3) Do you have big hairy audacious goals of 1 and 2 as well as documenting the pictures themselves as heirlooms, with every detail of the photo (location where it was taken, when it was taken, what photography studio, why it was taken) as well as every detail you can find about the people in the pictures (a depth and breadth that moves you from genealogist to family historian)?

    I find myself not bitten but smitten with the genealogy bug! I fall in category 3, and unfortunately I have yet to find a satisfactory solution. I started on ancestry.com and it works well in that I can assign the photo to a person in my tree AND I can use rectangle “mouse-over notes” to specifically identify who is in the picture. Through the social networking that ancestry.com supports I have been able to identify some previously unknown people through other people who have connected to my tree. Even if there was an automated face recognition software out there, it will give Type I and Type II errors. You have to validate its solutions anyway and the people doing the research have the details and facts to help do that. The problem lies in some photos I have uploaded that are people not yet identified. Ancestry.com also limited the files to 10MB, not exactly the size needed to accomplish goal #1 like bpTech has shown. But overall, I have found it extremely rewarding because people who are serious enough to shell out the fairly substantial ancestry.com fee are serious enough to search for the answers.

    Genealogy software solutions do a pretty good job of #1 and #2. Local genealogy software can link to pictures on your hard drive or expanded hard drive OR scan directly into the software. They provide fairly nice linear relationships. I want the genealogy software solutions to go further and track facts about neighbors/friends, heirlooms, photos as heirlooms, and homes (like architect, when it was built, what it cost, etc.).

    I am struggling because I want more complex relationships than the genealogy software is providing. Viewing the picture not just as a representation of the individuals in linear relationships, but the complex relationships of the photos as objects separate from the individuals and other objects in them.

    Needless to say, I think I will need to cobble together several solutions to meet all goals 100% :)

  13. Here’s an interesting genealogy program that allows tagging [here it is called a rectangle tool] of individuals in group pictures and linking that tag to that person’s own page.

    It doesn’t show up very high in any google searches. It was only after persistent googling that I found it.
    Note that javascript must be enabled to view the tags.

    FamiliaBuilder
    http://www.dralex.com/

    Some features:
    Reads genealogical data from GEDCOM files. That means that you can use FamiliaBuilder with almost any genealogy program.
    Lets you create an album with still pictures, video/audio clips and other types of files.
    You can link files to individuals and vice versa.

    FamiliaBuilder supports group pictures: our rectangle tool lets you show where is each individual in the picture.

    See the samples from FamiliaOnline to see the thumbnails enabled.
    http://online.dralex.com/home/showcase.html

  14. Alas it is now 10 years from the last post (which I am just seeing now) and from my perspective the same problems exist. I use RootsMagic v7 to store genealogy info and ACDSee to store photos. I want to be able to search for individuals or groups of people, or specific events. That can be done in ACDSee but it is not linked to RootsMagic. Conversely I can import all my images to RootsMagic but cannot search them in the same way. If I move them to a new folder, all the connections are lost. Has anyone solved the many to many issue yet, preferably in a way that links to genealogical info?

    Thanks
