Recently my father and I digitized two large books. My father did the bulk of the work by photographing more than 1200 pages. He first photographed all the odd pages and then all the even pages. As with any repetitive task, errors occurred and he missed a few pages.
All the even pages were in one folder and all the odd pages in another. The goal was of course to merge them into a single PDF document. Had it not been for the occasional missing page, this would have been straightforward: just use Apple’s Automator to rename all the files. Automator allows you to give a bunch of files a base name followed by a serial number.
The trick is to serialize the odd pages 1-1200 (e.g. drewes0102) and then the even pages in exactly the same way. This is possible since the even and odd pages are still in separate folders. Next you can use Automator again to add a suffix to the file names.
Give the “a” suffix to the odd pages and the “b” suffix to the even pages. You can then move all the files into one directory. They will be sorted as drewes0001a, drewes0001b, drewes0002a, drewes0002b, and so on.
The last step is to use either Adobe Acrobat or Automator to merge the individual files into a single PDF document. For the Automator option you first need to create one PDF document for the even pages only and one for the odd pages only. In a third step, the “Shuffling pages” option allows you to combine these two PDF documents into one.
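The serializing and suffix steps described above could also be done in the Terminal instead of Automator. The following is only a minimal sketch under assumptions: the function name serialize_with_suffix and the folder names odd, even and merged are made up for illustration, and the images are assumed to be .jpg files whose original names already sort in the order they were photographed.

```shell
# Sketch: serialize every .jpg in a folder as <base><4-digit number><suffix>.jpg
# and move it into a shared target folder.
# "drewes" is the base name used in the post; everything else is hypothetical.
# The function body runs in a subshell, so its variables do not leak out.
serialize_with_suffix() (
    src=$1
    suffix=$2
    target=$3
    base=${4:-drewes}
    n=1
    mkdir -p "$target"
    for file in "$src"/*.jpg; do
        [ -e "$file" ] || continue      # the folder contains no .jpg files
        mv -- "$file" "$target/$(printf '%s%04d%s.jpg' "$base" "$n" "$suffix")"
        n=$((n + 1))
    done
)

# Possible use, with hypothetical folder names:
#   serialize_with_suffix odd  a merged
#   serialize_with_suffix even b merged
#   ls merged
```

Because “a” sorts before “b”, a plain directory listing of the target folder then alternates between odd and even pages.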
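Since certain pages were missing, this solution was not sufficient. If for example page 3 was missing, then the book pages would come out in the order 1, 2, 5, 4, 7, 6, and so on, so every odd page after the gap would land in front of the wrong even page.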
It would also be great if the book’s page number corresponded to the page number in the PDF document, meaning that if you go to page 103 in the PDF file, you see page 103 from the book. The solution was to include white dummy pages for the missing pages.
The pages that follow a gap then all need to be re-serialized. This means that you first have to move all the good pages into a dedicated directory, call it “good images”. Add the white dummy pages with the right serial numbers manually. You then rename all the remaining files in the original directory. I decided not to use the a/b suffix solution described above, but to re-serialize the files with an increment of 2. That way I could continue to look at each page scan and ensure that the page number in the scan was the same as the number in its file name. Jürgen Brandstetter was so kind as to help me write a small script to rename the files:
declare -i i=1; for file in *.jpg; do new=$(printf "%04d.jpg" "$i"); mv -- "$file" "rename/drewes$new"; i=$((i+2)); done
In this script i defines the starting number of the renaming. The script searches for all the files that end in .jpg and renames them, starting with i and counting up in steps of 2. In the case of the missing page 3, you would set i=5 for all of the following odd pages. It is also important to note that a directory called “rename” needs to be present in the image folder. The renaming is done by moving the files into this directory.
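If you do not want to edit the starting number inside the script each time, the same loop can be wrapped in a function that takes the start as an argument. This is only a sketch of a possible variant, not the script we used: the function name renumber_step2 is made up, and the check that refuses to overwrite an existing file is an addition.

```shell
# renumber_step2 START [BASE]
# Moves every .jpg in the current directory into ./rename, numbering the
# files START, START+2, START+4, ... ("drewes" is the default base name).
# The function body runs in a subshell, so "exit" only leaves the function.
renumber_step2() (
    start=${1:?usage: renumber_step2 START [BASE]}
    base=${2:-drewes}
    [ -d rename ] || { echo "missing ./rename directory" >&2; exit 1; }
    i=$start
    for file in *.jpg; do
        [ -e "$file" ] || continue          # no .jpg files here
        new=$(printf 'rename/%s%04d.jpg' "$base" "$i")
        if [ -e "$new" ]; then              # added safety check
            echo "refusing to overwrite $new" >&2
            exit 1
        fi
        mv -- "$file" "$new"
        i=$((i + 2))
    done
)

# Example: the odd pages after a missing page 3 start again at 5:
#   renumber_step2 5
```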
I created a simple text document and saved it as the script rename_serial_odd.sh on the desktop. Use the Terminal to make that file executable with:
chmod +x rename_serial_odd.sh
Then use the Terminal to change to the directory that holds the files you intend to rename and that also contains the rename folder. You can then call the script by its path, which for a script saved on the desktop is ~/Desktop/rename_serial_odd.sh
You need to complete this process for both the even and the odd pages. The advantage of this method is that you can always check the file name against the page number of the book. Once I had added the dummy pages and renamed the files, I moved the even and odd pages into one directory. The last step was to use Acrobat to merge all the files into a single PDF.
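Before merging, it may be worth checking that the combined directory really contains one file per page number. The following is a small sketch of such a check, assuming the drewesNNNN.jpg naming from above; the function name missing_pages and the folder name in the example are made up.

```shell
# missing_pages DIR LAST [BASE]
# Prints the name of every file from <BASE>0001.jpg to <BASE><LAST>.jpg
# that does not exist in DIR. Prints nothing if the sequence is complete.
missing_pages() (
    dir=$1
    last=$2
    base=${3:-drewes}
    n=1
    while [ "$n" -le "$last" ]; do
        name=$(printf '%s%04d.jpg' "$base" "$n")
        [ -e "$dir/$name" ] || echo "$name"
        n=$((n + 1))
    done
)

# Example with a hypothetical folder and page count:
#   missing_pages merged 1200
```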
Together with my father I digitized the German book “Die Chronik der Drewes” by Hans Troebs. Parts one and two together run to 2127 pages, and photographing and OCRing the whole book was a major effort. The book traces the history of the Drewes family all across Germany. Here is the German summary:
(Translated from the German.) The chronicle of the Drewes, Dreves, Drews, Drefs, Dreffs, Drebes, Drebs, Dreps, Drewsen, Drewis, Drevsen, Trebes, Trebs, Troebes, Tröbs, Troebs, Tröps, Tröbus, Trebst, Trübst, Troebst, Trebitz, Tröbitz, Trebesius, Trebus, Trebbus, Trebuß, also Drees, Drebus, Dröbus, Trebuth, Trebbuth, Tributh, Trips, Treibs, Trebsdorf.
Embedded in general history and bound up with the life of their homeland, they suddenly appear out of nameless time with the emergence of their family name and then live on through the centuries up to their wide distribution in the present. With glimpses into the local chronicles, looks back at earlier centuries, oral and written traditions, lines of descent, biographies and life dates, as well as coats of arms and pictures of the families, their places of residence, houses and farms, researched, presented and published in all their diversity.
It is a very rare book and is not even available on the second-hand market, so we made the effort to offer it for a small fee.
Akismet is an excellent tool and it protected me from a whopping 80,000 spam comments on my WordPress-based web site. That is for October alone. Those spam comments filled up my SQL database completely, beyond the point where I could repair it following these instructions.
It was a classic Catch-22. To empty the database I had to “optimize” it, which takes some additional space, and I did not have that space because the database was full. Akismet has just released its 3.0.3 update, which might have solved the issue. Or it could have been the support worker at my web host, whom I called and who finally took mercy on me and hit the optimize button on his side.
In any case, getting from a “Warning: Creating default object from empty value in wp-admin/includes/post.php on line 567” error to the conclusion that my SQL database was full due to comment spam that Akismet had caught was a rather interesting journey. And it only took me two days to figure it out.
I was working on a new version of my Spirograph Automaton and once I fired up the EV3 software on my computer it informed me that a new version was available, including a new 1.06H firmware. I installed both and expected my software to work as it did before. Instead I got some very strange sensor readings. One sensor seemed to overwrite the values of the other sensors. Moreover, once I unplugged and reconnected the sensors, it sometimes seemed to work again. After rebooting the EV3 it sometimes worked and sometimes it did not. These types of intermittent problems are really annoying and it took me more than an afternoon to figure out that it was not my poor programming skills that caused the problem. Eventually I made a video of the problem and contacted the Mindsensors support.
They replied promptly and informed me that:
“The 1.06H EV3 firmware has a bug in its sensor handler. Even when the i2c addresses are changed the error still occurs. It seems the error stops once the program in executed and then the devices are disconnected and reconnected. This will fix the error until you restart your EV3. If this is too much of an inconvenience, you can use the 1.03H EV3 firmware. 1.03H has been tested and works with the rotation test-ms code. We have notified LEGO about the issue. Hopefully a new firmware fix will come in the near future.”
This information was not available on their website and I hope that my post will save you some time. In case you want to downgrade to the 1.03H EV3 firmware, have a look at this file. To downgrade, place the firmware in the following folder (on Mac OS X):
/Applications/LEGO MINDSTORMS EV3 Home Edition.app/Contents/MonoBundle/Resources/Firmware
You need to right-click on the EV3 software and select “Show Package Contents” to get into the desired directory. The firmware will then show up in the list of available firmware versions under “Firmware Update” in the Tools menu of the EV3 desktop software.
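If you prefer the Terminal to the Finder, the firmware file can also be copied into that folder with a short command. This is only a sketch: the function name install_firmware and the firmware file name in the example are made up, and depending on the permissions of the application folder you may need administrator rights.

```shell
# Folder named in the post (Mac OS X, EV3 Home Edition).
FIRMWARE_DIR="/Applications/LEGO MINDSTORMS EV3 Home Edition.app/Contents/MonoBundle/Resources/Firmware"

# install_firmware FILE [DIR]
# Copies the firmware file FILE into DIR (default: the folder above).
# The function body runs in a subshell, so "exit" only leaves the function.
install_firmware() (
    file=${1:?usage: install_firmware FILE [DIR]}
    dir=${2:-$FIRMWARE_DIR}
    [ -f "$file" ] || { echo "no such file: $file" >&2; exit 1; }
    [ -d "$dir" ]  || { echo "no such folder: $dir" >&2; exit 1; }
    cp -- "$file" "$dir/"
)

# Example with a hypothetical file name for the downloaded firmware:
#   install_firmware ~/Downloads/EV3-firmware-1.03H.bin
```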
I work for the University of Canterbury and once I am connected to the university network, I can easily access scientific literature since most publishers authenticate users through their IP address. When I work from home or from off-campus I do not have a university IP address and hence it is harder to get access. Many authors do post their PDF files online these days, but just not enough of them.
There are several ways to access all the literature from home. First, you can use the Firepass system, but it remains rather difficult to use, in particular if you do not work on Windows or if you do not have the device with you. A much better way is to use the proxy server of the library in combination with Google Scholar.
First, you need to visit Google Scholar and log in with your Google account. Then you need to click on “Settings”.
Next, you need to select “Library links” and enter the name of your university. In my case this is University of Canterbury. Hit the search button.
Google Scholar will present you with a list of search results and you need to select the right one before you click on “Save”.
When you now perform a search on Google Scholar you will see on your right a link to the full text via your library. Click on it.
You will now be presented with a login screen from your university proxy server. Once you have entered your login and password, you have direct access to the PDF files.
If you are not a big fan of Google Scholar then you can still use the library proxy server. Simply type in the address bar of your browser: “https://login.ezproxy.canterbury.ac.nz/login?qurl=linkToTheArticle” where linkToTheArticle is the URL of the paper you are after.
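Building that address by hand gets tedious, so here is a small sketch that does it in the Terminal. It rests on two assumptions: that the qurl parameter expects the article link in percent-encoded form (as far as I know that is how EZproxy treats qurl, but a plain link may work as well), and that the link contains only ASCII characters. The function names urlencode and ezproxy_url are made up.

```shell
# urlencode STRING
# Percent-encodes every character except letters, digits and . _ ~ -
# Assumes an ASCII-only string; the body runs in a subshell.
urlencode() (
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}             # everything after the first character
        c=${s%"$rest"}          # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

# ezproxy_url LINK
# Prints the proxy login address from the post with LINK appended.
ezproxy_url() {
    printf 'https://login.ezproxy.canterbury.ac.nz/login?qurl=%s\n' "$(urlencode "$1")"
}

# Example with a made-up article link:
#   ezproxy_url 'https://example.org/article?id=42'
```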
Mitchell Adair made an excellent documentary about our Christchurch Brick Show that took place on July 12-13, 2014 here in Christchurch, New Zealand.