Continuations: Machine Creativity: Possibly Sooner than Anticipated

AlphaGo has won its series in the game Go against grandmaster Lee Sedol 4-1. I wrote an initial post about AlphaGo after its first victory against a lower-ranked player. Humans have very big brains compared to the neural networks used by the program, which suggests that humans are unlikely to be able to use much of their brain for any one specific task. This, combined with the ability to run machine networks fast and against a lot of training data, will make this technology formidable for many tasks.

Many people have been claiming that creativity will be one area in which machines will not be competitive with humans any time soon. But it is not clear that this is true. Creativity is related to the process of conjecture. Every new design, new text, new scientific theory, etc. is a conjecture of a possible future state. The human brain is very good at coming up with such conjectures.

But here too we should notice something: if you want to come up with a new architectural design, it helps to have learned a lot of existing designs. Einstein read a lot of the work of other physicists. Put differently, the first step in creativity and conjecture seems to be observation and training of a network based on that.

Read the full article: Machine Creativity: Possibly Sooner than Anticipated on Continuations.com

Turn your old Raspberry Pi into an automatic backup server

Mirrored from OpenSource.com under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license

Posted March 16, 2016 by Seth Kenlon

If you’re one of those people upgrading to the Raspberry Pi 3, you might wonder what to do with your old, lesser Pi. Aside from turning it into an array of blinking LEDs to entertain your cat, you might consider configuring it as a backup server.

Making backups of our digital lives is, as most of us begrudgingly admit, the most important part of daily computing that few of us bother to do. That’s because going through the backup process requires us to remember to do it, it takes effort, and it takes time. And that’s precisely why the best backup solution is the one that you don’t do at all; it’s the one you automate.

Such a system is best when it’s always on, running in the background. And that’s exactly what a Raspberry Pi is best at. You can leave the Pi on all day and all night and never notice it on your power bill, and you can task it with the simple activity of running backups across your home network. All you need is a Raspberry Pi and a big hard drive and you have built, essentially, a custom version of those annoying “easy backup” systems that hard drive companies come out with every few years (you know the ones? the ones you hook up to your network, waste a weekend trying to configure only to discover in a hidden online forum that nothing works as advertised due to a bug in the firmware, which the hard drive company promised they’ll fix “soon” two years ago).

rdiff-backup

First, you need to choose some backup software to have your backup server (your Pi) and your clients (your laptop, desktop, and whatever else) run.

There are several tools for automatic backups, but I’ve found over the years that most of the nice slick graphical backup solutions end up falling out of maintenance until they fade away, forcing me to switch to something different. That gets annoying after a while, so I started using rsync, the venerable old UNIX command that’s been around for decades. This served me quite well, but I started finding myself wanting versioned backups of certain files. rsync backs up files that have changed, but it overwrites the old version with the new. So if my problem isn’t that a file has been deleted but that I’ve messed up a file beyond recognition, then having rsync’d backup files doesn’t do me a bit of good, because the backup almost always ends up being the bad version of the file that I was looking to replace.

Then I found rdiff-backup, a simple backup tool based on rsync (it uses librsync), which thereby inherits its reliability (it has, however, only been around since 2001, so it doesn’t have quite the history that rsync has). Rdiff-backup performs incremental backups locally or over a network using standard UNIX tools (tar, rdiff, rsync, and so on), so even if it does fade away, the backup files it creates are still useful. It’s lightweight and runs on both Linux and FreeBSD, so it’s trivial to run even on the oldest Raspberry Pi.

Server install

You don’t need any special setup to turn your Raspberry Pi into a backup server. Assuming your Pi is up and running, all you need to do is install rdiff-backup from your repository, ports, or extras site.

Client install

As for your clients (that is, the computers that are going to get backed up by your Pi), rdiff-backup can be run on Linux, BSD, Windows, and Mac OS X, so chances are you can use this for all the computers running in your home.

The big hard drive

Even a 64GB SD card isn’t going to go very far for incremental backups, so you’ll need a big hard drive to hook up to your Pi. You know your own data best, so let that be your guide when shopping for a drive. For my home network, I have a relatively small (given the number of multimedia data files I work with) 3TB drive; I do that for a number of reasons, but primarily because I don’t actually back up all of the data I own. A lot of data I work with exists elsewhere anyway, so there’s no need for me to back it up, and things like my music and movie collection I don’t consider vital enough to back up, either. So don’t feel like you have to literally keep track of every last kilobyte; just get to know your data and what matters to you most.

Once you’ve got the hard drive, hook it up to your Pi and format it. Strictly speaking, you may not absolutely have to format it, but if you’re going to have Linux manage the data then you may as well store the data on a native filesystem. This assumes that your backup drive is either new or a drive you want to wipe completely. If not, you can skip this part.

To format a drive on Linux, you must use root permissions. It somewhat depends on what distribution you are running on your Pi (Raspbian, Pidora, and so on), but usually the sudo command is the way to invoke this. No matter what, the tool to use is parted, and as long as you have no other drives attached to your Pi (aside from the SD card it has booted from), then the location of your drive is /dev/sda. For safety, I’ll use /dev/sdx just to avoid potential copy-paste mishaps.

First, confirm the location of your drive:
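A minimal sketch of this check, assuming the util-linux lsblk tool is available on the Pi (an assumption; sudo fdisk -l is a common alternative). The list_disks helper is a hypothetical name introduced for illustration:

```shell
# Hypothetical helper: list whole disks (not partitions) with their sizes.
# -d skips partitions, -n drops the header row, -o selects the columns.
list_disks() {
    lsblk -d -n -o NAME,SIZE,TYPE | awk '$3 == "disk" {print "/dev/" $1, $2}'
}

list_disks
```

On a Pi booted from an SD card, the card typically shows up as an mmcblk device, so a single attached USB drive is usually the only sd entry in the list.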

Then run parted on the drive to confirm its total size:
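One way to run this step is sketched below. It keeps the /dev/sdx placeholder, and disk_line is a hypothetical helper name; the command only reads the partition table and changes nothing on the drive:

```shell
DRIVE="/dev/sdx"   # placeholder from the text; replace with your real drive

# Hypothetical helper: print only the "Disk /dev/...: <size>MB" line
# from parted's report, with sizes shown in megabytes (unit MB).
disk_line() {
    sudo parted -s "$1" unit MB print | awk '/^Disk \//'
}

if [ -b "$DRIVE" ]; then
    disk_line "$DRIVE"
else
    echo "$DRIVE is not a block device; re-check the drive location" >&2
fi
```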

Look at the line that starts with Disk; this gives you the total size of the drive in megabytes. Jot that down somewhere, because you’ll need it in a moment.

Next, create a new partition on the drive, spanning the entire drive. Only do this if you want to wipe the backup drive completely to make room for all your backups. If there is any data on the drive that you do not want to disappear forever, then do not do this.
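A hedged sketch of the partitioning step follows. It assumes a GPT partition table (a sensible choice for drives larger than 2TB) and uses parted's 100% end position in place of the size in megabytes noted from the Disk line. The partition_whole_drive helper is a hypothetical name, and it refuses to run on anything that is not a block device:

```shell
# Hypothetical helper. DESTRUCTIVE: replaces the partition table on the drive.
partition_whole_drive() {
    if [ ! -b "$1" ]; then
        echo "$1 is not a block device; nothing was changed" >&2
        return 1
    fi
    # Write a new, empty GPT table, then one partition spanning the drive
    sudo parted -s "$1" mklabel gpt &&
        sudo parted -s "$1" mkpart primary ext4 1MiB 100%
}

# Usage, once the device name has been confirmed (placeholder shown):
#   partition_whole_drive /dev/sdx
```

Wrapping the two commands in a function means nothing happens until the function is called with a device name typed on purpose.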

Your fresh partition exists now, so create a filesystem inside of it. Note that for this command, you use the partition rather than the disk location. So instead of /dev/sda, for example, you would use /dev/sda1. For best results, also provide the disk with a label (the -L option), which we will use later to auto-mount the drive.
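As a sketch of this step, assuming ext4 as the native filesystem and backupdrive as the label (both are assumptions; any label of your choosing works as long as the same one is used when mounting):

```shell
PARTITION="/dev/sdx1"   # the partition, not the whole disk (placeholder)
LABEL="backupdrive"     # assumed label; ext4 labels are limited to 16 characters

if [ -b "$PARTITION" ]; then
    # DESTRUCTIVE: creates a fresh ext4 filesystem carrying the label
    sudo mkfs.ext4 -L "$LABEL" "$PARTITION"
else
    echo "$PARTITION is not a block device; nothing was formatted" >&2
fi
```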

Your drive is now ready for its life as a backup drive.

Auto-mounting the backup drive

The idea of using a Pi for your backup server is, in part, that it’ll always be on. But if something does happen (a power failure, for example, or accidental shutdown) then you want your backup drive to be re-mounted automatically or else any attempt to back up will fail.

To set up auto-mounting for your drive, first create a standard location for it to be mounted. Drives are usually mounted to locations like /media or /run/media, which is fine, but for simplicity just create a directory for it at the root of your filesystem:

And then edit /etc/fstab with root privileges in the text editor of your choice. Add this line:

And finally mount the drive:
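Taken together, the three steps above can be sketched as follows. The mount point /backupdrive and the label backupdrive are assumptions (use the label given to the filesystem when it was created), as is the nofail option, which lets the Pi finish booting when the drive is unplugged:

```shell
LABEL="backupdrive"          # assumed filesystem label
MOUNT_POINT="/backupdrive"   # assumed mount point at the root of the filesystem
FSTAB_ENTRY="LABEL=$LABEL $MOUNT_POINT ext4 defaults,nofail 0 2"

# Only touch the system if a filesystem with that label is actually attached
if [ -e "/dev/disk/by-label/$LABEL" ]; then
    sudo mkdir -p "$MOUNT_POINT"
    # Append the entry only if no line for this label exists yet
    grep -q "^LABEL=$LABEL[[:space:]]" /etc/fstab ||
        echo "$FSTAB_ENTRY" | sudo tee -a /etc/fstab >/dev/null
    # Mount everything listed in /etc/fstab that is not mounted yet
    sudo mount -a
else
    echo "No filesystem labelled $LABEL found; nothing was changed" >&2
fi
```

Mounting by label rather than by device name keeps the entry valid even if the drive shows up under a different /dev name after a reboot.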

The initial backup

The first backup you do is the largest and slowest backup because everything that you want backed up is getting copied to your drive. Subsequent backups are much smaller and faster because only new files (or blobs) or changes to files get copied over.

First, install rdiff-backup on the client computer (the one to be backed up to the Pi). It’s available for the major operating systems.

To make sure that your future backups go as expected, make your first backup using the same command and same setup that you intend to use for the incremental backups. That means you shouldn’t disconnect the big drive from the Pi and plug it into the client so that it goes faster; perform every backup the same way every time, so that you know exactly how to automate it later.

On the Pi, make a directory for the folder you are about to back up from your client. Assuming you want to back up the client’s home directory, create a mirror of that folder on the backup drive:

And then make sure that the same user owns the directory:
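Both steps might look like the sketch below, run on the Pi. It assumes the drive is mounted at /backupdrive (an assumption) and uses the user seth; the mountpoint check is an extra precaution added here so that nothing is written to the SD card by mistake:

```shell
BACKUP_ROOT="/backupdrive"                # assumed mount point of the backup drive
BACKUP_USER="seth"
TARGET="$BACKUP_ROOT/home/$BACKUP_USER"   # mirrors /home/seth on the client

if mountpoint -q "$BACKUP_ROOT"; then
    sudo mkdir -p "$TARGET"
    # The trailing colon sets the group to the user's login group
    sudo chown -R "$BACKUP_USER": "$TARGET"
else
    echo "$BACKUP_ROOT is not a mounted filesystem; nothing was created" >&2
fi
```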

This assumes that user seth exists both on the client and on the Pi. You don’t have to do it that way (rdiff-backup can sign into the Pi as a different user), but it sometimes makes it easier to manage when the backups are mirrors of the source.

This also assumes that you are backing up your home directory. That’s usually a good place to start (I assume that if you’re running Linux, then you can download and replace the base system for free), but you might want to leave out large files that you don’t need to back up. List files and folders to exclude from backups in a file called .excludes in your home directory. At the very least, you can probably safely exclude your trash directory:
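A minimal sketch, assuming the trash lives in ~/.local/share/Trash (the usual location on Linux desktops, but an assumption about your system):

```shell
EXCLUDES="$HOME/.excludes"

# One path per line. Shell-style globs are allowed when the file is read
# with rdiff-backup's --exclude-globbing-filelist option.
printf '%s\n' "$HOME/.local/share/Trash" > "$EXCLUDES"

cat "$EXCLUDES"
```

Further paths can be appended to the same file later, one per line.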

Run the basic rdiff-backup command from your client computer, where 192.168.3.14 is the IP address of your Pi:
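The sketch below shows one plausible form of that command. The destination path assumes the backup drive is mounted at /backupdrive on the Pi (an assumption). It uses rdiff-backup's classic command-line form; newer releases also document an action-based syntax, so check the manual page of the installed version:

```shell
PI="seth@192.168.3.14"                   # user and IP address from the text
SOURCE="$HOME/"
DEST="${PI}::/backupdrive/home/seth/"    # /backupdrive is an assumed mount point

if command -v rdiff-backup >/dev/null 2>&1; then
    # The double colon separates the SSH host from the path on that host
    rdiff-backup --exclude-globbing-filelist "$HOME/.excludes" "$SOURCE" "$DEST"
else
    echo "rdiff-backup is not installed on this client" >&2
fi
```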

That command should kick off a lengthy rsync process in which all files are discovered to not exist on the backup drive, and therefore are copied from the client to the Pi. If it fails, check the permissions involved; your user (on the Pi) must be able to write to the backup drive. Also, your user must be able to successfully SSH into the Pi remotely.

Auto login

Since our aim is to automate this process, the login that kicks off backups must also happen without intervention. It’s easy to make SSH login automatic; just use SSH key login. This can be done as a single step with ssh-copy-id (which should be in your distro’s repository). To use a special key just for this backup server, use the SSH config file to specify which key to use.
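A sketch of both ideas, run on the client. The key file name pi_backup and the two helper functions are hypothetical names. The key is created without a passphrase so that unattended jobs can use it, which is a trade-off to weigh for your own network, and older OpenSSH versions may need -t rsa -b 4096 instead of ed25519:

```shell
KEY="$HOME/.ssh/pi_backup"   # hypothetical key file, used only for backups
PI_USER="seth"
PI_HOST="192.168.3.14"

# Hypothetical helper: create the key if needed and install its public
# half on the Pi (asks for the Pi password one last time).
setup_backup_key() {
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
    [ -f "$KEY" ] || ssh-keygen -t ed25519 -N "" -f "$KEY"
    ssh-copy-id -i "$KEY.pub" "$PI_USER@$PI_HOST"
}

# Hypothetical helper: print an SSH config stanza that pins this key to the Pi.
backup_ssh_stanza() {
    printf 'Host %s\n    User %s\n    IdentityFile %s\n    IdentitiesOnly yes\n' \
        "$PI_HOST" "$PI_USER" "$KEY"
}

# Usage:
#   setup_backup_key
#   backup_ssh_stanza >> "$HOME/.ssh/config"
backup_ssh_stanza
```

Because rdiff-backup reaches the Pi over SSH, a stanza keyed on the same host address applies to the backup command as well.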

Cron job

Assuming everything has worked so far, there’s no reason an unattended backup should fail. To make that happen, take the same command you used for the initial backup and assign it to a cron job. This is generally done with the command crontab -e:
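A crontab entry for this could look like the sketch below, assuming a run every six hours and the same paths as the initial backup (/backupdrive is an assumed mount point on the Pi). Absolute paths are used so the job does not depend on cron's environment:

```
# min  hour  day-of-month  month  day-of-week  command
0      */6   *             *      *            rdiff-backup --exclude-globbing-filelist /home/seth/.excludes /home/seth/ seth@192.168.3.14::/backupdrive/home/seth/
```

The hour field */6 matches hours 0, 6, 12, and 18, and the minute field 0 pins each run to the top of the hour.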

That cronjob runs the backup command every six hours (on the hour). You can adjust the frequency according to your needs.

Restore data

Now that the backup has been automated, there’s only one command you actually need to remember: how to restore a file from the backups you are so dutifully making.

The simplest restore command is as simple as an rsync or scp:
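A hedged sketch of such a restore, run on the client. The Pictures directory and the /backupdrive mount point are assumptions used for illustration:

```shell
PI="seth@192.168.3.14"
FILE="Pictures/tux.svg"   # hypothetical location, relative to the home directory
REMOTE="${PI}::/backupdrive/home/seth/$FILE"

if command -v rdiff-backup >/dev/null 2>&1; then
    # "now" selects the most recent backed-up version
    rdiff-backup --restore-as-of now "$REMOTE" "$HOME/$FILE"
else
    echo "rdiff-backup is not installed on this client" >&2
fi
```

rdiff-backup normally refuses to overwrite a file that still exists at the target path, so move a damaged copy aside first or look into its --force option.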

This command restores from the backup server the most recent version of tux.svg to the same path on your client machine. Notice that you don’t have to worry about special file paths to account for versions; if you want the most recent version, you just restore the same path that is missing or that you have corrupted, and let rdiff-backup resolve that request to the most recent version.

But the --restore-as-of option is more flexible than that. Maybe the version of the file you need is from five days ago:
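As a sketch, the time argument can be wrapped in a small helper. The restore_as_of function and the backup location on the Pi are hypothetical names chosen for illustration:

```shell
PI="seth@192.168.3.14"
BACKUP_HOME="/backupdrive/home/seth"   # assumed backup location on the Pi

# Hypothetical helper: restore_as_of <time> <path relative to home>
restore_as_of() {
    rdiff-backup --restore-as-of "$1" "${PI}::$BACKUP_HOME/$2" "$HOME/$2"
}

# Usage: the version from five days ago is written as 5D. Intervals combine
# a number with s, m, h, D, W, M or Y, and dates such as 2016-03-11 also work.
#   restore_as_of 5D Pictures/tux.svg
```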

There are several other means of restoring files, and they’re all listed in the official rdiff-backup documentation, but in practice I have found that the --restore-as-of option is the one that gets used most often. In the less common circumstances that you know the exact day and time of the last good version of a file and need to pull it very specifically from your backups, rdiff-backup handles that for you too; you just have to get the rather unwieldy diff filename, stored alongside your backup data on the backup drive.

For example:
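The sketch below illustrates the idea. The increment file name is hypothetical: its year and UTC offset are assumptions, as is the /backupdrive location, so copy the real name from the listing on the Pi before restoring:

```shell
PI="seth@192.168.3.14"
BACKUP_HOME="/backupdrive/home/seth"   # assumed backup location on the Pi
INCREMENTS="$BACKUP_HOME/rdiff-backup-data/increments"
# Hypothetical increment name; the real one is shown by the listing below
DIFF="paint.2016-01-24T06:06:00-07:00.diff.gz"

if command -v rdiff-backup >/dev/null 2>&1; then
    # List the stored increments for the file
    ssh "$PI" ls "$INCREMENTS" | grep '^paint\.'
    # Rebuild the complete file as it was at that backup
    rdiff-backup "${PI}::$INCREMENTS/$DIFF" "$HOME/paint"
else
    echo "rdiff-backup is not installed on this client" >&2
fi
```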

This restores the file paint from the backup performed at 6:06 a.m. on January 24. Of course, it does not place just the diff data of that file into your home directory, but a fully reconstructed version of the file. That’s what rdiff-backup is for.

Back it up

Backing up is important, and your old Pi can help. Set it up today and you won’t be sorry.

Review of “A Machine Learning Approach to Recognizing Acronyms and their Expansions”

DRAFT

Article Title: A Machine Learning Approach to Recognizing Acronyms and their Expansions

URL: http://research.microsoft.com/en-us/people/junxu/acronymextraction-icmlc2005.pdf

Mirror URL: https://scottontechnology.com/wp-content/uploads/2016/03/acronymextraction-icmlc2005.pdf

Authors: Jun Xu, Ya-Lou Huang

Keywords: Acronym extraction; expansion; text mining; machine learning

Scott on Technology classifications
Reproducible research: No
Additional Keywords: Support Vector Machines, SVM
Programming language used: Unknown
Number of pages: 6

It is a decent overview paper, but it lacks sufficient detail to implement. Generally, the method uses a rule-based approach to identify likely acronyms and candidate expansions, and uses support vector machines (SVM) for “selecting genuine expansions from candidates.”

Sections 1-3 are the introduction, related works, and observations on “recognizing acronyms and expansions from text.”

Sections in detail

4.1 Identify Likely Acronyms
The overall process here is clear. Interestingly, an observation they made earlier stated that “acronyms are generally three to ten characters in length…” yet they allow the likely acronym length to be between 2 and 10 characters. Also, for Step 3, they don’t indicate which dictionary they used or how they determine whether a token is a person name or location name. I also found it odd that they check the acronym candidate against an additional stop word list, as the stop words should already be in the dictionary, and I don’t know what purpose it serves in this step. However, I have found that ignoring stop words is important in a pattern-based expansion generation step.

4.2 Generate Candidate Expansions
“We observed that expansions always occur in surrounding text where acronyms appear in and always in the same sentence.” I have found this not to be true. For example, consider the following text:

During World War II, a number of Army personnel were stationed at the Orlando Army Air Base and nearby Pinecastle Army Air Field. Some of these servicemen stayed in Orlando to settle and raise families. In 1956 the aerospace and defense company Martin Marietta (now Lockheed Martin) established a plant in the city. Orlando AAB and Pinecastle AAF were transferred to the United States Air Force in 1947 when it became a separate service and were re-designated as air force bases (AFB).

Army Air Field and AAF are not near each other and not within the same sentence. If measured from after the “d” in Field and up to the “A” in AAF, there are 215 characters between them. A more relaxed statement is that expansions usually appear in the same paragraph as the acronyms.
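The 215-character figure can be checked mechanically. The sketch below uses a hypothetical gap helper built on awk's index and length functions, and it assumes the passage is joined with single spaces between sentences:

```shell
text="During World War II, a number of Army personnel were stationed at the Orlando Army Air Base and nearby Pinecastle Army Air Field."
text="$text Some of these servicemen stayed in Orlando to settle and raise families."
text="$text In 1956 the aerospace and defense company Martin Marietta (now Lockheed Martin) established a plant in the city."
text="$text Orlando AAB and Pinecastle AAF were transferred to the United States Air Force in 1947 when it became a separate service and were re-designated as air force bases (AFB)."

# Hypothetical helper: count the characters between the end of the first
# occurrence of an expansion and the start of the first occurrence of an acronym.
gap() {
    awk -v t="$1" -v expansion="$2" -v acronym="$3" 'BEGIN {
        after = index(t, expansion) + length(expansion)   # first position past the expansion
        start = index(t, acronym)                         # first position of the acronym
        print start - after
    }'
}

gap "$text" "Army Air Field" "AAF"   # prints 215
```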

4.3.2 Features

The feature list (“lower case[sic], numeric, special characters, and white spaces…”) doesn’t specify which features are binary and which are real-valued.

 

References

  1. Adar, E.: SaRAD: a Simple and Robust Abbreviation Dictionary. HP Laboratories Technical Report, 2004.
  2. Bowden, P. R.: Automatic Glossary Construction for Technical Papers, Nottingham Trent University, Department Working Paper, December 1999.
  3. Bowden, P. R., Halstead, P., and Rose, T. G.: Dictionaryless English Plural Noun Singularisation Using a Corpus-Based List of Irregular Forms, in: Ljung, M., ed., Proc. of 17th Int. Conf. on English Language Research on Computerized Corpora, Rodopi, Amsterdam, Netherlands, pp. 130-137.
  4. Chang, J. T., Schutze, H., and Altman, R. B.: Creating an Online Dictionary of Abbreviations from MEDLINE, J. Am. Med. Inform. Assoc., 9.
  5. Hettich, S. and Bay, S. D.: The UCI KDD Archive. Irvine, CA: University of California, Department of Information and Computer Science.
  6. Larkey, L. S., Ogilvie, P., Price, M. A., and Tamilio, B.: Acrophile: An automated acronym extractor and server. In Proc. of 5th ACM Conf. on Digital Libraries. San Antonio, TX: ACM Press, 2000.
  7. Park, Y., and Byrd, R. J.: Hybrid text mining for finding abbreviations and their definitions. In Proc. of EMNLP, 2001.
  8. Pustejovsky J., Castano J., Cochran B., Kotecki M., and Morrell M.: Automatic Extraction of Acronym-meaning pairs from MEDLINE databases. In Proc. of Medinfo, 2001.
  9. Schwartz, A. S., and Hearst, M. A.: A simple algorithm for identifying abbreviation definitions in biomedical text. In Proc. of the Pacific Symposium on Biocomputing, 2003.
  10. Taghva, K. and Gilbreth, J.: Recognizing Acronyms and their Definitions. Information Science Research Institute, University of Nevada, Technical Report TR 95-03.
  11. Vapnik, V. N.: The Nature of Statistical Learning Theory. Berlin: Springer-Verlag, 1995.
  12. Yeates, S.: Automatic Extraction of Acronyms from Text. In Proc. of the Fourth New Zealand Computer Science Research Students’ Conference, 1999
  13. Yeates, S., Bainbridge, D., and Witten, I.H.: Using compression to identify acronyms in text. In Proc. of Data Compression Conf., IEEE Press, New York, NY, p. 582.
  14. Yoshida, M., Fukuda, K., and Takagi, T.: PNAD-CSS: a workbench for constructing a protein name abbreviation dictionary, Bioinformatics, 16, pp. 169-175.
  15. Yu, H., Hripcsak, G., and Friedman, C.: Mapping abbreviations to full forms in biomedical articles. J Am Med Inform Assoc, 2002.
  16. Acronym Finder: http://www.acronymfinder.com/
  17. The Canonical Abbreviation/Acronym List: http://www.astro.umd.edu/~marshall/abbrev.html (dead link) Current link: http://marshall.freeshell.org/abbrev.html

Notes/Edits

  • Amsterdam is misspelled in the References and corrected here
  • The link to The Canonical Abbreviation/Acronym List was dead and a current link is provided
  • Some of the information presented here was generated through OCR methods.  If you see any errors just drop me a note or add to the comments and we’ll get it corrected.

Takeaways:

  • Use of the term “candidate” for potential expansions
  • Use of the term “token” (this is common in the SVM/ML domain)

Interesting features to examine:

  • Length of acronym
  • Length of candidate expansion
  • Expansion distance from acronym

Sources:

Proc. – Proceedings