I’ve been wanting a way to easily recover a file that has been accidentally deleted from one of our websites, either by us or by a client. It would also be useful to be able to get back to the state your code was in a given number of days ago, for example when the client changes his mind about the direction you’ve been developing in. Source control offers a solution to some degree, but it won’t help if the client has access to the website and has changed a file there. And some shops just don’t use source control for all their projects.

Tape backups also offer a partial solution; I’ve had to pull a file off yesterday’s tape several times. But restoring from tape is a hassle, especially if it’s stored off site (which it should be!).

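Enter rsnapshot. Rsnapshot is a Perl script that uses rsync to take snapshots of any set of files you want. Rsnapshot itself runs on Linux (and, as one of its developers points out in the comments below, on other Unix-like systems whose filesystems support hard links), but you can easily back up a machine running any OS with it.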

It would be fairly easy to write a bash script or Windows batch file to copy ‘snapshots’ of your source code to a backup area. Rsnapshot does more than this: it only backs up files that have changed, while still letting you browse all the source code as it was yesterday, the day before, etc. This makes it very easy to pull up old files without taking up a lot of disk space. I’ve looked at other tools that do incremental backups to save disk space, but to get files back out you must use their special tools, since all the incremental bits need to be put back together.

Rsnapshot solves this problem by using a filesystem feature called hard links. Hard links are kind of like a Windows shortcut, but the ‘shortcut’ appears to be the actual file in every way. The first time a snapshot is taken, all your files are copied to the daily.0 directory (or weekly.0 if you are only doing weekly, etc.). On the next run, rsnapshot renames that directory to daily.1 and builds a fresh daily.0, so daily.0 is always the most recent snapshot. Only files that have changed are actually copied into the new directory; for files that have not changed, rsnapshot creates hard links to the copies already stored in the previous snapshot. So when you browse through any snapshot directory, it appears that all your files are there, even though that snapshot may only have added a few KB of data, depending on how many files changed.
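To see what a hard link actually is, here is a minimal sketch you can run in a scratch directory. The directory and file names are made up for illustration; rsnapshot does the equivalent for you on every unchanged file.

```shell
# Scratch area standing in for two snapshots (names are hypothetical).
work=$(mktemp -d)
mkdir "$work/older" "$work/newer"
echo "<h1>hello</h1>" > "$work/older/index.html"

# Give the same file a second name in the newer snapshot; no data is copied.
ln "$work/older/index.html" "$work/newer/index.html"

# Both names share one inode, so the content is stored on disk only once.
inode_old=$(ls -i "$work/older/index.html" | awk '{print $1}')
inode_new=$(ls -i "$work/newer/index.html" | awk '{print $1}')
echo "inodes: $inode_old $inode_new"

# Deleting the older name leaves the file readable through the newer one.
rm "$work/older/index.html"
cat "$work/newer/index.html"
```

The two names are indistinguishable: neither one is the “original”. That is also why it is safer to copy a file out of a snapshot than to edit it in place.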

You can configure rsnapshot to back up on a very flexible schedule. I have mine set up to back up once a day for 7 days, once a week for 4 weeks, and once a month for 3 months. That is, I can look back day by day for 7 days, then week by week for 4 weeks, then month by month for 3 months. Some people also like to back up hourly for 8 hours. Here’s what my backup area looks like:

[Screenshot: rsnapshot backup area]

Here is how you implement it:

Installation

  • Install rsync on your Linux box if you don’t already have it. Chances are it’s already there. Type ‘rsync’ at a shell to see if it’s installed. If it’s not, look to your OS documentation for how to install it. On Red Hat or a similar OS use “yum install rsync”.
  • Install rsnapshot. You can download it from rsnapshot.org. I downloaded the RPM file from there and then installed it with “yum localinstall rsnapshot-1.3.0-1.noarch.rpm”.
  • Install the rsync daemon on the Windows machines you want to back up. The Windows port of rsync requires Cygwin to run. Cygwin is a Windows DLL that provides a lot of Linux functionality on Windows, and many Linux applications ported to Windows require it. Handily, there is a version of rsync for Windows that bundles the necessary Cygwin pieces with an rsync implementation: cwRsync. Go to the cwRsync website and download the cwRsync Server. It’s a straightforward Windows installer. All my backup transfers are done inside our secure network, so I did not install the OpenSSH part of cwRsync. If you want to do snapshots across the internet you should install that component and set up keys so your transfers will be encrypted.
  • Start the cwRsync service: go to the Services applet in your Control Panel (located in “Administrative Tools”), and start the cwRsync service. Set it to Automatic so it will start on boot.

Configuration

In the Start menu on your Windows machine you’ll find an entry for “cwRsync Server”, and in there is a shortcut to the rsyncd.conf file. In this file you need to set up which files may be accessed through rsync. Rsync calls a group of accessible files a ‘module’. My module is called websites:

[websites]
path = /cygdrive/d/websites/
read only = true
transfer logging = yes

That shares out the D:\websites area. Notice the Cygwin naming convention for accessing your drives. You’ll need to restart the cwRsync service after changing this file.

To test your Windows rsync setup, run this command on your Linux server: “rsync mywindowsserver.mydomain.com::”. This should list the available modules on that machine:

[root@web-dev3 ~]# rsync cf7dev.cfwebtools.com::
websites
[root@web-dev3 ~]#

Next, configure the /etc/rsnapshot.conf file on your Linux server. Here’s a tip: parameters in this config file must be separated by tabs, not spaces. This allows you to easily specify file paths that contain spaces.

The first thing I changed was the snapshot_root directive; I pointed it to the area on my Linux server where I wanted the snapshots to be stored. I put them in an area where we have a Samba share, so any developer can easily browse the snapshots.

Then scroll down to the BACKUP INTERVALS section. Here you define the resolution of your snapshots. Here’s what I have:
interval daily 7
interval weekly 4
interval monthly 2

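Note that they must be listed in order from most frequent to least frequent. This is because of the way rotation works: each interval is fed from the one above it, so the oldest daily snapshot becomes the newest weekly, and the oldest weekly becomes the newest monthly.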
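As far as I understand the rotation, a weekly run does roughly the following. This is a toy simulation using empty directories in a temporary folder, not rsnapshot’s actual code, and it only shows why the weekly interval depends on the daily one listed before it.

```shell
# Temporary stand-in for snapshot_root.
snap=$(mktemp -d)

# Pretend a full week of dailies and one weekly already exist.
for i in 0 1 2 3 4 5 6; do mkdir "$snap/daily.$i"; done
mkdir "$snap/weekly.0"

# Roughly what a weekly run does: shift the existing weeklies up by one...
mv "$snap/weekly.0" "$snap/weekly.1"
# ...then promote the oldest daily so it becomes the newest weekly.
mv "$snap/daily.6" "$snap/weekly.0"

ls "$snap"
```

Nothing is copied in this step; the weekly snapshot is simply the oldest daily under a new name, which is why the daily interval has to be defined first.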

Then in the BACKUP POINTS / SCRIPTS section, define what you want backed up. Here’s my entry for the windows server mentioned above:
backup cf7dev.cfwebtools.com::websites cf7dev

The second parameter is the server and the module; the third parameter is the directory to place the snapshot in. This directory is created inside each snapshot under the snapshot_root declared earlier.
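Because a stray space instead of a tab is easy to miss in an editor, here is a small sketch for checking a line before it goes into the config. It writes my backup line to a temporary file (not the real /etc/rsnapshot.conf) and inspects the separators; the list of directive names in the grep is just a sample, not a complete list.

```shell
# Write a sample backup line with real tab characters (printf expands \t).
conf=$(mktemp)
printf 'backup\tcf7dev.cfwebtools.com::websites\tcf7dev\n' > "$conf"

# Count the tab-separated fields; a backup line needs at least three.
fields=$(awk -F '\t' '{print NF}' "$conf")
echo "fields: $fields"

# Count lines where a directive is followed by a space instead of a tab.
bad=$(grep -c -E '^(backup|interval|snapshot_root) ' "$conf" || true)
echo "lines separated by spaces: $bad"
```

Once the real file is edited, running “rsnapshot configtest” will check its syntax for you.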

The last step is to add rsnapshot to cron. You need to call rsnapshot with the matching parameter at each interval: run “rsnapshot daily” every day, “rsnapshot weekly” every week, and so on. I just added these lines to my cron.daily, cron.weekly, and cron.monthly files. To test my configuration, I ran “rsnapshot weekly” manually.
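If your distribution uses /etc/cron.daily, /etc/cron.weekly and /etc/cron.monthly directories, one way to do this is a tiny wrapper script in each. The sketch below only prints what those wrappers could look like; the /usr/bin/rsnapshot path and the wrapper file name “rsnapshot” are assumptions, so check yours with “command -v rsnapshot”.

```shell
# Assumed install location; override by exporting RSNAPSHOT_BIN first.
RSNAPSHOT_BIN=${RSNAPSHOT_BIN:-/usr/bin/rsnapshot}

# Print the body of a cron wrapper for one rsnapshot interval.
cron_wrapper() {
    printf '#!/bin/sh\nexec %s %s\n' "$RSNAPSHOT_BIN" "$1"
}

# Preview the three wrappers without touching /etc.
for interval in daily weekly monthly; do
    echo "--- /etc/cron.$interval/rsnapshot"
    cron_wrapper "$interval"
done
```

To install one for real you would redirect the output into the file and make it executable, for example with chmod +x.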

That’s it! It takes less than an hour to set up and you’ll have easily accessible snapshots to refer back to when something goes wrong.
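When something does go wrong, getting a file back is just a copy. The sketch below builds a throwaway snapshot tree so it can run anywhere; the site/index.cfm path and file contents are made up, and in real use snapshot_root would be the directory from your rsnapshot.conf.

```shell
# Stand-in snapshot tree; daily.0 is the newest snapshot, daily.1 the one before.
snapshot_root=$(mktemp -d)
mkdir -p "$snapshot_root/daily.0/cf7dev/site" "$snapshot_root/daily.1/cf7dev/site"
echo "new, broken version" > "$snapshot_root/daily.0/cf7dev/site/index.cfm"
echo "yesterday's good version" > "$snapshot_root/daily.1/cf7dev/site/index.cfm"

# Compare the two copies to confirm which one you want (diff exits 1 on differences).
diff "$snapshot_root/daily.0/cf7dev/site/index.cfm" \
     "$snapshot_root/daily.1/cf7dev/site/index.cfm" || true

# Copy the old version OUT of the snapshot; don't edit files inside the tree.
restore_dir=$(mktemp -d)
cp -p "$snapshot_root/daily.1/cf7dev/site/index.cfm" "$restore_dir/index.cfm"
cat "$restore_dir/index.cfm"
```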

37 Comments

  1. Cory says:

    Nicely written article. I came across this page by your comment on my blog, also about backing up computers with rsnapshot.

  2. Kyle Piper says:

    Very nice. When I move to Ubuntu Linux, I’ll keep this in mind.

  3. Sexy Bern says:

    To be fair to rsnapshot virgins, the following should be noted.

    rsnapshot is not magic. It’s a very well-structured wrapper around “cp -al” and “rsync”.

    rsnapshot uses Linux hard links. If you ever edit any of the files in the “rsnapshot tree”, you will hose all other links to it. Treat your rsnapshot tree as read-only, warts’n’all.

    rsnapshot uses rsync. rsync doesn’t copy changes as such, it synchronises trees. If you delete a load of files in the “source tree” then the corresponding files will be deleted in the “rsnapshot tree”. This won’t affect previous snapshots, only the one that’s in progress.

    If you move or rename a directory in the “source tree” you will break “true” synchronisation at the same point in the “rsnapshot tree” – rsync will delete the sub-tree under the old name and create a new tree under the new name (since you can’t hard link directories). eg.

    foo/bar/(5 gigs of data) -> foo/wibble/(5 gigs of data)

    Here, “bar” was renamed to “wibble” and you lose the benefit of 5 gigs of hard links.

    You can avoid this problem if you know in advance that it’s going to take place. Go into your most-frequently-created rsnapshot (eg. daily.0) and do the corresponding “mv bar wibble” before rsnapshot runs. Nothing will be broken as the “wibble” directory now exists in both places and rsync won’t go through the delete/create phase.

    I’ve used rsnapshot to synchronise trees with literally MILLIONS of files in them. It needs RAM but it works a treat.

  4. David Cantrell says:

    Some of the most common questions people have about rsnapshot are about how to back up Windows machines, so thanks – I’ll link to you from the website shortly 🙂

    One point needs clarification though – rsnapshot doesn’t only run on Linux. We aim to support any operating system and filesystem that supports hard links. That includes *BSD, Solaris, Irix, AIX, HPUX and others.

  5. David Keegel says:

    It is also a good idea to add “hosts allow = …” in rsyncd.conf on the windows machine to restrict the IP addresses which can connect to the rsync server. And possibly also to add “auth users = …” (and probably “secrets file = …”) if you would like user/password authentication. By default rsync server allows anonymous access from anywhere.

    Or for advanced users, you could set rsyncd.conf to have “hosts allow = 127.0.0.1” and access the rsync server through an ssh tunnel (eg from linux run ssh -L 873:localhost:873 cf7dev and then have rsnapshot do a backup of localhost::websites to cf7dev .) That would also mean the network traffic would be encrypted.

  6. LD says:

    David Cantrell,

    NTFS does support hard links and junctions (like soft links). In theory it should be possible to perform a similar function on windows that you do with *nix systems.

  7. Matthew says:

    This solution works great. However, is there any way to initiate this process from the client being backed up, instead of vice versa? In my scenario the clients being backed up are behind routers with dynamic IPs, so I do not have a static IP to point the host at, nor the ability to always configure the router appropriately.

    ?

  8. Ryan Stille says:

    Matthew – I don’t think so. You can initiate rsync from the client, which is what rsnapshot uses… you may be able to hack it together.

    One option would be to rsync data from the client into a temporary place on your rsync backup server. Then use rsnapshot (run from the backup server) to snapshot that data, then delete it.

  9. David Keegel says:

    If you are prepared to do some extra work and know what you’re doing, the link below describes a way you can do “push” backups with rsnapshot (in a manner of speaking) :-

    http://lists.samba.org/archive/rsync/2007-December/019470.html

    I haven’t tried it myself.

  10. Terry Barnum says:

    To expand on David Cantrell’s statement that rsnapshot doesn’t only run on Linux, it’s been working great for us on Mac OS X (10.3, 10.4 & 10.5).

  11. Rax says:

    Rsnapshot now seems to be available for Linux as well.. though only in RPM (for the moment), Debian appears to be “Coming soon”

  12. Doug Barry says:

    I have had a lot of success with rsnapshot in this sense myself, having arrived at the same solution via my own investigations. For those of us who would rather not pollute a Windows server with Cygwin, there is a native version of rsync for Windows called DeltaCopy. Check it out here: http://www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp
    I am currently using it with excellent results on a few dozen machines, targeting a Debian embedded Linux machine (Linksys NSLU2) and also a beefier server as the middle man in a DDT setup.

  13. acdc superstar says:

    Works like a charm,

    It took me a while to realize that windoze firewall was well, doing its job… Port 873 needs to be opened.

    Thought I might save some time to others.

  14. Fred says:

    Doug, if I understand correctly, you can use only the DeltaCopy client on the Windows machine and rsnapshot on the Linux box? Or is DeltaCopy entirely for Windows machines (no Windows/Linux interoperability)?

  15. ryall says:

    Fred, you can use linux as either a server or client using rsync and DeltaCopy

    http://www.aboutmyip.com/AboutMyXApp/DisplayFAQ.do?fid=7

  16. Gabor says:

    Thanks for the article! Here are some important things to add:

    With standard cygwin1.dll filenames with UTF-8 characters, like German “Umlaut”, won’t show up correctly. For that you have to use a modified cygwin1.dll which can be found at http://www.okisoft.co.jp/esc/utf8-cygwin/

    The path of your module must not contain spaces or wildcards. As a workaround you have to use the short name (8.3) which you can see in the Windows shell with “dir /x”. In the include/exclude section you may use a ? for a space. For example:

    path = /cygdrive/c/dokume~1 # (German version)

    exclude = Default?User

    The cwRsync service is run by a user with the same name. This user has to have full access to the files to be backed up. If you want to back up your user directory in “Documents and Settings” you have to give the user “cwRsync” full access to it.

    And, as acdc superstar said above, the Windows firewall needs an exception for rsync.

  17. mike says:

    thanks gabor!!! i was having an issue with delta copy: if my top level folders had spaces, they weren’t working right. so if there were two top level folders starting with “My ” then they wouldn’t work because they ended up being the same. using the 8.3 names worked great!

    weird though, everything below the top level, even with spaces, were fine.

    does UTF-8 matter for anything else except for other languages?

  18. Gabor says:

    Mike, I think you don’t need the UTF-8 replacement unless you have some special characters (non ASCII) in your file names. And yes, only the module path, the top level, must not have spaces, everything below is fine.

  19. FlavioB says:

    Hy folks!
    I'm trying to backup a Windows client, but as it tries to copy the folder "Eredità" I get the error:

    rsync: recv_generator: mkdir "/mnt/remote/user01/daily.0/dire/DIVERSI/Eredit\#340" failed: No such file or directory (2)
    *** Skipping any contents from this failed directory ***

    Any clues about what to do?

    Thanks to all!

  20. jens says:

    Hi.

    I followed the instructions and everything worked fine apart from the entry in the BACKUP POINTS / SCRIPTS section. I used the IP address followed by "::" but rsnapshot configtest says that the source directory doesn't exist. Is there a problem with using the IP address? (The check of the module worked fine!)
    Would anyone be so kind as to help me?

  21. Aaron says:

    Thanks for the great writeup, but I can't figure out where to go on the windows box for this instruction

    >>Start the cwRsync service – go to the services applet in your control panel, and start the cwRsync service. Set it to Automatic so it will startup upon boot.

    I can't find any services applet in control panel.  Perhaps I'm not looking in the right place…

    Thanks for your help!

  22. Ryan Stille says:

    Aaron, try looking in "Administrative Tools" in the control panel.

  23. Amjad says:

    Aaron, to start the rsync service, right-click My Computer and choose Manage. Then look under Services and Applications, and there you will find the cwRsync service. Just right-click it and click Start.

    Hope this helps

  24. Joel Hazel says:

    Hey Ryan, I've been trying to set up 10.6 SL Server with rsnapshot to back up my Windows clients in the office and found your guide.  Setup went well enough, but when I try to run rsnapshot I get the following:

    [code]
    @ERROR: invalid uid nobody
    rsync error: error starting client-server protocol (code 5) at /SourceCache/rsync/rsync-40/rsync/main.c(1398) [receiver=2.6.9]
    @ERROR: invalid uid nobody
    rsync error: error starting client-server protocol (code 5) at /SourceCache/rsync/rsync-40/rsync/main.c(1398) [receiver=2.6.9]
    @ERROR: invalid uid nobody
    rsync error: error starting client-server protocol (code 5) at /SourceCache/rsync/rsync-40/rsync/main.c(1398) [receiver=2.6.9]
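    ----------------------------------------------------------------------------
    rsnapshot encountered an error! The program was invoked with these options:
    /opt/local/bin/rsnapshot hourly
    ----------------------------------------------------------------------------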
    ERROR: /usr/bin/rsync returned 5 while processing 10.112.0.122::backup
    [/code]

    Not sure what I'm doing wrong… any ideas?

  25. Ryan says:

    Hmm, not sure Joel, I haven't seen that before.  It kind of seems like the 'nobody' user might not be set up, or not set up properly.  I would see if you can run the rsync commands manually.  Just try to rsync one of your Windows shares to a local directory on your Linux box. Then take your errors to an rsync forum; once you get those straightened out, rsnapshot should work.

  26. hoberion says:

    @Joel Hazel

    had the same issue:
    in the rsyncd.conf on your windows machine add
    uid = 0
    gid = 0

  27. Carpet says:

    Thanks for the HOWTO Ryan! Finally got cwrsync up and pumping. Kudos @acdc superstar for the port settings on Windows Firewall too!

  28. Wayne says:

    I tried using this procedure to back up a Windows system (all disks, all files) but mostly got permission denied error messages.  I was using the Administrator uid/gid.  Is it possible to back up entire Windows systems with rsnapshot?

  29. Ephestione says:

    I had some headaches while trying to test the cwRsync service from Ubuntu, since I kept getting a connection timeout.
    It turns out Windows 7's firewall was in the way; you need to add an inbound rule allowing all connections to the …\ICW\bin\rsync.exe program.

    And by the way, would you mind if I translated this guide to Italian and published it on my website, obviously giving credit to you as the original author? (Assuming I find the time :D)

  30. Ryan says:

    Ephestione, feel free to translate it, thanks for asking.

  31. Neo says:

    I wonder if rsnapshot can work with Mac OS X 10.5.6? All the comments above are from a long time ago.

    Can I ask a newbie question of whoever made rsnapshot? Can the currently running system itself be treated as the root snapshot? I don't want to make a root or baseline backup first and then base the other snapshots on it, because that takes time! If the current system could be treated as the root snapshot, the first snapshot would be based on the current system, the next one on the first snapshot, and so on. On an Internet shop PC there is nothing important to care about except the OS itself. Likewise my home PC, which is only for games, doesn't need a root snapshot of the system and doesn't need restore-on-restart (like Deep Freeze). I want to restore the system when some app conflicts with another one on my PC (like Rollback of Horizontal Datasys, which is for Windows only).

    I know of an app for Mac OS X that is very much like rsnapshot, Paragon Volume Snapshot for Mac OS X, but it has a nice GUI and costs a few bucks.

    thanks 😉

  32. Alex says:

    @Neo: I don’t think you’d want to do that. What happens if your hard drive dies, then the hard links fail?

    Hard links only work on the same filesystem. Rsnapshot’ing to the same local filesystem doesn’t really make any sense– you want to have an ENTIRE backup of a point in time (no matter how long it takes), and then hard links in subsequent backups for when files didn’t change between the backup intervals.

    “Backup” is kind of the key part– redundancy! 🙂

  33. Clint O says:

    Hi:

    I’ve used rsnapshot on FreeBSD. I performed purely local backups with it: local disk to local disk. So I was curious if the same could be done entirely with Cygwin on Windows? In other words, is the file system support all there for hard links to make this work?

    Thanks,

    -Clint

  34. Alex Cavnar says:

    I don’t think that’ll work– Windows doesn’t support a notion of hard links, and there’s nothing Cygwin can do about that, unfortunately.

  35. Clint O says:

    Based on a Googling of “Cygwin hard links” it would appear that NTFS supports them (not FAT). So, provided I get beyond that hurdle, am I probably OK?

  36. Alex Cavnar says:

    Looks like I was wrong– my apologies!

    Heck, worst thing you can do is give it a shot. Or, if rsnapshot’s not installed, you can try doing cp -al to see if it does actually copy with hardlinks.

  37. Clint O says:

    Hi, no worries. I was able to apparently create a hard link (omit -s with ln) and the system seemed to support it, so it would seem that this isn’t a technical obstacle. I’ll see if I can download rsnapshot on my Windows 7 machine with cygwin and get it working!