Graphs at Work. At school. And in other places, too.
My better half teaches further mathematics for the International Baccalaureate (IB) program at a nearby school. I had a previous encounter with their Math club, on the topic of “Math at work”. Back then, work was focused on the roll-out of Scrum at scale, so I touched on Fibonacci numbers (used for effort estimation) and scratched the surface of queuing theory, M/M/1 queues in particular, to model service time in a work queue. Fast forward to a month ago: the further mathematics class had completed a healthy introduction to #graph theory, including Dijkstra’s algorithm and the traveling salesman problem. The students remembered the “Math at work” session and asked for a sequel on “Graphs at work”. Based on conversations at home, I fully expected wickedly smart kids to wander into that class, so this was (...)
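For the curious, the M/M/1 model mentioned above boils down to a couple of closed-form formulas. A minimal sketch (the arrival and service rates below are illustrative assumptions, not figures from the talk):

```python
# Back-of-envelope M/M/1 queue metrics, as used to model service time
# in a work queue: Poisson arrivals at rate lambda, a single server
# with exponential service at rate mu.

def mm1_metrics(arrival_rate, service_rate):
    """Return (utilization, mean items in system, mean time in system)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrivals outpace service")
    rho = arrival_rate / service_rate          # utilization
    L = rho / (1 - rho)                        # mean number in the system
    W = 1 / (service_rate - arrival_rate)      # mean time in the system
    return rho, L, W

# e.g. 8 tickets/day arriving, a team that clears 10/day:
rho, L, W = mm1_metrics(8, 10)
print(rho, L, W)  # 0.8 utilization, 4 tickets in flight, 0.5 days each
```

Note how quickly the wait blows up near full utilization: at 9 arrivals per day the mean time in the system doubles to a full day.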
“Ori is a distributed file system built for offline operation and empowers the user with control over synchronization operations and conflict resolution. We provide history through light weight snapshots and allow users to verify the history has not been tampered with. Through the use of replication instances can be resilient and recover damaged data from other nodes.”
I haven't tested it but, from the paper, I especially like the ability to work offline and then sync in the background once you are back online.
Incidentally, Git-annex has just gained a “special remote” for Tahoe-LAFS.
It is also possible to add a special remote for other storage backends (notably thanks to the “external special remote protocol”).
Someone seems to be writing a “special remote” for Hubic.
When you have an old #filesystem with file names in ISO Latin encoding, things go wrong when you #rsync it to a new filesystem; in that case, if you can rename the files on the source, use #convmv:
# convmv -r . -f iso-8859-1 -t utf8 --notest
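Under the hood the fix amounts to reinterpreting each file name's raw bytes as ISO-8859-1 and writing the name back out in UTF-8. A minimal Python sketch of the same idea (illustrative only, not convmv's actual implementation; `fix_name` and `convert_tree` are names I made up):

```python
import os

def fix_name(name, src="iso-8859-1"):
    """Recover the raw bytes of a mangled file name and reinterpret
    them as src. Python surfaces undecodable bytes in file names as
    \\udcXX surrogates; surrogateescape maps them back to raw bytes."""
    raw = name.encode("latin-1", "surrogateescape")
    return raw.decode(src)

def convert_tree(root, dry_run=True):
    """Walk root bottom-up and rename, roughly like
    `convmv -r -f iso-8859-1 -t utf8`. dry_run=True mirrors convmv's
    default report-only behaviour; dry_run=False is like --notest."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            fixed = fix_name(name)
            if fixed != name:
                old = os.path.join(dirpath, name)
                new = os.path.join(dirpath, fixed)
                print("would rename" if dry_run else "renaming", old, "->", new)
                if not dry_run:
                    os.rename(old, new)
```

Like convmv, it walks the tree bottom-up so directory renames do not invalidate the paths of their not-yet-processed children, and it defaults to a dry run.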
Replication of data three times is a robust guard against loss of data due to uncorrelated node failures. It is unlikely Yahoo! has ever lost a block in this way; for a large cluster, the probability of losing a block during one year is less than 0.005. The key understanding is that about 0.8 percent of nodes fail each month. (...) The probability of several nodes failing within two minutes such that all replicas of some block are lost is indeed small.
Correlated failure of nodes is a different threat. The most commonly observed fault in this regard is the failure of a rack or core switch. (...) If the loss of power spans racks, it is likely that some blocks will become unavailable. But restoring power may not be a remedy because one-half to one percent of the nodes will not survive a full power-on restart. Statistically, and in practice, a large cluster will lose a handful of blocks during a power-on restart.
In addition to total failures of nodes, stored data can be corrupted or lost. The block scanner scans all blocks in a large cluster each fortnight and finds about 20 bad replicas in the process. Bad replicas are replaced as they are discovered.
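A back-of-envelope check of the uncorrelated-failure numbers quoted above. Apart from the 0.8%-per-month node failure rate, every parameter below (cluster size, blocks per node, re-replication window) is my own illustrative assumption:

```python
# Rough estimate of losing a triple-replicated block to uncorrelated
# node failures in a year. Only MONTHLY_NODE_FAILURE comes from the
# quoted text; the other parameters are assumptions for illustration.

MONTHLY_NODE_FAILURE = 0.008          # ~0.8% of nodes fail per month
MINUTES_PER_MONTH = 30 * 24 * 60
REREPLICATION_WINDOW_MIN = 2          # assumed time to restore replication
NODES = 4000                          # assumed cluster size
BLOCKS_PER_NODE = 60_000              # assumed blocks stored per node

# Probability that one specific node fails inside the window.
p_window = MONTHLY_NODE_FAILURE * REREPLICATION_WINDOW_MIN / MINUTES_PER_MONTH

# After a first failure, a block is lost only if BOTH remaining
# replica holders also fail before re-replication completes.
p_block_lost_per_failure = p_window ** 2

# Node failures across the cluster in a year, each exposing its blocks.
failures_per_year = NODES * MONTHLY_NODE_FAILURE * 12

expected_blocks_lost = (failures_per_year * BLOCKS_PER_NODE
                        * p_block_lost_per_failure)
print(f"{expected_blocks_lost:.2e}")  # a tiny number per year
```

Under these assumptions the expected yearly loss comes out orders of magnitude below the paper's "less than 0.005" bound, which is consistent with its conclusion: uncorrelated failures are not the threat; correlated ones (racks, switches, power) are.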