Use multiple CPU Cores with your Linux commands — awk, sed, bzip2, grep, wc, etc. | RankFocus

/use-cpu-cores-linux-commands

  • Use multiple CPU Cores with your Linux commands — awk, sed, bzip2, #grep, wc, etc. | RankFocus - Systems and Data
    http://www.rankfocus.com/use-cpu-cores-linux-commands

    Here’s a common problem: have you ever wanted to add up a very large list (hundreds of megabytes), grep through it, or run some other operation that is embarrassingly #parallel? Data scientists, I am talking to you. You probably have four or more cores, but our tried-and-true tools like grep, bzip2, wc, awk, sed and so forth are single-threaded and will use just one CPU core. To paraphrase Cartman, “How do I reach these cores?” Let’s put every CPU core on our Linux box to work with GNU Parallel, doing a little in-machine map-reduce magic with the little-known parameter --pipe (otherwise known as --spreadstdin). Your pleasure is proportional to the number of CPUs, I (...)
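
The map-reduce idea in the excerpt can be sketched roughly as follows. This is a minimal illustration, not the article's own command: it assumes GNU Parallel is installed as `parallel` (not the unrelated moreutils tool of the same name), and the function name `count_lines_parallel` and the 10M default block size are hypothetical choices made for the example. Each chunk of stdin is handed to its own `wc -l` (the map step), and `awk` adds up the partial counts (the reduce step).

```shell
#!/usr/bin/env bash

# Count the lines of a file using several cores at once.
# $1: path to the input file
# $2: chunk size passed to --block (optional; 10M is an assumed default)
count_lines_parallel() {
  local file="$1"
  local block="${2:-10M}"

  # Map: --pipe splits stdin into chunks of roughly $block bytes,
  # cut on line boundaries, and feeds each chunk to a separate `wc -l`.
  # Reduce: awk sums the per-chunk counts; "+ 0" prints 0 for empty input.
  parallel --pipe --block "$block" wc -l < "$file" \
    | awk '{ total += $1 } END { print total + 0 }'
}
```

Calling `count_lines_parallel big.txt` should print the same number as `wc -l < big.txt`. The same shape should carry over to other commands that can work on independent chunks of lines, as long as the reduce step knows how to combine the partial results.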