The PaSh project – Taking the Unix philosophy one step further

The PaSh project gives your POSIX shell scripts superpowers by parallelizing them to cut execution time. This means faster results for data scientists, engineers, biologists, economists, administrators and programmers.

I remember the days when the saying was “Learn Perl so you don’t have to learn the Shell and its hundreds of utilities.”
Fast forward a few decades and the use of shell scripts has still not been eradicated. Rather, their use has increased due to the rise of containers, virtual machines, cloud administration, and Linux itself.

This also serves as a lesson for those who are quick to denounce technologies as “dead”. There comes a time when a new use case revitalizes an old technology.

So what do we mean by “Unix philosophy”? It’s about taking simple, high-quality components and combining them intelligently to achieve a complex result. An example that encapsulates this notion comes straight from the PaSh documentation. It shows how several utilities can be chained with pipes and redirects, each one filtering the output of the previous one, to achieve the desired result:

Consider the following spell check script, applied to two large markdown files:

cat |
tr A-Z a-z |
tr -cs A-Za-z '\n' |
sort |
uniq |
comm -13 dict.txt - > out
cat out | wc -l | sed 's/$/ mispelled words!/'

The speed of an operation like this depends on the size of the two files. It may take from a few seconds to a few minutes. What if you could speed it up by breaking it down into chunks that would work in parallel, and then combine their results? You can.

PaSh is one such system for parallelizing POSIX shell scripts, and it can improve performance by orders of magnitude. Given a shell script, PaSh converts it to a data flow graph, performs a series of semantics-preserving program transformations that expose parallelism, and then converts the data flow graph back to a POSIX script. The new parallel script has POSIX constructs added to explicitly guide parallelism, coupled with Unix runtime primitives provided by PaSh to address performance and correctness issues.

For example, running the above script through PaSh with -w 2, i.e. 2x parallelism, would create two pipelines and execute them in parallel. The resulting data flow graph therefore has two parallel branches.
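To make the idea of two parallel branches concrete, here is a hand-written sketch of the spell check run at 2x parallelism. It is only an illustration of the shape of the computation, not the script PaSh would generate. The sample files, the dictionary contents and the words() helper are all assumptions made up for the example.

```shell
#!/bin/sh
# Hand-written sketch of 2x parallelism for the spell check pipeline.
# The sample files and the words() helper are hypothetical, and this is
# not the script that PaSh itself would generate.
export LC_ALL=C                      # fixed collation so sort and comm agree
work=$(mktemp -d)

# Tiny stand-ins for the two large markdown files and the dictionary.
printf 'Hello wrold\nthe quick brown fox\n' > "$work/a.md"
printf 'jumps over teh lazy dog\nhello again\n' > "$work/b.md"
printf '%s\n' again brown dog fox hello jumps lazy over quick the > "$work/dict.txt"

# Sequential version: the same stages as the pipeline in the article.
cat "$work/a.md" "$work/b.md" |
tr A-Z a-z |
tr -cs A-Za-z '\n' |
sort |
uniq |
comm -13 "$work/dict.txt" - > "$work/seq.out"

# Per-branch stages: lowercase, one word per line, sort, deduplicate.
words() {
    tr A-Z a-z < "$1" | tr -cs A-Za-z '\n' | sort -u
}

# Parallel version: one background branch per input file.
words "$work/a.md" > "$work/a.words" &
words "$work/b.md" > "$work/b.words" &
wait                                 # block until both branches finish

# Merge the two sorted streams, drop duplicates, compare with the dictionary.
sort -m -u "$work/a.words" "$work/b.words" |
comm -13 "$work/dict.txt" - > "$work/par.out"

seq_result=$(cat "$work/seq.out")
par_result=$(cat "$work/par.out")
rm -rf "$work"

if [ "$seq_result" = "$par_result" ]; then
    echo "parallel and sequential results match"
fi
```

The key design point is the merge step: because each branch produces sorted output, sort -m can combine the branches without sorting everything again, and only the final comm stage runs on the combined stream.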

You might object that GNU Parallel already exists for this. The problem with Parallel is that it doesn’t know the semantics of commands like grep, which makes it hard to use: the user must write a carefully parameterized command to parallelize a job. Some commands also have their own ad hoc parallel flags, such as -j, --jobs, or --parallel. They are all different, difficult to use and difficult to compose.

PaSh instead has a compiler that works like this:

  • Take a shell script and command annotations as input
  • Construct a data flow graph
  • Apply graph transformations
  • Emit a new shell script that expresses the parallelism with low-level constructs such as & and wait
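As a rough idea of what low-level parallelism with & and wait can look like, the sketch below wires three background processes together with named pipes. It is an assumption-laden illustration: the file names are invented, and real PaSh output also relies on its own runtime primitives, which are not shown here.

```shell
#!/bin/sh
# Sketch: two sorters run in the background and stream into a merger
# through named pipes. All file names here are hypothetical.
export LC_ALL=C
dir=$(mktemp -d)
mkfifo "$dir/left" "$dir/right"      # one named pipe per parallel branch

printf '%s\n' pear apple fig > "$dir/in1"
printf '%s\n' kiwi banana cherry > "$dir/in2"

# Each branch is an ordinary command started asynchronously with &.
sort "$dir/in1" > "$dir/left" &
sort "$dir/in2" > "$dir/right" &

# The merger reads both pipes and combines the already sorted streams.
sort -m "$dir/left" "$dir/right" > "$dir/out" &

wait                                 # return only when all three jobs are done

merged=$(cat "$dir/out")
rm -rf "$dir"
printf '%s\n' "$merged"
```

Named pipes let the stages stream data to each other without writing intermediate files, while & and wait are plain POSIX constructs, so a script of this shape can still be read and run with the usual shell tools.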

Because PaSh is a source-to-source compiler, it allows the optimized shell script to be inspected and executed using the same tools, in the same environment, and with the same data as the original script.

The other two main components of PaSh are the annotations, a lightweight annotation language that lets command developers express key parallelizability properties of their commands, and a small runtime library that provides the PaSh compiler with high-performance primitives supporting its key functions.
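To see what a parallelizability property means in practice, the following sketch checks one such property by experiment: whether running a command on two halves of the input and concatenating the results gives the same output as running it on the whole input. The check_concat helper and the sample data are hypothetical, the check covers a single input rather than proving anything, and it does not reproduce PaSh's actual annotation language.

```shell
#!/bin/sh
# Sketch: test on one sample whether a command can be run independently
# on chunks of its input. check_concat is a made-up helper for this example.
export LC_ALL=C
tmp=$(mktemp -d)
printf 'Delta\nalpha\n' > "$tmp/a"
printf 'Charlie\nbravo\n' > "$tmp/b"

# check_concat CMD [ARGS...]
# Succeeds if CMD on the whole input equals CMD on each half, concatenated.
check_concat() {
    cat "$tmp/a" "$tmp/b" | "$@" > "$tmp/whole"
    { "$@" < "$tmp/a"; "$@" < "$tmp/b"; } > "$tmp/parts"
    cmp -s "$tmp/whole" "$tmp/parts"
}

# tr treats every character independently, so chunking does not change it.
if check_concat tr A-Z a-z; then tr_ok=yes; else tr_ok=no; fi

# sort needs to see all lines, so plain concatenation gives a different result.
if check_concat sort; then sort_ok=yes; else sort_ok=no; fi

rm -rf "$tmp"
echo "tr chunkable: $tr_ok"
echo "sort chunkable by concatenation: $sort_ok"
```

A command like sort can still be parallelized, but its chunks have to be combined with a merge step such as sort -m instead of simple concatenation. This is the kind of per-command knowledge that a generic tool cannot guess and that has to be stated explicitly.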

Various benchmarks on common Unix one-liners show performance improvements on the order of 60x.

PaSh runs on Ubuntu, Fedora, Debian, and Arch. Use one of the following methods to set it up:

  • Run curl | sh from your terminal,
  • Clone the repository and run ./scripts/; ./scripts/,
  • Fetch a Docker container by running docker pull binpash/pash-18.04, or
  • Build a Docker container from scratch.

It also runs on Windows via WSL.

More information

PaSh: Light-touch Data-Parallel Shell Processing

PaSh on GitHub

Related Articles

The Linux Perfection Challenge

Three Tips for the Linux Shell Addict


About Leslie Schwartz
