
Functions to handle directories #223

Open
marcopaganini opened this issue Jan 29, 2025 · 10 comments
@marcopaganini

Hello!

After working with script for a while (nice work, btw!), I've found myself missing some basic functions, which I eventually implemented (in a quick-and-dirty way) for myself.

Most of them have to do with directory manipulation: "list of files in a directory", "list of directories in a directory", as well as "by date" variants, which are super useful when we need, for instance, to find the N newest files in a directory.

Would there be interest in PRs with these?

@bitfield
Owner

Hi @marcopaganini! Thanks for opening the issue. Could you show an example program here which you wrote using one or more of your proposed functions? It'll be very useful to be able to see them in context.

@marcopaganini
Author

Hello!

Yes, sure.

Right now, it's not integrated with the script way of doing things at all; naturally, a PR would do it properly. This was a quick hack for a simple tool that finds the newest file in each log directory and then looks for a string inside it (it basically checks whether my backups succeeded).

package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"

	"github.com/bitfield/script"
)

// listDirs, listFilesByDate, and gunzip are my ad-hoc helpers (not shown here).
func backupStatus() error {
	if len(os.Args) != 1 {
		return errors.New("use: backup-status")
	}
	logroot := "/var/log/netbackup"

	// The directory structure for logs is:
	// /var/log/netbackup / backup-names... / backup-files [.gz]
	logdirs, err := listDirs(logroot)
	if err != nil {
		return err
	}
	for _, logdir := range logdirs {
		logfiles, err := listFilesByDate(filepath.Join(logroot, logdir.Name()))
		if err != nil {
			return err
		}
		if len(logfiles) == 0 {
			continue
		}
		latestFile := logfiles[len(logfiles)-1].Name()
		latestPath := filepath.Join(logroot, logdir.Name(), latestFile)

		var out string

		// Gzip decompress on the fly if needed.
		if strings.HasSuffix(latestFile, ".gz") {
			out, err = script.File(latestPath).Filter(gunzip).Last(10).String()
		} else {
			out, err = script.File(latestPath).Last(10).String()
		}
		if err != nil {
			return err
		}
		// Results
		if strings.Contains(out, "Backup Result: Success") {
			fmt.Println("✅", latestFile)
		} else {
			fmt.Println("❌", latestFile)
			fmt.Println(out)
		}
	}
	return nil
}

@bitfield
Owner

bitfield commented Feb 3, 2025

Great, thanks! Would you now like to try rewriting this program using your proposed script functions, and show how they would work, and how much shorter and clearer it makes the calling code? That'll help us to nail down what specifically is proposed, and also make the case for why it's needed in script.

@bitfield
Owner

@marcopaganini just checking if you're still interested in this.

@marcopaganini
Author

Yes! Sorry just stuck with work. Will get back to this as soon as possible.

@marcopaganini
Author

OK, found a few minutes with my head out of the water, so I can comment. :)

I've been thinking about which approach would be the most flexible and "pipe compatible". What I did in my original code was create an ad-hoc function called listFilesByDate, but I don't think that's very much in the spirit of the library.

To list files in a directory (without recursing), the library already provides ListFiles, but as I said, I commonly need to list files by date (like ls -tr) or even by size (ls -rS). For those uses, a function that reads the Pipe, runs a stat() on every file, and sorts by mtime (ascending or descending) could be useful. To fetch the newest file in a directory, we'd do something like:

files, err := script.ListFiles("/foo/bar").SortByTime().Slice()
fmt.Printf("Oldest file = %s\n", files[0])
fmt.Printf("Newest file = %s\n", files[len(files)-1])

Pros:

  • This is compatible with ListFiles.
  • It would be trivial to implement a ListDirs that would also be compatible with it.
  • It should be compatible with FindFiles.
  • Most of the code could be reused for something like SortBySize, which is also useful.

Cons:

  • It's somewhat expensive, as it needs to run stat on every single file or directory in the list.

Please let me know what you think.
Regards

@bitfield
Owner

This would be a neat thing to be able to do, but to do it properly we really need some concept of structured records rather than plain strings, as in Nushell, for example.

If we had that, then you could generate tables of all kinds of data, not just files, and sort and query them by all sorts of attributes, not just time. This is something I've been thinking about for a while, and it's such a major API change that it would really have to be a script/v2, or just a differently named package. But I think it's very much worth doing.

@marcopaganini
Author

I didn't know about Nushell. That's actually quite interesting, thanks.

For script, one option is to assume that the input to something like SortBySize contains filenames. Of course, it's up to the programmer to make sure that's the case (just like in shell, BTW).

Regarding changes to support structured data: as long as people don't peek inside Pipe, what prevents it from happening now? A v2 shouldn't be needed, because it wouldn't change the interface with the user. Basically, ListFiles would populate a structure (or even store a JSON string that can be parsed in multiple ways) inside the Pipe. Things like SortBySize would be smart enough to read the right metadata and produce the correct output, while "pre-structured" tools could still work as they do today (it would even be possible to fill in the metadata in addition to the existing unstructured data).
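To make the idea concrete, here's a toy sketch, not script's actual API (Pipe, FileMeta, and these methods are made up for illustration): a pipe carries optional structured records internally, metadata-aware stages use them, and the plain-string interface stays unchanged:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// FileMeta is a hypothetical structured record for one file.
type FileMeta struct {
	Name string
	Size int64
}

// Pipe sketches the idea: the usual text view, plus optional structured
// metadata that metadata-aware stages can read, while "pre-structured"
// stages keep treating the pipe as plain lines.
type Pipe struct {
	text string     // stands in for the real pipe's io.Reader contents
	meta []FileMeta // nil when no stage has populated metadata
}

// ListFiles fills in both the plain-text view and the metadata.
func ListFiles(files []FileMeta) *Pipe {
	var b strings.Builder
	for _, f := range files {
		fmt.Fprintln(&b, f.Name)
	}
	return &Pipe{text: b.String(), meta: files}
}

// SortBySize is metadata-aware: it sorts the records and regenerates
// the plain-text view, so downstream string-based stages still work.
func (p *Pipe) SortBySize() *Pipe {
	if p.meta == nil {
		return p // no metadata to sort by; pass through unchanged
	}
	sorted := append([]FileMeta(nil), p.meta...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Size < sorted[j].Size })
	return ListFiles(sorted)
}

// String is the unchanged "unstructured" interface.
func (p *Pipe) String() string { return p.text }

func main() {
	p := ListFiles([]FileMeta{{"big.log", 900}, {"tiny.log", 3}})
	fmt.Print(p.SortBySize().String())
}
```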

@bitfield
Owner

A JSON string is a neat idea, since we already have JQ.

@bitfield
Owner

One problem with sorting in general is that it doesn't work very well with asynchronous pipelines. In order to sort by anything, a pipe stage has to read its entire input—and that might take arbitrarily long, depending on what is upstream. Not an insuperable obstacle, but just worth noting—we don't even have Sort for strings at the moment.

Why don't you try prototyping this with a custom FilterLine function and see how it cleans up your example program?
