Anthropology News has a great article by Jonathan S. Marion and Jerome Crowder on workflow management for anthropologists (or any academics working with a lot of raw data: photos, audio, etc.). After presenting a basic definition of “workflow” for the neophytes, they explain their philosophy:
…which are the easiest recipes to follow? Those with the fewest and least complicated steps, right? But even if they have more steps, recipes that have become almost second-nature via long-term practice and familiarity work best. So too with workflow procedures: they tend to be most useful when they are simple and routine. Developing your preferred workflow may take several iterations, but once you figure out your preferred recipe, you will have a reliable process for organizing and working with your data. And since repetition generates familiarity, following the same procedures each and every time helps ensure the consistency of your desired outcome. You are less likely to forget steps, and if you do make a mistake, it is that much easier to backtrack and identify the source of the error. In any case, your workflow can be as simple or as detailed as you need—but once you find what works for you, stick with it. For us, the key steps to good workflow are: copying; renaming; selection; image treatment; optimization; and backup.
I like this analogy and advice: your workflow should work for you, so find something that feels comfortable enough that you’ll stick with it each and every time. In this case, simply making good use of file names (and folders) can go a long way toward organizing your data. And as the authors point out, your OS will search the metadata associated with files, too, even if it’s in a field that isn’t normally displayed in the ‘details’ view.
In my own case, the ‘data’ I have the greatest number of–and the most trouble organizing and searching–are PDFs of academic articles, as well as chapter excerpts and other texts that have been scanned into PDF form. I’ve recently (belatedly) begun adding these resources to a BibTeX file (see the ‘bibliography’ tags for more on this process), but fortunately I adopted a consistent naming scheme for these files from the very beginning of grad school. I always use the following format:
Lastname First Initial – Year of publication – Article title [sometimes skipping ahead to the ‘subtitle’ that actually mentions something about the topic — we anthropologists often start off with a pithy quote or phrase that provides color and interest, but isn’t much use when you’re trying to remember the name of an article you read]
So for example, after downloading the article “The Semiotics of Collective Memories” by Brigittine French, I renamed the PDF as follows:
French B – 2012 – The Semiotics of Collective Memories.PDF
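For anyone who would rather generate these names than type them, here is a minimal Python sketch of the scheme above. The function name `article_filename` and its parameters are my own inventions for illustration, and the whitespace clean-up is an assumption about what you would want, not part of the original scheme.

```python
import re

# The separator used in the scheme: space, en dash, space.
SEP = " \u2013 "

def article_filename(last_name, first_initial, year, title, ext="PDF"):
    """Build 'Lastname F – Year – Title.EXT' (hypothetical helper)."""
    # Collapse runs of whitespace so pasted titles don't carry stray line breaks.
    clean_title = re.sub(r"\s+", " ", title).strip()
    return f"{last_name} {first_initial}{SEP}{year}{SEP}{clean_title}.{ext}"

print(article_filename("French", "B", 2012, "The Semiotics of Collective Memories"))
```

Keeping the author and year at the front means that sorting a folder by name also groups each author's work chronologically.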
I also like to take notes on interesting articles; while I may mark up the PDF and leave some notes in the margins, for longer observations or clipped text, I prefer creating a Word document. I always give these documents the exact same name as the source text, and save them to the same folder — thus the article and its corresponding notes show up right next to each other when I organize by file name.
A couple of important caveats I should mention: in Windows, the full path to a file traditionally cannot exceed 260 characters (other operating systems have limits of their own, though usually more generous ones). The path includes not only the name of the file itself, but also the names of all the folders between the file and the root of the drive, e.g. C:\. If you’d like to be able to use longer file names to include more of the articles’ titles, you should save the PDFs in a folder close to the root. In my case, D:\Dropbox\Academia\Resources\ houses all of my PDFs. That’s 27 characters past the root (30 counting D:\ itself), so I still have well over 200 left for my file names.
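The arithmetic is easy to check with a few lines of Python. This is only a sketch: `name_budget` and `fits` are hypothetical helpers, and reserving one of the 260 slots for the terminating null character reflects my understanding of the classic Windows limit, so treat the result as approximate.

```python
MAX_PATH = 260  # classic Windows limit; one slot is assumed to go to the terminating null

def name_budget(folder):
    """Characters available for a file name (extension included) inside folder."""
    # Normalise so the folder prefix ends in exactly one backslash.
    prefix = folder.rstrip("\\") + "\\"
    return MAX_PATH - 1 - len(prefix)

def fits(folder, file_name):
    """True if folder plus file_name stays within the classic limit."""
    return len(file_name) <= name_budget(folder)

print(name_budget("D:\\Dropbox\\Academia\\Resources\\"))  # 229
```

Running `fits` over a folder before moving it deeper into a directory tree is a cheap way to find the names that would break.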
Also, if you’re working in a Mac environment, be cautious about using characters that are permitted in OS X (and in Unix / Linux) but are illegal in Windows. For example, file names in Windows can’t include a question mark (?), a straight double quotation mark ("), or any of the characters < > : / \ | and *.
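If your files move between operating systems, a quick check like the following can flag troublesome names before they cause problems. It is a rough sketch: the helper names and the sample title are made up, and it only covers the reserved punctuation and control characters, not other Windows quirks such as reserved device names or trailing dots and spaces.

```python
# Punctuation that Windows does not allow in file names.
WINDOWS_RESERVED = set('<>:"/\\|?*')

def illegal_chars(name):
    """Return, in sorted order, the characters in name that Windows rejects."""
    return sorted({ch for ch in name if ch in WINDOWS_RESERVED or ord(ch) < 32})

def make_portable(name, replacement="_"):
    """Swap every rejected character for a harmless replacement."""
    bad = set(illegal_chars(name))
    return "".join(replacement if ch in bad else ch for ch in name)

sample = 'Who Owns "Culture"?.pdf'   # hypothetical title, for illustration only
print(illegal_chars(sample))         # ['"', '?']
print(make_portable(sample))         # Who Owns _Culture__.pdf
```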
One improvement I could make, based on the workflow philosophy linked above, would be to save unmarked versions of the PDFs alongside the ones that I read and annotate. If I ever decide to assign the readings for a course or share them with a colleague, it would be convenient if I had a vanilla copy to send along that didn’t include my highlighting and scribbles in the margins. Between my computers and tablet and phone, I end up reading and marking up PDFs using several different programs–Foxit, ezPDF, etc.–and I haven’t found a way to remove all of the annotations and markup that they leave behind, so I usually end up having to re-download the original PDF all over again.
I also rely primarily on file naming conventions to manage digital information from my dissertation fieldwork: photos, videos, audio files, and their corresponding transcripts, among other ‘objects’. I created a ‘Fieldwork’ folder with subfolders for each of the major classes of data: Audio, Video, Photos. The naming conventions differ a bit for each of these.
Since the photos vastly outnumber the other types of files, I have them organized into dozens of folders, primarily on a chronological basis. My ‘Photos’ folder is subdivided into folders for each year, 2006-2010, and each of these folders is divided into the months that I was in Guatemala during that year. Beyond the month level, folders are organized thematically, usually around whatever event or series of events the photos correspond to. So for example, at “Photos/2010/6-June 2010/Post-Agatha Quiché and picnic” you’ll find the dozens of photos I took while surveying the destruction from Tropical Storm Agatha with some friends in Quiché, as well as our picnic afterwards.
Rather than renaming the individual files with date or location information as Marion and Crowder recommend, I preferred to give them all the same name as the folder, followed by a number. Actually, to be fair, that’s how my photo software named them in the process of downloading them from my camera. But I prefer it because it looks neat, and a lot of the other information is already recorded in the file metadata. In Windows 7, just right-click on an image file and select Properties, then hit the Details tab, and you can see a lot of additional information that your camera automatically recorded and embedded in the file. Assuming you keep the time and date updated on your camera and other recording devices, you’ll have that information handy, with no extra effort on your part. Newer cameras will also record details like the ISO, f-stop, exposure time, etc. for all of the photos you took.
Even my inexpensive GE point-and-shoot recorded a lot of useful details that I now know how to interpret, thanks to Katherine Fultz’s photographic methods 101 workshop (link forthcoming).
Moving on to Audio and Video, each of these folders contains all of the corresponding media files. The file names begin with a counter, starting with file 000, and then a short identifying label, followed by the date and time, and sometimes the location–whatever information I need to recognize the event and differentiate it from others. If I later edit a file and create an excerpt, I’ll save it with the same identifying number as the original, followed by a letter (similar to citing multiple articles written by someone in the same year). For example, audio file 168 – Ceremony at G’umarkaj – 14 July 2011.MP3 is followed by 168a – Interview with Aj q’ij – 14 July 2011.MP3, which is a 20-minute excerpt from the original file that I created just for convenience. If you have long audio clips that require a lot of cleaning up, or if different time periods in a long clip seem like they could use different types of cleaning, then it would probably be useful to export them as excerpts and apply the post-processing to each one, individually. Or if you have the (cloud) space, you could follow Marion and Crowder’s advice and create backups at each step when you modify files.
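Here is a small Python sketch of that audio and video naming pattern, including the letter suffix for excerpts. The function `recording_name` and its parameters are hypothetical, and the three-digit zero padding is my assumption based on the counter starting at 000.

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

def recording_name(counter, label, day, ext="MP3", excerpt=""):
    """Build '<counter><excerpt letter> – <label> – <day month year>.<ext>'."""
    stamp = f"{day.day} {MONTHS[day.month - 1]} {day.year}"
    # Zero-pad the counter to three digits so files sort correctly by name.
    return f"{counter:03d}{excerpt} \u2013 {label} \u2013 {stamp}.{ext}"

original = recording_name(168, "Ceremony at G’umarkaj", date(2011, 7, 14))
clip = recording_name(168, "Interview with Aj q’ij", date(2011, 7, 14), excerpt="a")
print(original)
print(clip)
```

Because the excerpt keeps the counter of its source file, the two sort next to each other and the clip's origin is never in doubt.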
One last tip and I’ll wrap up this long post. If you ever decide to rename a lot of files in bulk, the free Windows program Bulk Rename Utility has pretty much every option you could possibly want, including support for regular expressions. However, as the screenshot below should indicate, it also has a pretty steep learning curve.
In Windows, there’s also a quick and easy way to rename a batch of files, if you’re satisfied with giving them all a shared name plus a unique trailing number. Just select them all, then right-click and select ‘Rename’ (or press F2), and type in the name you want them all to share. When you finish, hit Enter and voilà! They will all be renamed and numbered, with the number appended in parentheses: Name (1), Name (2), and so on.
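If you are comfortable with a little scripting, the same shared-name-plus-number pattern can be reproduced in Python on any operating system. This is a sketch under my own assumptions: the helper names are made up, the "Name (n)" format simply imitates what Windows Explorer produces, and files are numbered in alphabetical order of their current names. The plan is built first, without touching anything on disk, so you can inspect it before applying it.

```python
from pathlib import Path

def plan_batch_rename(files, shared_name, start=1):
    """Pair each file with a new name of the form 'shared_name (n).ext'."""
    plan = []
    for number, old in enumerate(sorted(Path(f) for f in files), start=start):
        new = old.with_name(f"{shared_name} ({number}){old.suffix}")
        plan.append((old, new))
    return plan

def apply_plan(plan):
    """Rename files on disk, refusing to overwrite anything that already exists."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(new)
        old.rename(new)

# Dry run: print the proposed changes without renaming anything.
for old, new in plan_batch_rename(["IMG_0042.jpg", "IMG_0041.jpg"], "Picnic"):
    print(old.name, "->", new.name)
```

Only call `apply_plan` once the printed plan looks right, and ideally after making a backup.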