Saturday, March 22, 2008

meta metadata data

Can you have a conversation about metadata without discussing information retrieval? For that matter, can you talk about information retrieval without discussing metadata? Prolly not in a modern library context since they're so intertwined - so let's not waste our time - Let's talk about both (this may take two days)!

Metadata - data about data - sounds redundant on the outside but it is, and has always been, at the core of what libraries do. Let's consider the old card catalogue for a moment. Those little cards had information (metadata) about books: author, title, Dewey decimal number, publisher, etc. It was data about....data. And each was a point of access, a way to find the book you wanted amongst the thousands on the shelves (this is important when we talk about information retrieval in a bit).
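
To make the card catalogue idea concrete, here's a minimal sketch in Python. The two sample cards, the field names, and the helpers `build_access_points` and `find` are all made up for illustration (the values aren't checked against any real catalogue); the only point is that each field on a card becomes its own way in to the book.

```python
from collections import defaultdict

# Two made-up catalogue cards; every field is metadata about a book.
cards = [
    {"title": "Moby-Dick", "author": "Melville, Herman",
     "dewey": "813.3", "publisher": "Harper & Brothers"},
    {"title": "Walden", "author": "Thoreau, Henry David",
     "dewey": "818.3", "publisher": "Ticknor and Fields"},
]


def build_access_points(cards, fields):
    """Map each (field, value) pair to the titles it leads to."""
    index = defaultdict(list)
    for card in cards:
        for field in fields:
            index[(field, card[field].lower())].append(card["title"])
    return index


# Only the chosen fields become access points; publisher is left out here.
index = build_access_points(cards, ["title", "author", "dewey"])


def find(field, value):
    """Look up titles by one access point; returns [] if nothing matches."""
    return index.get((field, value.lower()), [])
```

Looking up `find("author", "Melville, Herman")` leads to the Melville card, while `find("publisher", "Harper & Brothers")` finds nothing, because in this sketch the publisher was recorded on the card but never made a point of access.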

So why should it be such a stretch to think of metadata as anything other than what it is? Information about a book; a way to find the book you're looking for. And in today's electronic world, it's a way to find a book anywhere in the world via the web! People seem to struggle with it for two reasons, I think.

1. because it's called something fancy sounding - metadata - a name synonymous with nothing in layman's terms (the word information is infinitely more universal) and

2. the electronic nature of metadata, or schemas under which metadata is generally ruled, is relatively new (in the big scope of things) and ever changing. It's damn hard to keep up with if you're not knee deep in the conversation all the time. To be perfectly honest, it's hard even if you are - it changes a lot, standards are in flux, and the language is not always intuitive.

Plus, there are so many different kinds of metadata collected and preserved for so many different reasons that knowing one schema does not guarantee across the board understanding. It's been my experience that the more traditional librarians seem to struggle with the concept more than younger people who are used to navigating in the computer world. New monikers aren't as likely to scare the bejeezus out of them, whereas traditional librarians are feeling the technological crunch more and more and any new ingredient can increase anxiety. (Obviously, a topic for another discussion, anyway...)

METS, MODS, Dublin Core, DACS, TEI, EAD, the list goes on - they're all schemas, or standards, meant to record a set of structural, administrative, and preservation details about an object, whether it be analog or digital in nature. These schemas are generally aimed at particular kinds of collections.

METS, for instance, is used with the Library of Congress' National Digital Newspaper Program. Written in XML (eXtensible Markup Language), METS acts as a wrapper (don't you love how real life translates into the virtual realm?) in that it harbors a lot of information concerning a large group of files (in the case of NDNP, a large group of containers full of files) and arranges it in a meaningful way so that many things about the files (objects) can be read and understood, in whole or in part, without ever opening the files. Take, for example, newspaper images: things like title, publisher, microfilm date and microfilming agency, page number, issue number, date of publication, reel sequence number, etc. are all collected and stored in the METS files, either with the reel, the issue, or the image. The images themselves are also embedded with certain metadata like file format, the scanner and software version used to capture the image, and more, and different file formats are capable of being embedded with more or less metadata. DACS may collect and arrange information about archival collections in a similar or different fashion, based on the DACS guidelines, at the object level, collection level, folder level, or any number of other ways one might choose to get at the collection(s). Each schema is designed for the kinds of collections it aims to describe, preserve, and access.
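
Here's a rough sketch of the wrapper idea using Python's standard `xml.etree.ElementTree` module. Fair warning: the XML below is a made-up, heavily simplified stand-in and not real METS (actual METS files are namespaced and far richer), and every element name, attribute name, and value in it is hypothetical. It only shows how a wrapper lets you learn things about the image files without ever opening them.

```python
import xml.etree.ElementTree as ET

# A made-up, simplified wrapper record. NOT real METS: the element and
# attribute names here are hypothetical and chosen only for readability.
WRAPPER_XML = """
<wrapper title="The Daily Example" publisher="Example Press">
  <reel sequence="1" microfilmed="1975">
    <issue date="1908-03-22" number="12">
      <page number="1" file="0001.tif"/>
      <page number="2" file="0002.tif"/>
    </issue>
  </reel>
</wrapper>
"""


def describe_pages(xml_text):
    """List what the wrapper says about each page without opening any image."""
    root = ET.fromstring(xml_text.strip())
    pages = []
    for reel in root.iter("reel"):
        for issue in reel.iter("issue"):
            for page in issue.iter("page"):
                pages.append({
                    "title": root.get("title"),        # stored with the wrapper
                    "reel": reel.get("sequence"),      # stored with the reel
                    "issue_date": issue.get("date"),   # stored with the issue
                    "page": page.get("number"),        # stored with the image
                    "file": page.get("file"),
                })
    return pages


pages = describe_pages(WRAPPER_XML)
```

Each entry in `pages` pulls the title from the top of the wrapper, the sequence number from the reel, the date from the issue, and the page number from the image entry, which mirrors the reel/issue/image layering described above.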

It should be noted that using XML to mark up the metadata goes a long way in making the electronic collection information interoperable and bodes well for future transitions and improvements.

This metadata language, then, no matter the schema, allows for certain accessibilities, for instance, through a browser interface. This point is important when we think about information retrieval. In some ways, the whole point, or perhaps I should say - the joy of, metadata is that it provides for wider access in a digital environment. What we choose to collect and embed in our metadata schemas (or our image metadata) directly impacts how well our objects can be found by users. I'm talking purely from an electronic standpoint now, forget the card catalogue of yore.

It's possible that every point could be an access point if one chose to make them so. On first glance one might say, "well, of course, every point should be an access point" but that's not always the best approach depending on the circumstances. Here is where librarians can really be a huge help, not only in designing metadata standards but in the retrieval of the information gleaned from them. As information professionals, we're perhaps in the best position to know how users tend to look for their information (some may challenge this assumption what with the rise of folksonomies and other Web 2.0 ideas but let's forgo that idea for right now).

There are a few ways one might find information. One is a library's OPAC: a database of surrogate records (MARC records most likely). Perhaps the most widely known is OCLC's WorldCat union catalog, which acts sort of as the do all and be all of catalogues because it can easily connect to records within its 10,000 member libraries' catalogues. Mind you, it doesn't get you anything more than the location of the object. In some instances now you might find a PURL (persistent URL) field in a record that will take you to a digital object (sometimes a book or audio/video stream), but if it's an analog object (such as an undigitized book), you're not going to actually get the book, only where it's located. Though it's difficult to gain access directly to the object you want, these record databases are vital to a library's inventory, so it's not like they're not necessary. They just don't do everything users demand today.

Another way to find information is through commercial databases. These contain an aggregation of journals, periodicals, articles, books, etc. Personally, I find these avenues very frustrating. If you can gain access to them (for instance, if you're a student at a university, let's say, you may need access to the proxy server if you choose to do research off campus or outside the library itself and, if you're not a student, you may not have access anywhere BUT the library) they're not easy to navigate and they may or may not offer full-text access. You may only find a simple citation.

This is a personal pet peeve of mine, and I don't think I'm alone on this. Here's how these things work: some vendor decides to provide digital imaging of, let's say, a scientific journal, and they lump a bunch of these journals together. They go off and sell these packaged journals to libraries. The good thing for libraries is that less physical storage is necessary and, in some cases, they may even get a break in the deal if they, say, take X package of journals at a discount if they also buy Y package. It can be vastly cheaper on many fronts than if they buy the paper products. Sounds good, right? The problem, and this is where the peeve comes in, is that these vendors can suddenly decide to drop any of their journals even though you, the library, bought the rights to the electronic copies for X number of years. Not only does this cause a business dilemma but it causes a great ethical dilemma concerning your users. What if you're a library in a major medical university setting? Do you think a disease is going to just stop happening because your doctors and internists can't get the information they need? Of course not. So, it becomes, to me at least, a way for the commercial vendors to hold information hostage. But libraries have little choice - nobody has the resources to digitize the stuff themselves and space is a rare commodity for everyone it seems - the content is massive in scale - so they're actually better off sticking with the vendor and hoping the information remains available. In those instances where something is discontinued, usually the vendor supplies a copy of the digital files to the library - after all, they've paid for it. Then, lo and behold, the journals show up on CDs even though a library generally isn't equipped to store or manage them OR have the internal interface to allow access to their users through a browser.
It's a catch-22 and I look forward to some whiz kid coming up with a way to circumvent this hold vendors have over our information.

I might mention that both of the information retrieval methods mentioned above have some amount of keyword search capabilities, more so with the commercial databases, though both employ controlled vocabularies - especially OPAC records. But the aggregated databases don't generally offer keyword searches of the full-text documents - only the abstracts or citations. As such, a lot of relevant information may be left unidentified to the user. And, of course, with MARC records, full-text keyword searching does little good since the records are succinct, distilled information bits, well informed and carefully crafted bits that come at the price of the labor of skilled workers (i.e. $$$), but bits nonetheless.

The other method of information retrieval, and this is where metadata plays a huge and developing role, is the internet. The good things are: it's free to anyone who has access, it provides wide ranging content in a number of languages, the goods sit on millions of servers so its loss isn't so volatile, and it has become very user friendly what with tag clouds, folksonomies, and the like. Users generate the information, so users may be better able to find the information. Of course the problems are everything I just mentioned. With all that comes the fact that, because it's user generated, it may not always be reliable information and there's tons of redundancy in the information. How many times have you done a Google search for something and 10+ pages of hits have been returned to you? And a lot of those hits come from commercial sites; hardly the purveyors of reliable, unbiased information. Too, a search engine won't always be able to search the "deep web" so there's a lot of hidden information that won't be returned to you.

This is not to say you can't find good stuff on The Web - you can - but the information illiterate among us will just take the first thing that comes to them and think it's the gospel truth - like the email circular claiming Obama is a Muslim - hello - my mother sent it to me so it must be true, right? My mother would never lie. Right. But what we face as information professionals is the speed and convenience the web offers to everyone. In many ways, it levels the playing field between the haves and the have nots. Information isn't locked in a dusty archive or ivory tower university anymore - and because it's not, because that playing field is there for each and every one with the wherewithal to go looking for it, our older, trusty avenues of information, like our OPACs and commercial databases, are lagging behind the expectations of our users. And, so, there ya go.

Stick a fork in me, I'm done.....

better than I ever could

I just came across the best statement on the library profession that I have ever read. It's from author Rachel Singer Gordon and it's called "If it quacks like a librarian..."

Since November a fellow librarian and I have been mulling over the idea for an article about Library Schools and the role they play, or more specifically - what role they're not playing - as more paraprofessionals come aboard in the workplace. After reading "If it quacks like a librarian..." I don't think there's much we can add to the argument, really. Or, if nothing else, Gordon sums up what I personally think about the topic better than I ever will. To quote a portion...

"It’s one thing to value the MLS. It’s another thing entirely to condescend to non-MLS librarians (yes, I said librarians), paraprofessionals, and other non-degreed library workers, to discount their opinions, and to ignore their contributions to their libraries and to librarianship as a whole...

What is it they say about academia — that the politics are so fierce because the stakes are so low? All this talk about “erosion of professional standards” boils down to this: we’re terrified because the outside world doesn’t tend to value librarians, and we’re worried people will pounce on any excuse to fire us, lower our salaries, or otherwise devalue us further.

Well, guess what. The outside world doesn’t know — or care — that librarians have an MLS. They don’t care what LJ decides to print. They just care about the service they receive and whether someone can do the job she was hired to do...An MLS doesn’t automatically make someone a good librarian, just as the lack of an MLS doesn’t automatically make someone a bad — or non — librarian."

I encourage anyone interested in this very current, and very heated, topic, on either side of the argument, to read Gordon's piece in its entirety - it isn't long but, then again, it doesn't need to be. As with most LIS topics - leave out the BS and quality content could be had in two paragraphs or less. 'Nuff said...

Thursday, March 20, 2008

maundy thursday

Love one another

That's the message Jesus gave the disciples not long before he was crucified and one that is often lost this time of year (just about any time of year actually), either because of the bunnies and eggs or the need to only observe the resurrection and be downright bizarre about it (there's just no other word for it that I can think of). For the last two weeks I've been blessed with people who have been really kind to me for one reason or another. From the stranger at the grocery to my co-workers to my family and friends far, far away. Each one has brought something generous to my table without provocation. And in those words and deeds of kindness have been lessons that I needed and that I am profoundly grateful for. Love one another. I hope I'm able to return it to one and all.

Sunday, March 16, 2008

sunday sermon

I don't generally like sermons so just ignore the subject line. But I thought of it because A. it's Sunday (I know, not very original) and B. practice essays are a lot like sermons - you're asked to talk about a particular subject and, so, in many ways the language and position are often black and white, cut and dried; there's little room for "I don't know" or "it could be this but it could also be this". Sermons don't often say "I don't know, it could be this or it could be that". Or maybe I'm just thinking of evangelical sermons....

...anyway....

Let's consider for a moment the major internet technological advances that have propelled libraries to the state we are in today. It mostly started in 1969 with ARPANET (Advanced Research Projects Agency Network). Building on ideas mentioned as early as 1962, the United States Department of Defense led the way in building a system that used packet switching instead of the previously used circuit switching. Circuit switching is most closely identified with telephones. A single call ties up a circuit which then remains unavailable for any other use until that call ends. ARPANET's idea of packet switching meant that data could be broken into packets that share the same lines, thus enabling multiple communications simultaneously. The system grew slowly for many years, and other "nets" were also developing tools for better mass communications, though all were mostly spearheaded by researchers for research and remained a disjointed jumble of independent networks. These "nets", like ARPANET, had long since developed email and file transfer protocols by the time "the web" emerged at CERN around 1989/90 (maybe as late as 1991/92 depending who you believe).
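
For the curious, the difference between the two kinds of switching can be shown with a toy Python sketch. This is only a cartoon of the idea and not how ARPANET actually routed traffic; the function names, the packet size, and the two sample messages are all invented for illustration. Two senders chop their messages into numbered packets, the packets take turns on one shared line, and the receiving end sorts them back out.

```python
from itertools import zip_longest


def to_packets(sender, message, size=4):
    """Split a message into numbered packets tagged with their sender."""
    return [(sender, seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]


def share_link(*streams):
    """Interleave packets from several senders onto one shared line."""
    line = []
    for group in zip_longest(*streams):
        line.extend(p for p in group if p is not None)
    return line


def reassemble(line):
    """Rebuild each sender's message from the mixed packet stream."""
    parts = {}
    for sender, seq, chunk in line:
        parts.setdefault(sender, []).append((seq, chunk))
    return {s: "".join(c for _, c in sorted(chunks))
            for s, chunks in parts.items()}


line = share_link(to_packets("A", "hello world"), to_packets("B", "packets!"))
messages = reassemble(line)
```

Neither sender has to wait for the other to "hang up": the packets from A and B alternate on the same line, and both messages still arrive whole. With circuit switching, B would get a busy signal until A was done.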

Introduced, and developed in large part, by Tim Berners-Lee at CERN, the web combined hypertext ("text on a computer that will lead a user to other, related information on demand"), carried over HTTP, with the internet protocol suite (TCP/IP) that was already linking the separate networks that had developed since the 1960s, in a way that made each available to the other simultaneously. What this meant, then, was that normal communication lines, like telephone lines, could open a world of information sharing to anyone who had access to "the internet" or "the network" by way of a service provider and a computer, and who could then deliver and receive information from any other terminal, or terminals, that used this same system. It's perhaps important to note that Berners-Lee purposely opened this technology to the world, ensuring that its use could be had by those outside global research communities.

However, many believe the true start of "the web" began in 1993 with the introduction of MOSAIC, the first widely used graphical (GUI) browser. MOSAIC allowed images to be embedded alongside text rather than served independent of the text in a separate window. This was a significant step in that most "non-scientists" could "surf" the web in a way that seemed most like life, as if thumbing through a book or newspaper (though not nearly so linearly).

Once MOSAIC came aboard, the technological advances seemed to speed up at a rate that was, and still is, mind boggling. In fact, Moore's Law, which dates to 1965, is popularly taken to mean that computing power doubles about every 18 months while the price of computers/drives/storage/etc. comes down. Experience says this is true though still not always economical for the common wo/man (hence the digital divide, but that's for another day).
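
Taking the 18-month doubling figure at face value, a quick back-of-the-envelope calculation in Python shows how fast that compounds. The function name `growth_factor` is just for illustration, and this is a rule-of-thumb sketch, not a measurement of anything.

```python
def growth_factor(years, doubling_months=18):
    """How many times over something grows if it doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


# From MOSAIC in 1993 to this post in 2008 is 15 years, i.e. ten doublings.
since_mosaic = growth_factor(2008 - 1993)
```

Ten doublings works out to a factor of 1,024, so "mind boggling" is about right.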

Once MOSAIC and its successors were introduced to more and more people, their behavior with the technology developed. This new knowledge placed demands on developers of other internet technologies like email, which evolved into instant messaging or chat clients (like AIM and Jabber), listservs, message boards, and "groups" like Yahoo! Groups. These things enticed users to become more involved in online activities so that still more advances and tools were developed, thus giving rise to Web 2.0 and technologies like RSS feeds, wikis, virtual spaces, mash-ups, podcasts and audio streaming, and more.

All of this has added up to be a major challenge for libraries in how we deliver services to users, the role of librarians, and how the library now identifies as "place". Reading rooms have turned into computer labs. The idea many users have now is that everything can be found on the internet, which, of course, isn't true, but that hardly matters if that's what patrons believe. It puts the burden on libraries to ensure good information is as available, if not more so, than all the misinformation that's out there. It has demanded that reference services find new ways of reaching constituents. For instance, reference may be place based but it doesn't have to be place bound - libraries can offer virtual reference services through web portals, usually through their host library's website. This also requires that reference librarians be especially adept with electronic resources as many databases and journals are now electronic products only and they're not always easy or intuitive to navigate, even for the hardiest of veteran researchers (a small drawback to all this technology: the ever changing interface!). Ideas that have been used in the past for virtual reference include consortia virtual reference, texting, and "field hospital" reference services strategically placed around campus.

This new delivery system of information has also given rise to privacy issues and intellectual freedom concerns. For instance, what do you do if you find a patron surfing child pornography sites? Some librarians are instructed to look the other way for fear of disturbing the very delicate private and/or civil rights of the patron. The USA PATRIOT Act allows the FBI and other government officials to confiscate patron documents if they so choose, again placing the library and librarians in the middle of the very precarious patron privacy debate. And there is the idea of intellectual freedom which suggests that we each have a right to our own thoughts and ideas as well as the research that goes into them and that no one, most especially libraries, should interfere with the research that may go into their derivation.

As mentioned earlier, one of the drawbacks to all this technology is what's known as the digital divide. Because an internet service provider and a computer that's internet ready are not the most economical of tools, financially speaking, it leaves a good portion of the population at a disadvantage for finding information. This is perhaps less so these days as the proliferation of the personal computer has exploded since the turn of the century. Nevertheless, there are those in less fortunate economic circumstances that stand to lose a great deal without such access. Libraries, some would suggest and rightfully so, are bound by our adopted ethics to serve each and every human regardless of race, gender, political or economic status. So it is with this in mind that a proactive approach to reaching these under-served patrons, especially in the public library realm, comes to play out. Again, offering services that entice users into the libraries is probably the most efficient and creative way to do it - coffee shops, free computer courses for children and adults, fun ways of waiving late fees like DDR games and the like, and, of course, good old fashioned print advertising in a way that makes people feel welcome. How libraries can best serve the under-served in their communities is a huge topic in and of itself.

And with that - I bid you adieu for now and prepare for a kick-ass SEC championship game. Go Hogs (Arkansas Razorbacks' coach is a KY Alum and a darn nice guy - how can you not want them to win? Oh, okay, Georgia did do a pretty historic job getting to the game - I love underdogs...and their coach was at Western KY for a good while - how could you not want them to win? Wait - didn't I ask that already?????)