Wednesday, March 26, 2008

a thought on the digital divide in info literacy

Just a note on something I read this week concerning the digital divide.

A study from Tufts University suggests that the digital divide is getting larger because educated middle and upper-middle class parents and their children spend more time online and, therefore, are more informationally literate than their economically depressed counterparts. I saw a few comments like, "You left off 'Moron'" and "...might as well say the poor are stupid, after all, they're not smart enough to read your report." I can't say I disagree but, at the same time, it stands to reason that if you spend more time online you might get better at finding what you're looking for and be able to discern legit info from bogus.

Someone else said something like, "you have to want to learn no matter how much you practice" and that's true to a point. For instance, let's say you spend hours upon hours online but it's all at Target or iTunes or, dare I say, some two-bit porn site. I can see where one would be limiting their knowledge increase. But there are those people out there who surf for the sake of finding out new things, learning exciting stuff they didn't know about - those people are absolutely capable of putting their practice to good use.

I have a friend who devotes several hours on Saturday mornings to surfing - she looks at all kinds of stuff, from cars to bird houses to medical symptoms and their treatments and back to cars again. Over the years as I've watched her grow into this routine, I've seen her become much more savvy about sites and their information. She's not particularly well off financially (though she is not in poverty) and, though she has been in a professional position for her entire adult life, she has only a high school diploma. But she loves to learn!

So, a lot of the results of being information literate must surely be based on the individual - and that goes for the digital divide as well. Maybe some backwoods library somewhere still doesn't offer internet access to their patrons but most do, even if only for limited hours. Anybody of any economic background can most certainly get the access if they want it. So, it makes me wonder if part of the digital divide dilemma isn't up to us librarians to fix. Maybe we've been thinking about it all wrong; maybe it's like any other "service" you have to advertise. Would you buy a new model car from a maker you've never heard of? Or ask your doctor about some over-priced medication if you didn't see it advertised on TV? Or take your kids to the county fair if you didn't know it was in town? Would you take your kids to a gaming event at the public library if somebody told you about it and showed you other people there who looked like you having fun? Maybe....

From my own experience I can say that I never found a library, any library, particularly inviting. Every image I've ever seen (until very, very recently) has shown only well-to-do, particularly of the Caucasian persuasion, quiet, book-entranced kids. If you're smart and you like to read but hanging out with stuffy white kids ain't your thing, then why would you want to go hang out someplace like that? I wouldn't and I didn't. And I can't think of a single instance when a library system or a librarian ever tried to change my mind until I got to college. It's only been recently, as a younger generation of librarians has come into the workforce, that new ways of reaching people are really happening - like gaming for instance. Using that approach, and then reaching the public in a way they respond to, i.e. advertising (billboards, TV and radio spots, print ads, etc.), maybe the next kid that likes to learn but hates being stuffy and homogeneous won't feel like an outsider like I did.

Now I'm just sounding sad and pathetic. Enough!

Monday, March 24, 2008

more information retrieval thoughts

If you were asked to compare certain aspects of searching, like full-text vs. surrogate record searches, or information retrieval through different methods such as MARC records (surrogate), the internet, or aggregated databases, what would you talk about?

There are pros and cons for each method. How accurate you want your answer to be might determine which avenue you choose, and each has differing rates of precision (the share of what you retrieve that is actually right/relevant) and recall (the share of all the relevant information out there that actually gets retrieved).
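To put rough numbers on those two terms, here's a minimal sketch in Python. The document IDs and the function name `precision_recall` are made up for illustration; they don't come from any particular catalog or database.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: the items the search actually returned
    relevant:  the items that really do answer the question
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 results come back, 2 of them are useful,
# and the collection holds 5 useful items overall.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7", "doc8", "doc9"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

In this made-up case half of what came back was useful (precision), but the search only found two of the five useful items that exist (recall). Pushing one number up tends to pull the other down.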

Let's take full-text searching for instance. Aggregated databases and most web-based texts offer "full-text" searching. That sounds great but there are some drawbacks. With databases you're usually only able to search the citation or abstract - leaving out the biggest portion of the article. You may miss the very articles that are most relevant to your needs because synonyms and other helpful associated terms weren't included in the abstract/citation. The same can be said for web searches - it's a jumble of information in a far less organized environment and, so, many relevant pieces just aren't found (Google's kick-ass page ranking algorithm can increase your chances of success but it's still not a controlled vocab). Too, this kind of "free-form" searching, as I sometimes think of it, can, and does, easily produce a good number of false hits.

This last point, however, isn't always so bad. How often have you done a Google search (or another search engine of your choice) and come across information that may or may not be directly related to what you were looking for but proved to be important and interesting nonetheless? Some of this success has to do with "natural language" and the way web content is indexed, both of which we'll get to in a bit.

Then there are the surrogate records; I always think of them as MARC records, which I access a lot in my current position. This individual item inventory has a list of fields that will tell you, in a controlled manner, a host of relevant facts like title, publisher, preceding or succeeding titles, publication date, and so on. Any one of these fields can be both an access point and a source of retrieval. The problem is that you'll only get the location of the item, not the item itself (unless, perhaps, there is a PURL included - a persistent URL that will take you directly to a digitized object). That's become a point of serious disappointment in the Google age - people not only want to find the information they're looking for but they want it now. They don't generally want to schlep to a library to look at a book or article.
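Here's a rough sketch of the surrogate-record idea in Python. It's heavily simplified: real MARC fields carry indicators and subfields, and the titles, publisher, and PURL below are all invented. The tag numbers follow common MARC 21 usage (245 title, 260 publication, 780/785 preceding/succeeding entry, 856 electronic location); the helper names `find` and `has_online_copy` are hypothetical.

```python
# Simplified, hypothetical surrogate records: each MARC-style tag maps
# to a plain string instead of real indicators and subfields.
RECORDS = [
    {
        "245": "Journal of Example Studies",         # title
        "260": "Springfield : Example Press, 1998",  # publication info
        "780": "Example Studies Quarterly",          # preceding title
        "856": None,                                 # no electronic location
    },
    {
        "245": "Example Studies Quarterly",
        "260": "Springfield : Example Press, 1985",
        "785": "Journal of Example Studies",         # succeeding title
        "856": "http://purl.example.org/esq",        # made-up PURL
    },
]


def find(records, tag, text):
    """Use any field as an access point: substring match on one tag."""
    return [r for r in records if text.lower() in (r.get(tag) or "").lower()]


def has_online_copy(record):
    """True only if the record links to the item itself."""
    return bool(record.get("856"))


for rec in find(RECORDS, "260", "example press"):
    where = "online" if has_online_copy(rec) else "library only"
    print(rec["245"], "->", where)
```

Both records are easy to find through the publisher field, but only the one with a link in 856 gets you to the item itself. The other one still means a trip to the shelf.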

Far more than most web content, surrogate records are derived from a controlled vocabulary. Unlike natural language, which lets users build the bulk of web information, a controlled vocabulary takes skill and time to generate. You can add synonyms, homonyms, and polysemes that enhance the aboutness of the item, thus increasing the precision of a hit. But this comes with significant drawbacks in that it's expensive to maintain and, because it is labor intensive, it may very quickly lag behind current and new information.
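A minimal sketch of how a controlled vocabulary steers a search, assuming a tiny hand-built list of terms. The headings, record IDs, and the function name `search_by_subject` are all hypothetical, not taken from any real thesaurus or catalog.

```python
# Hypothetical controlled vocabulary: each variant term points to one
# preferred (authorized) heading. Somebody has to build this by hand.
PREFERRED = {
    "car": "automobiles",
    "cars": "automobiles",
    "auto": "automobiles",
    "automobile": "automobiles",
    "birdhouse": "birdhouses",
    "bird house": "birdhouses",
}

# Hypothetical catalog: record id -> subject headings a cataloger assigned.
CATALOG = {
    "rec1": {"automobiles"},
    "rec2": {"birdhouses"},
    "rec3": {"automobiles", "birdhouses"},
}


def search_by_subject(term):
    """Map the user's term to its preferred heading, then match records."""
    heading = PREFERRED.get(term.lower().strip())
    if heading is None:
        return []  # term isn't in the vocabulary yet: the "lag" problem
    return sorted(rec for rec, subjects in CATALOG.items() if heading in subjects)


print(search_by_subject("Cars"))    # ['rec1', 'rec3']
print(search_by_subject("hybrid"))  # []
```

Every synonym lands on the same heading, which is where the precision comes from. A new term like "hybrid" finds nothing until a person gets around to adding it, which is the maintenance cost in miniature.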

Natural language indexing doesn't suffer this problem. Content is indexed automatically by the computer and, so, is searchable almost immediately. Granted, what comes back may not be right or relevant - it may be so far off the mark you walk away scratching your head going "I asked for checks for my bank account, not checks in cotton material." A computer cannot effectively process natural language. It doesn't know the difference between being serious and being sarcastic (though the idea of the semantic web aims to make computers intelligent enough to read and know the difference between this and all other differing language quirks).
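The "checks" problem is easy to reproduce with a toy automatic index. This is only a sketch of one common approach (an inverted index built from the words on each page), not a description of how any real search engine works; the pages and the function names are invented.

```python
import re
from collections import defaultdict

# Hypothetical mini-collection of web pages.
PAGES = {
    "bank": "Order personal checks for your bank account online",
    "fabric": "Cotton fabric in checks and stripes for quilting",
    "cars": "Reviews of new cars and trucks",
}


def build_index(pages):
    """Build an inverted index: word -> set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return []
    results = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(results)


index = build_index(PAGES)
print(search(index, "checks"))       # ['bank', 'fabric']
print(search(index, "checks bank"))  # ['bank']
```

The index is ready the moment the pages are read, with no cataloger involved, but it only matches strings. It has no idea that bank checks and checked cotton are different things until the searcher adds another word to narrow it down.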

Jesus - this subject is just huge when you think about it, trying to connect the index to the controlled vocabulary to the natural language vocab to the internet versus commercial databases versus a surrogate record and all the stuff that goes into each one. For everything you say, there's 2x as much you'll leave out. Poo - I hate that kind of stuff.