Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported


August 17, 2009



Milan Davidovic

Have you seen Davenport & Prusak's definitions of data, information, and knowledge (in 'Working Knowledge')? They don't specifically account for content; I would put it after data, information, and knowledge and say that efforts to *contain* any of these three give rise to *content*. That definition, however, may not meet the needs of methodological rigour you have in mind.

Joe Gollner

Hi Milan and thank you for this timely comment. I knew that this post, despite the grandeur of its title, would not be the final word on the topic of "what is content". Indeed, it was immediately obvious that further thoughts on content were necessary. (In fact, I have since updated the post as a result of this exchange.)

Now that I am back in my library, I am able to consult "Working Knowledge" to refresh my memory on the specific definitions of data, information and knowledge that Davenport & Prusak put forward. Now that I have done so, I am reminded that this is one of the books about "knowledge management" that I have found to be the most level-headed.

Wisely, Davenport & Prusak declare that their intention, in framing some working definitions, was to be practical and useful in the context of discussing how organizations and people create and use knowledge. I say wisely because, as I perhaps illustrate all too graphically, venturing into deeper philosophical waters can be more trouble than it's worth. That said, I find the definitions in "Working Knowledge" to be exceptionally good. While it may be difficult to discern at first, the definitions offered by Davenport & Prusak are not incompatible with those I have put forward, although mine perhaps wade more deeply into potential philosophical complications. Some examples might serve us well here.

Davenport & Prusak define data as follows:

"Data is a set of discrete, objective facts about events."

This can be one of the derivations that would be possible from my admittedly more abstract definition. I would probably pause over the use of the term "fact" and seek to bring a little more formality to how it is used, but this is a small quibble and one that would not, practically speaking, provide much help.

Information, in "Working Knowledge", becomes "data that makes a difference"; "it is a message" and "it must inform". This too, I would see as being a quite practical application of my more generalized definition.

Finally, with knowledge we hit what appear to be some differences, but these in fact turn out to be somewhat less important than the similarities. In "Working Knowledge" we find knowledge defined as follows:

"Knowledge is a fluid mix of framed experience, values, contextual information, and expert insight that provides a framework for evaluating and incorporating new experiences and information."

Excavating the similarities between this definition and my more austere generalization would take a little time. My whitepaper, ominously titled "The Anatomy of Knowledge", hopefully provides a background that, taken as a whole, illustrates that Davenport and Prusak's definition of knowledge is not incompatible with mine. The differences that do crop up turn on the perspective taken - whether you choose to view knowledge from the perspective of the person knowing or from the perspective of the thing known. When we consider the question from the perspective of the person knowing, the knowledge that this person already has must be acknowledged as playing a massive role in how new information will be interpreted or framed as utterances. In my whitepaper, I referred to this "form of knowledge" as "accepted knowledge", although perhaps "active knowledge" might be another way to put it.

And this all brings us to the question of "content" and how it really fits with these three concepts of data, information and knowledge. Seeing that these three concepts represent "conceptual artifacts", with semantic import that can pass from one person to another sometimes intact, sometimes intact but then to undergo change, and sometimes in a manner that sees it change in the exchange, I do believe that as soon as we start using the word "management" we are obligated to select definitions for the items being managed that make some sense. And so it is that content is usefully understood as the physical instantiation of data, information and knowledge, how it is packaged and transacted, how it is contained, and the form in which it is meaningful to talk of its being managed.

Of course, when we talk about the content of a message, we are generally referring to its semantic import, so I am not 100% comfortable that we are out of the forest yet. At different times, I want to talk about the content as what is inside, while at others I want to talk about content strictly as physical artifacts, including their inter-relationships. There are even times when I am inclined to see the physical representation as what is "inside", and thus available to interpretation by recipients, thereby rejecting the notion that there is anything more "inside" that is being carried along for the ride. But this now shows what can happen if you start down the slippery slope of philosophic investigation.

So I think, or perhaps hope, that we are zeroing in on an understanding of content that will be useful and, as the example provided by "Working Knowledge" shows, usefulness in definitions should count for something.


Hi Joe,
I believe you have managed to define those key terms quite comprehensively. I found it a bit limited how “enterprise information management people” define, or should I say categorize, Content versus Structured Data: they suggest that Content is unstructured information (plain text?) and that data is structured, mainly fields in the databases of a number of back-end systems.

Ref. Simplifying Information Architecture, Creating An IA Program That Works by Alex Cullen,7211,37385,00.html

Otherwise it is, to me, a well-thought-out paper. The layers in the framework picture make sense, at least.

Best Regards,
Heimo Hänninen
your Finnish collaborator.

Joe Gollner

Hello Heimo

I think that you have hit the proverbial nail on the head.

To many observers, content is simply what we have not yet taken the time and effort to properly structure. I encounter this viewpoint regularly - almost on a daily basis. To these people, content is further classified as either material that simply does not merit the investment associated with applying structural discipline or material that secretly wants to be structured data and that has not been so elevated simply due to a lack of time or resources. The understanding of what constitutes structured data is invariably defined, for these people, by what can be managed using mainstream database technology.

Now for the content that has been classified unimportant, at least from a classical Information Technology (IT) perspective, the technology allocations that are made tend to be "broadbrush" measures such as more disk space, perhaps a search tool, and maybe even a repository where these holdings can be dispatched (and perhaps hopefully forgotten). The attitude towards these resources often seems odd when you actually look at the materials being relegated to the infrastructure periphery as they certainly look important - policies, procedures, proposals, plans, and so on - very often with the signature or endorsement of someone who is relatively senior.

The second classification of content is in fact my personal favourite. This viewpoint is often associated with projects that set out to elevate the content, which has been designated as worthwhile, to the level of data that can be stored and managed along with other structured assets. These are my favourite because they are so frequently associated with projects that can only be described, even charitably, as disasters. In these projects, sometimes massive investments are made to construct database environments and business applications that can handle these newly reclaimed data resources. These investments become unstuck when the intrinsic complexity of content refuses to obey the often laughable restrictions that are associated with data-centric systems. Most entertaining of all is the fact that the proponents of these projects do not see that the problem lies in a fundamental category error - assuming that the content in these cases simply needs to be properly structured so as to become data. These proponents refuse to see the nature of their error and often make second and third assaults upon the problem only to meet with renewed frustration. Because the fundamental error is effectively invisible to these people, the source of these failures is always elsewhere and someone else, usually the user community, is at fault. The truth is that given the fundamental nature of the error being made, in attempting to see content as aspiring data, these people will never succeed no matter how much analysis they direct at the material, or how much money they spend building applications, or how many new product features they leverage in their database technologies.

Content, I am contending, straddles and encompasses the full range of communication levels - data, information and knowledge - and as a consequence exhibits complexity, and unpredictability, for which relational database technology is hopelessly ill-suited or, more correctly, to which relational database technology must be applied in highly selective ways and with suitable limits placed on the implementers' ambitions for control and precision.

In a recent project, we pursued the question of where does the product data really come from? Where does, indeed, the product itself come from? It turned out that a significant proportion of the data held within the product lifecycle management system was in fact references to documentary sources, such as engineering standards, from which specific data items were drawn and from which these items took their authority. In this case, the products in question existed in a highly regulated industry and the data behind the product and its manufacture was subject to extensive control. The data, it turned out, derived its authority within this regulated industry by virtue of the fact that it first, and primarily, existed within document content. So the initial impulse of some to rescue the data from these ancient artifacts ran straight into a fundamental brick wall - the data had to exist in the context of a document before it could be legitimately used in a database and product modeling environment. The document came first and took precedence and my data-oriented colleagues on this project have been uneasy, even distraught, ever since.

John O'Gorman

Interesting discussion, Joe - thank you.

I have some reservations however, about making distinctions between data, information, knowledge and content; just as I have about so-called 'unstructured' and 'structured' digital assets.

The reasons for my hesitance can be rendered down to these three:

1. One application's data is another's information, and there are many other examples where the differences between all forms seem to be arbitrary.
2. The 'structure' said to be associated with relational databases or hierarchies to name two prevalent examples, is more arbitrary than most people think. Extended to a logical (or illogical, depending on your point of view) extreme, it means that almost any structure can be applied to any collection of data or information making the primary reason for having them (meaningful communication) less than optimal.
3. The lack of structure said to be associated with other digital assets is not as problematic as the experts would have us think. Proof is in this communication, which by most standards seems 'unstructured' but which, by the contracts of effective communication (agreed-upon grammar, semantics and context), is highly evolved.

Data, information and knowledge are all phases or manifestations of the same thing - just as ice and steam are both water. It doesn't seem effective to say that one is Hydrogen and the other is Oxygen and they may or may not evolve into water.

Joe Gollner

Hi John

Thanks for your note. It comes at an opportune time as I have been thinking more about how my rubric hangs together and specifically how content fits into the mix. I think that I will be returning to this topic shortly and, owing to a number of sources including some of my own past presentations, I suspect I will be looking at content from the perspective of how it might be associated with "narrative" communication. But this is for another post.

On the points that you raise, I think we might actually be in closer agreement than it might appear - although that fact may have been buried under my sea of words.

On the first point, and this will return in looking at your third point, I agree that separating data from information is not an especially practical or possible task. If information is a meaningful organization of data, as I posit, then the organizational structures embodying that meaning would invariably be data. It's a bit like what Yeats once asked - "How can we know the dancer from the dance?"

On the second point, I do see the arbitrariness of structuring schemes but also the utilitarian nature of that arbitrariness. Or perhaps I hope that there is a utilitarian force guiding the formation and application of structuring schemes, knowing of course that they are frequently carried, willy nilly, from one domain to another and applied (or forced) in ways that don't necessarily make sense. By using the loaded term "meaningful" in my definition of data (meaningful representation of experience), I am invoking this "intentionality" although I am not one to immediately, or unequivocally, assume that it is a conscious intentionality.

Although I feel compelled to take a slightly different tack on the meaning of data than he does, I am inclined to reference Max Boisot's Information Space and its treatment of data as that which any given agent can, and does, perceive as meaningful within the stream of experience it participates in.

On your third point, I must say that this is the point that comes closest to what I have been thinking about most recently. So thanks for that. In my working day, I am routinely in the position of saying something like "You have designed an entire system predicated on the availability of neatly-packaged, uniformly-structured, and universally-accessible data nuggets. I am sorry to be the one who has to break this to you but this is not what you have to work with. You have a serious case of content, a more fluid mix of data, text patterns, media assets all flowing in a structure of sorts that we can only call narrative." In each of the layers, in my model, I stress the term "meaningful" and each definition both invokes and bears upon communication. In fact, I center all discussions of data, information and knowledge on communication as opposed to biological or cognitive processes (which I find to be the more popular approach and one I am repeatedly bumping into). Along the lines of my response to your first point, I don't see how we could untangle data, information and knowledge in any one "transaction", or instance, and I am not sure that such an effort would yield us much.
