TEI at 20: Congratulations! The Next 20 Will Tell the Tale

(The text of a Keynote presentation given by B. Tommie Usdin of Mulberry Technologies, Inc., at TEI at 20: 20 Years of Supporting the Digital Humanities)

Let me start by introducing myself as a friend of the TEI, but not a member of the cabal. A user of the TEI, and an enabler of other users, but not to the exclusion of other XML vocabularies. In fact, although it may not be safe to say so in this company, while I have written TEI-based tag sets, I have also been involved in the development and promulgation of other widely used XML vocabularies, which in some circumstances have been seen as competing with the TEI for adoption. I am a friend and a competitor, an admirer and a critic.

So, by asking me here to talk, you have obeyed what I think of as the most important rule for development of standards: the Sleeping Beauty Rule, that is, “Invite the Wicked Fairy”. There are other rules: keep careful notes, make your decisions public, maintain a stable workforce, but none more important than “Invite the Wicked Fairy”. Hmm. Can you tell that I finished writing this talk on Halloween?

The TEI Is a Substantial Technical Achievement

P5. Wow. A whole new TEI — based on a schema language that had not been invented when you started. A whole new TEI — that helps users better encode what they care about in their documents.

I wish I could tell you a whole lot about how cool P5 is. About the new features and the internal tidying that will make use and maintenance easier. And about how much more cleanly you will be able to … do what you do. But I am not the person to do that; I am not on the team that is developing them. I haven’t even read the draft guidelines; I have just poked at them on the web site a bit. But I notice that there was a full-day workshop on P5 yesterday, and there will be a session on it tomorrow afternoon which I will certainly attend.

I can confidently state that it is a substantial technical achievement. How do I know? Well, P4 was a substantial technical achievement, and P5 is standing on its shoulders, so to speak. And P3 was a … You don’t need me to go there, do you?

And I have been peeking at it from time to time as new bits were updated on the web.

How Did You Get Here?

From its beginning, the Text Encoding Initiative was a gutsy thing to even attempt. The scope was way too big; there were too many disciplines involved; and most of the people needed to make it work had minimal computing expertise. Further, most of them had no idea how big a task it was. I remember hearing about the TEI about 20 years ago at an SGML conference and thinking to myself: these people are nuts — they have no idea what they have just set out to do. It will be exciting if they can do it, but … I don’t think they have a chance. (Don’t let that worry you. I thought the same thing when I first heard about HTML. In fact, I remember throwing away the first email messages I received about HTML. I was in the midst of developing a tag set that, at that time, had over 1,000 elements and dozens of attributes, and they were going to tag EVERYTHING with 20 elements. Absurd and useless. Do I need to tell you that I was wrong? HTML is neither absurd nor useless. Well, it isn’t useless. Sometimes I still think it is absurd.)

The hubris of developing a markup vocabulary that was intended to support all academic texts (at least, that’s the short version as I understood it) was amazing. We, the publishing experts in the SGML world, all KNEW that a custom vocabulary, with custom infrastructure for creating, managing, manipulating, and especially formatting for display, was needed for each type of document. A dictionary and a novel were substantially different, and thinking that you could create one vocabulary that would work for both was, well, a novel idea. But you folks set out to do it, and do it you did.

So, What Did You Do Right?

The hard work involved a great many people, who saw a need and were willing and able to work together, suspending what had to be a lot of skepticism, to define and build a tag set that, at least in the beginning, had to look pretty thoroughly useless. After all, tools to create the tagged SGML were few and far between, and very expensive. And doing anything with those SGML files once you had them was seriously challenging, not to mention even more expensive.

You Wrote a Good Vocabulary

Vocabulary. Tag Set. DTD. Schema. Whatever you want to call it, it’s pretty good. Now don’t get all puffed up about that — lots of groups have written good vocabularies that have vanished into the cold dark backrooms of dead project archives. Writing a good vocabulary is necessary for success in this game, but far from sufficient.

What makes this vocabulary good? It is highly varied and does not seem to have been “leveled” by a bureaucratic notion of fairness. Many portions of it retain the flavor of the field for which they were developed and have been allowed to retain the shapes appropriate to that field while being integrated with the whole. For example, the structure for performance texts has a significantly different shape and feel from the section on verse, and it certainly doesn’t look like either was “leveled” to be parallel to the other. (I have seen vocabularies that covered the areas of many subject experts try to be “fair” to all involved and, in the process, do silly things like reduce the number of structures, and thus the detail available, for one area because another that is equally important did not have that many tags. Sort of an academic version of “Mommy, he’s got more jelly beans than I do!”)

You got subject experts to participate in modeling, and you listened to them. This means that even in situations where people can’t, or won’t, use the TEI, they will consult it. I doubt if anyone will model a dictionary again without reading, understanding, and being influenced by your dictionary model, even if they don’t end up using it.

You Modularized It, Not Just for the Developer’s Convenience, but for the Convenience of the Users

The creation of a modularized vocabulary, in which users could select only those portions that were relevant to their project and customize those to add structures for things that were unique to the particular project was new, cool, and in a real sense revolutionary. If the TEI wasn’t the first such effort to do this, it was certainly the first to talk about it in the SGML community, and it was influential. What was, at least at the time, called “the Pizza Model” is now basic to the way shared public vocabularies are designed. The notion that there was a core portion of the vocabulary that everyone using the vocabulary would use and that there were other parts that you might or might not need, want, or employ, is still a powerful notion and still implemented with various levels of success in various places.

And speaking of the Pizza Model. One of the fine things about the early TEI was the many, varied, and sometimes absurd food metaphors. They seem to be gone or at least significantly reduced in visibility. I have to say that I miss them — partly because they were clear and partly because they were a welcome bit of whimsy. The Pizza Model makes it clear that some things are basic and some optional: crust is basic, pepperoni is optional. Discussing tortellini in soup makes it obvious that there can be structured things in an unstructured environment.


Another thing the TEI did well, and far better than others were doing at the time, was to document the vocabulary. Not just documenting the meaning and intended use of every element and attribute (although some groups still don’t do that adequately) but more difficult, and more important, documenting the principles behind the whole thing. And you have continued to do this, which is difficult, time-consuming, and enormously valuable.

Hard Work

Hard work — that’s the first of the two reasons the TEI has succeeded.

There are, in my opinion, two reasons you succeeded: first is the protection that the good Lord seems to extend to fools and fanatics, and second is that your timing was right. Or, to put it another way: you worked hard, there was a real vacuum (there was no public tag set that addressed the needs of scholars), and you were very, very lucky.

The Other Reason the TEI Succeeded: Timing

When the TEI was started there were a few groups writing vocabularies for sharing. But, compared to the TEI, they all had very limited scopes. Well, I guess that really isn’t quite true; all the documents of the Department of Defense is not really a limited scope, but CALS, at least as it was envisioned 20 years ago, was certainly not a general purpose tag set. And there was the AAP tag set — designed to facilitate the publication of books and journals, but not their archiving and certainly not the analysis of their contents.

Using the infrastructure of the day, 20 years ago the TEI could not have succeeded. Remember — well, some of you are old enough to remember, and the rest can imagine — this was before the Web. It was before HTML. It was not only before XSLT, it was before DSSSL. And, perhaps most important, it was before Unicode. What does this mean? It means that in 1987 we did a lot of hand-tagging. And we spent a lot of time writing character maps and applications to process them so that we could use “special” characters (like the characters most of the people on the planet use every day) in our documents. And the investment in custom software to typeset from our SGML (and what else were we going to do with it?) was substantial.

Why? Well, that was before HTML. You started working on the TEI before HTML. That means before there was a cheap, convenient way for everyone to make medium-quality displays of virtually anything (which is how I think of HTML: a fast, cheap, low-to-medium-quality display mechanism, which is pretty handy!). And I would guess that most, perhaps all, TEI texts are displayed in HTML at one point or another in their life-cycle even if HTML isn’t their ultimate display format.

And that was before XSL made all the lies we were telling about SGML and XML into truths. XSLT paired with HTML made it practical for us to display our marked-up files quickly, easily, flexibly. And XSL-FO made it practical for us to make easily read printed versions of our texts, even in the situations in which those versions are not what we really want to publish. And even when we are doing high-end publishing of our marked-up texts, XSLT is used heavily throughout the process so our publishers actually realize the advantages we say they will see from the use of XML. (Without it, XML actually adds time, effort, and thus cost, to the publishing process.)

You started working on the TEI before the XML effort achieved one of its most ambitious goals: to make it easy to write SGML tools. That’s not the way the committee put it. As I recall, the goal was to make it possible for “the average computer science graduate student” to write a parser in a weekend. For those of you who don’t know, writing an SGML parser was a task “the average computer science graduate student” said would take a weekend, and after the weekend she (no, “the average computer science graduate student” was always a he in those days), after the weekend he said it would take more like a week. And after that week it was clear that another month would be needed, and it ended up taking 3 to 4 man-years to write a complete SGML parser. “The average computer science graduate student” can’t actually write an XML parser in a weekend, but she sure can do it in a week. Do we actually write many XML parsers? No, but this is indicative of the level of effort to write XML tools in general, and thus of the reason there are so many good, affordable, and, in some cases, free XML tools.

The TEI was there, ready, when the tools came along to make it possible to use XML in relatively low-budget environments. (Few TEI projects have Department of Defense budgets.)

The TEI Is a Substantial Social Achievement

The TEI is the result of 20 years of group activity. A lot of very smart people have put a lot of time, energy, and passion into the TEI. And, frankly, you are not a group of people best described as “plays well with others”. So, the fact that you still talk to each other, that you show up at meetings like this one, and that there seems to be ever increasing energy in the TEI Consortium is a remarkable accomplishment.

Community Support

Perhaps the most important aspect of the TEI is that it is a mechanism to share expertise and tools in what most of the users consider a secondary, or perhaps tertiary, subject matter, and yet they think it is important enough that they are willing and able to put time, energy, and money into it. What do I mean by secondary or tertiary? I mean that their primary interest is in the subject matter of their documents; they are interested in art or literature or history or linguistics or … some other subject matter. And in many cases their secondary interest is in the study of that subject matter. And only after that comes an interest in how to encode, store, search, and display their material. The infrastructure that the TEI is a key part of is just that: infrastructure.

And yet, we have a room full of people, who would, on most days, be far happier discussing their primary topics, here to talk about the TEI: a shared infrastructure framework. And why? Well, there are some of us who are fascinated by this sort of infrastructure. And some who love the challenges and rewards that come with marked-up texts. But we are in the minority. Well, I know we are in the minority of people on the planet, and of people who use XML, and of people who use the TEI. I think we are even in the minority in this room.

I think it is likely that we have a room full of people who are here to talk about the TEI because it is a tool, or perhaps a meta-tool, that helps them spend most of their energy focusing on what they really want to think about: their documents and the meaning of those documents.

Your Assumption that People Will Extend the Vocabulary

The TEI Guidelines have included, from the beginning, the assumption that users will want to, need to, extend the vocabulary. (I have already mentioned the mechanism to do this as a technical achievement, but I think it is, even more than a technical achievement, a social achievement.) You have not assumed that you can make a list of everything anyone might ever be interested in, and you have not (officially) said that if there isn’t already a structure for it, you don’t need it. (I have seen a hint of that attitude on the TEI List from time to time, and I consider it a really unfortunate attitude. But it isn’t the official attitude. Fortunately.)

This attitude that subject experts, and particular projects, can and should incorporate markup of the things that they know about and care about into the existing structures was one of the key factors in the success of the TEI. (I know; it scares some people, and it certainly looks from the web site like there has been a backlash against it. I’ll get to that in a moment.) By allowing, even encouraging, people to add to the vocabulary for their needs, the TEI has countered many of the reasons that in other fields users look at public vocabularies and decide not to adopt them. If, for example, a vocabulary states (as one in current use does):

All XXXX standards and documentation are copyright materials, made available free of charge for general use. If you use the XXXX DTD, you will be deemed to have accepted these terms and conditions:

1. You agree that you will not add to, delete from, amend, or copy for use outside of the XXXX DTD, any part of the DTD except for strictly internal use in your own organisation.

2. You agree that if you wish to add to, amend, or make extracts of the DTD for any purpose that is not strictly internal to your own organisation, you will in the first instance notify “the-standards-maintainer” and allow “the-standards-maintainer” to review and comment on your proposed use, in the interest of securing an orderly development of the DTD for the benefit of other users.

If you do not accept these terms, you must not use XXXX DTD.

Now, a user who finds that this public vocabulary meets most of their needs, but not all, is far more likely to write something entirely new than to deal with this public vocabulary, no matter how much time, money, and effort using it might save them.

I understand the impulse behind that restrictive license, and I see indications that some of you are of the same mindset. You want plug and play to work with all TEI documents. You want to know that if a project says their documents are TEI documents and you have an infrastructure for storing, searching, and rendering TEI documents that their documents will work in your system. And you want to know that if someone has developed a cool tool to work with their TEI documents that it will work with your TEI documents. And there are a few of you who just want everyone else to DO THINGS THE WAY THEY SHOULD BE DONE! I have heard more than a few presentations, especially by librarians trying to deal with digital materials, bemoaning the unchecked proliferation of tag sets. And I sympathize.

By allowing, no, encouraging, users to extend the TEI tag set to meet their needs you have made it easier for them to use MOST of it as published. That is, you have the best of both worlds: you are discouraging “the unchecked proliferation of tag sets” while letting users capture the things they care about. Well done!

Making Digital Resource Management Respectable

The fact that the TEI exists — the fact that there is a consensus-based academic standard for digital texts, with resources to support significant scholarly work — has contributed far more than its actual use in the fields of digital librarianship and digital archives. I believe that the TEI has made electronic text centers and similar resources imaginable and respectable. People who want to collect, manage, or curate digital texts are no longer seen as trendy nut-cases; they are seen as people who have taken on a difficult, possibly impossible, but very important task. The TEI didn’t make that shift in perception happen, but it certainly did help.

20 Years Old

The TEI is 20 years old this year.

At 20, a dog is quite old. Twenty-year-old dogs tend to be … well … on their last legs. In fact, few dogs live to be 20 years old.

At 20, a person might be moving from the receiving side to the delivering side of the educational system. Becoming a parent as well as a child. (Do we ever stop being children? I don’t think so, but adding the role of parent nonetheless changes us dramatically.) At 20, a person is leaving one set of life roles and entering another.

At 20, a house is just settling in for the long run. The initial problems with its construction have been found and fixed, the foundation has settled and should be stable for a while, and while the “newness” has worn off, nothing fundamental has worn out yet.

So, is the TEI more like a dog, a college student, or a house? Some of each, I think.

The TEI is like a dog in that at 20 some of the fun things in its youth are, and should be, gone. There is less whimsy in the Guidelines than there was 15 years ago. There is less excitement. The TEI isn’t, metaphorically, chasing cars anymore.

The TEI is like a house. There is a solid foundation, a comfortable place to be, and people who have developed habits living in it. It has a few bumps and scratches, but they have accumulated so slowly that the people living there hardly see them, and they don’t matter much.

But mostly, I think, the TEI is like a college student. At 20, it is time to think about a career. Some seem to become perpetual students, but most find something more substantial to do with their lives. And at 20, I think the TEI needs to make the same sorts of decisions most 20-year-old people need to make. What do you want to do for the next 20 years? (Many of my friends have 20-year-old children — a few of my friends ARE 20-year-old students — and I always object when some ancient codger (my age) asks them what they want to do with the rest of their lives. That’s absurd. They don’t know; they can’t know; and they shouldn’t know. We live our lives by distraction — we set out to do something, we get distracted by something else, and all of a sudden we realize that this distraction IS what we now do. And we stay on that career path until we find that we have gotten distracted, and … our paths spiral. And that’s OK.)

The Next 20 Years

Anyway, this is a good time for the TEI Community to think about what you want to do with your next 20 years. Do you put comfortable chairs under the shade trees, or do you clip those trees into cartoon shapes?

What will the next 20 years bring for the TEI? What will the environment in which the TEI is used be in 20 years? How will document-based projects change in the next 20 years? How will the computing environment change in the next 20 years? How will the funding models, culture, and infrastructure of academe change in the next 20 years? I don’t know. You don’t know. We can’t know. But you can, nonetheless, work to shape at least a small part of those 20 years.

There are several questions you might want to ask yourselves, the answers to which may significantly shape the future of the TEI.

Do you want to grow the TEI user community? (I’ll assume the answer to that question is “yes”; it seems to be the natural inclination of shared efforts to want to increase participation.)

So, do you want to grow from within or recruit new users? This is not a trivial question, and the answer is not obvious. You have a substantial and varied group of current users, and many of the people working on those projects will move to other projects at other institutions and meet and work with new people. The TEI community could grow very nicely by growing as it is carried by your current users. This growing by contact and secondarily by word of mouth (these current users will talk with their colleagues and friends about what they are doing) could fuel steady but not explosive growth. And the growth will be relatively calm; people will generally have bought in to the TEI culture before they expect to have much influence on it.

On the other hand, you could try to reach out to totally new user communities, people who haven’t even heard of the TEI. This would, no, this MIGHT, result in faster growth and in people with dramatically different points of view and values joining the TEI community. It would be far more chaotic (and perhaps far more exciting) than growing from within.

There are, in my opinion, two equally important things you need to think about as you develop any tool or technology that you hope to see widely adopted: entry and use. It looks to me like you have thought about, and worked on, both. But from different points of view and at different levels.

If you want to grow the TEI to new users, you will increase your chances of success if you re-think the Guidelines with ENTRY, not USE, as a fundamental design goal. Yes, I am implying that your current development has privileged USE and treated ENTRY as at best a secondary design goal.

It looks to me like the fundamental design decisions behind the TEI Guidelines, not just the current ones but previous versions as well, were motivated by use. That is, you decided how the Guidelines would be structured and expressed based on what users of them would need and want. And this seems like a good thing.

And then you added a layer on top to help people get started. The concerns of entry were treated as superficial, not fundamental. At least, that’s what it looks like to me. And that may or may not be appropriate. It depends on how you want to grow.

When I wrote that, I expected to see some disagreement on some of your faces … and I do. I see why. You have a widely quoted “Gentle Introduction”, which is both charming and useful. And you have a lot of classes: not just “How To’s”, but also train-the-trainer classes. And these are all good. Very good. But. But. But …

Let’s imagine a potential new user, who will, of course, look at the TEI with fresh eyes.

Seeing with Fresh Eyes

It’s an easy thing to say: “Look at it with fresh eyes, and you will see anew”. But it isn’t such an easy thing to do. To look at something you are familiar with, comfortable with, and see past the history to the current state of things.

Few of us can simply decide to look with fresh eyes; those who can have a capacity for truth that I really admire — or an inability to form habits that must make life very difficult.

House guests give me fresh eyes, at least for a little while.

I have flashes of seeing with fresh eyes. I cannot exactly cause them to happen, although I can set up the circumstances under which it is likely to occur.

One of the common ways is to invite someone who has never been there to my house. Ideally, if the shades are to fall from my eyes, this is someone I have known for a while, and like and respect — in other words, someone whose opinion I value.

In getting ready for the visit of my first-time house guest I usually do some tidying; I don’t care how good a friend someone is, they shouldn’t see my house in its natural state the first time they’re there. And during that tidying process I often get a short look at the physical house with fresh eyes. How long has it been, I wonder, since I painted the interior of the house? How often does the dog shake next to the wall in the kitchen, and is it possible to really get all the mud off that baseboard? Other people haven’t filled the space under their coffee tables with books; perhaps some of those should go somewhere else. How did the drapes in the front window where the dog peers out get so grubby? Again. Didn’t I wash those recently? How long ago was it? OH. Perhaps it was a few years. There’s a story behind each of the little objets d’art on the shelf near the door, but to anyone who doesn’t know those stories that just looks like clutter. Perhaps it is clutter. The bushes in front of the house are badly overgrown and impinging on the walkway to the front door.

At this stage, all I’m seeing is things that could be, should be, improved. Things that I have been living with for so long that I don’t see them. Things that got to their current disreputable state very gradually, and since I see them every day I hadn’t noticed the drapes darkening a bit where the dog pushes them aside or the azaleas creeping up on the walkway. It happened very slowly. Worse, I had put each of the objects on the shelf near the door, and there was a reason each of them was put there: It is beautiful, or was a gift from a favorite child, or might be convenient to have close to hand some day. Or perhaps there was no other place to put it, so it went there temporarily (six years ago). And all that stuff is SO FAMILIAR, or I know its history SO WELL, that I usually just don’t see it. But in the hour or so that I am tidying for a first visitor I notice. I usually do one cleaning item that hasn’t been done in years (he’s tall enough to see the top of my pantry, so I’ll dust up there for the first time I can remember). With luck I will also calm down enough to make a list of the other things I have seen that really need attention and that I usually don’t notice. Because in just a little while my eyes will cloud with habit, and I won’t see that the bowl the collection of dice is stored in has a chip on the rim and I really need another way to store/display them.

These flashes of insight are useful, but not actually a lot of fun. All I have been seeing is the dusty, cluttered, unkempt parts of the house. A lot more fun is watching a first visitor look at my house. Now I see them looking at the pictures on the walls, and then perhaps noticing the rugs and the hand-made chairs. They always respond to the dog: some are happy to meet her, and some are intimidated by an 80-pound black poodle. But as they meet her, I see that she really does resemble a sheep and that she really is very tall and very black. And I see that some people like that, and some … well … don’t. And I notice that I really like the rugs on the walls and that the flowers in the corner really help light it up.

Find a Way to Look at the TEI with Fresh Eyes

I know, I know. This is NOT the best time to be talking about cleaning your house; you just got it painted, and the carpet is so new you can still smell it. Everything is shiny and new, and you still don’t know how you’re going to get comfortable with P5. And here I am talking about making MORE changes.

Actually, this is a good time to do that, because for a little while even experienced TEI users will be forced into beginner, or at least learner, mode. Think about the things that confuse you. Make notes on the things that weren’t obvious and on all of the surprises you encounter. If anything confuses you, imagine how a total newcomer will find it.

I know quite a few of you, and I lurk on the TEI List, scanning subject lines and occasionally reading a thread. And one of the things I have noticed is that many of you want more and more people to adopt the TEI. I think most of you who want that have really good reasons for this; you think that these potential users would be better off if they used the TEI than if they used some other scheme for managing their documents. And, it looks to me as if there is also a little competitive impulse: we want to win!

You are all comfortable with the TEI tag set. And you know how it got to be the way it is, you know how each scratch and chip got there, and you know that they were the results of serious thinking, discussion, and compromise. Further, you are really proud of the new release. P5 is REALLY COOL, and you’re very happy with it. And you should be. For a little while.

And then, when the champagne is gone, the party is over, and the glasses have been washed, it’s time to think about what happens next.

What sorts of things do I think you should look for? The TEI is the result of an astonishing, and admirable, committee process. Do you suppose anyone actually knows how many committees, committee meetings, committee members, and committee decisions have led to the invention, development, and refinement of the TEI? I certainly don’t. But I have worked in a committee environment, and I know what sorts of scars committee work leaves behind.

Every time a group finds that there are two ways to do something and each of them is supported, vigorously, by a respected member of the committee, what do you do? Sometimes you agree on one of the options. Sometimes you compromise and do something that sort of satisfies both sides. Sometimes you do the worst possible thing and allow both options. And sometimes you do what one person wants, with the understanding that the loser gets to win the next argument (which may be “fair”, but it makes for crummy design). Committee processes leave scratches in the furniture.

Was I serious when I said that the worst possible thing to decide when you have two candidate ways to do something is to allow both? You bet! Why? Because no matter how carefully you document that they mean the same thing, users will imagine meaningful differences; they will see meaning where there is none.

“Growing” the TEI Community

One of my least favorite business buzz-words is “grow”, as in “We need to grow the user base” or “What is your plan to grow your business?” And I have never heard anyone in the TEI community actually say that they wanted to “Grow the TEI User Base” or “Grow the TEI Community”. But I have heard several of you talk about wanting more people to use the TEI and sulk about projects that chose not to go TEI.

So, let’s talk about what it would take to “Grow the TEI User Base”. The most important thing is to get initial adoption. Once a project has made the initial investment, they are less likely to decide to take another path. But there are a lot of reasons that they may not get that far.

The New User

Let’s look at that new user I mentioned a minute ago. Let’s imagine, say, an historian with a fondness for micro-economics. And our historian has just found, and been given access to, the log books, correspondence, tax records, and labels from a dairy on the side of a mountain that has been in continuous operation for the last 250 years.

This is the perfect situation for a new TEI application: lots of important documents, a lot of excitement, the expectation that these documents will be analyzed by many people for a long time. Obviously, her project team will be joining the TEI Consortium tomorrow; sending people to XML, XSLT, and TEI classes next year; and joining committees to influence P6 in two or three years. Right?

Well, perhaps. But perhaps not.

First, they need to have heard of the TEI. As important as you are, there are many, many people who have interesting document collections and yet have not heard of the TEI.

What if our historian goes to a popular search engine (say, Google) to find advice on how to handle these documents? If the search is for “manual documents web” or even “multiuse document creation” or “scholarly document creation” or “scholarly document management”, she won’t find anything TEI-related. “Scholarly document encoding” will get you to the TEI very quickly, but will someone who doesn’t know think to use the word “encoding”? “Coding” won’t get her there, nor will “xml historical documents”, although why our historian would know enough to be looking for XML I don’t know.

So, the first place you lose potential users is that they have never heard of you and can’t find you.

But let’s imagine that our project leader has a colleague who is a happy TEI user. Or she has talked to someone in her library or media-center or funding agency who suggested that she look into the TEI. So, she’s interested, or at least looking.

What’s next? She goes to the TEI web site. Put on your fresh eyes and go look at the TEI web site. Where do you find a description of what the TEI is? And who might use it, and what it is useful for? Scanning the TEI home page, I see:

Okay, that’s not the page. Suppose I am really lucky and click on “Guidelines” next. (If I were really unlucky, I might click on “About”, which would take me to more administrative information.) So, I click on “Guidelines” and am offered several choices: P5 Guidelines, P4 guidelines, Customization, and Licensing and Citation. All of these are important, and all of them are meaningful and useful to people who already know something about the TEI, what it is, and what it is useful for.

I know, I know. I am quickly earning the enmity of a lot of people who have worked very, very hard on the redesign of the web site. I am attacking your new baby. Please, settle down for a moment. I LIKE your new baby. It is a clean, clear, well-organized reference for users of the TEI.

What it isn’t, however, is a sales tool. It doesn’t address the questions of a total novice or of the person who needs to decide, quickly, if this TEI THING is related to his current needs.

This is a very vulnerable moment. Our project leader is going to be very intolerant of any situation that makes her feel stupid or, worse, foolish. Will she have those feelings in trying to dive into the TEI? Perhaps. But if she does, she’ll drop the inquiry and find another way to handle those precious documents.

I suggest that if you look at the TEI web site with fresh eyes you will see that you designed it for yourselves. Or, to put it another way, it was designed to support current users and people familiar with TEI jargon and concepts. (What is a newcomer to make of P4 and P5, for example?)

Features for Beginners and Evaluators

Have you ever stopped and looked at a piece of software you were using and wondered why all those silly features were there? Not only does it do all, or at least many, of the things users need it to do, it also has a whole lot of junk, mostly menu-driven and colorful, often slow, and simply useless to the real users of the application.

And that stuff never gets removed from the software, no matter how many users despise it. You know why? It serves several purposes:

  1. Total beginners want to do these things or do them in these ways. For example, there are applications/tools where experienced users ALWAYS use the keyboard shortcuts, but there are menu options for everything. And beginners use them until they learn the tool. These easy on-ramps or crutches are disdained by the experts, but tools that have them have far more users than tools that insist that beginners bite the bullet and learn the “right way to use them”.
  2. They give something to look at. We have all seen software that has multi-color moving or complex displays that are practically, or completely, useless. They may be graphs of things nobody cares about, often with silly 3-D graphics. Or they have information boxes that pop up and cover the text you want to read. What is this for? So there is something a salesperson can demonstrate and something colorful to show on the screen in a trade-show booth.
  3. There are also a lot of “check-list” features built into successful commercial software packages. What’s a check-list feature? Things that may or may not be relevant to the real uses of the package, but that are needed for comparison check-lists. If you are considering three competing packages and one of them has checkmarks next to 35 features, another has 30 checkmarks and a much lower price, and the third has only 18 of the features you have heard that people might want in such software, what are you likely to decide? Does it matter that you got that check-list initially from a user of the high-end tool, and the list came from the vendor of that tool? Possibly not.

So, do I recommend that you trash up the TEI with a lot of color charts and useless features? Well, no. But I do recommend that you look at it with the point of view of a new user and a technology chooser/manager. What is the TEI equivalent of those check-lists? What do potential new users want? Need? I suspect that it is the ability to encode types of content. I KNOW that if you want to grow you need to think about this.

Let’s Get Back to Our Historian

She knows about the TEI. She didn’t run screaming (alright, she didn’t slink away embarrassed by her own ignorance) when faced with the web site. What now? How does she get started? She needs to learn about: XML, and probably XSLT; document tagging; and the details of the TEI. She needs to know enough about her documents and what users will want to do with those documents to be able to customize the TEI enough so that the information key to her project can be encoded. And before she can even get started, she probably needs to make some pretty good guesses about how much this will cost and how long it will take so she can write some grant proposals — because money is hard to come by and XML projects are not cheap!

Along the way, before she is committed, she will face several additional barriers to entry. She may read the Conformance chapter in the Guidelines. I recently read the Conformance chapter in the Guidelines. That’s what got me thinking about the difference between a specification written for existing members of the community and one written for possible future members. I am certain that there is a reason for every clause in the Conformance chapter.

Let me tell you a story. Some friends of mine and I found ourselves, a few years ago, eating in a little diner in a very small town in the mountains of West Virginia. And there, in a little wire stand, in the middle of each table, was a laminated card. It was not a list of the desserts offered or of the beers on tap. It was an excerpt from the West Virginia hunting laws. That card ended up being our entertainment for the evening. Why? Because we were of the opinion that it wouldn’t be illegal if someone hadn’t done it. So the West Virginia hunting laws were documentation of some of the foolish, and evil, things hunters had done. You may be interested in hearing that in West Virginia it is illegal to:

It seems to me that the TEI’s Conformance chapter is, like the West Virginia hunting laws, a record of the foolish, and in some cases, evil, things TEI users may have done in the past. But it is also a substantial barrier to entry. Not because it says sensible users can’t do what they want to do, but because it is … well … complex. Confusing. Overwhelming to newcomers who are simply looking for information on how to do TEI. But it is a place new might-be users are likely to end up because “conformance” sounds like something they need to know, and because they are unlikely to have stumbled into a better place to start.

Barriers to Entry

The easiest decision for the manager of any project to make when presented with a potential distraction (and, for many projects, the TEI will seem like a potential distraction) is to push it away. “We need to concentrate on our core”. Possibly the best option, too. After all, they DO need to concentrate on their core. So, if you, the TEI, don’t want to be pushed away as a distraction, you need not only to reduce barriers to entry, but also to actively invite use. (Whining that people or projects or groups SHOULD have used the TEI but didn’t is not actively inviting use. It is unattractive and unhelpful. Learning why people you think could, and should, have used the TEI didn’t, and working on ameliorating that barrier to entry, may be more effective.)

You need to suck them in. Get them started. Once a project has funding for encoding their documents, has a repository for their TEI documents, has a collection of important documents that are encoded with the TEI tag set, and has at least one way to look at those documents, they are hooked! Now they are part of your community and likely to be here for life. And possibly for the life of their next project and the project after that. But until then, every time it would be easier, or less confusing, or less risky, or less anxious-making to back out and send the documents to some third-world conversion house to be tagged in HTML or typed into a proprietary word processor, you can lose them. And odds are, they won’t be back.

What Am I Telling You? Market!

Did I come here to rain on your parade? Well, actually, no. I am trying to tell you that you have something fantastic, and now it is time to start marketing it. And to build marketing it into the heart of your product. Oooops. Some of you didn’t like my calling it a product.

Okay, let’s take a step backward. The TEI Guidelines ARE a product. Don’t sneer. Stop shuddering. You have something that you like, respect, think is useful, and want other people to know about, like, respect, and find useful. That sounds like a product to me. (Actually, the Text Encoding Initiative consortium is a product, too. And you do a pretty good job of explaining what that product is and why people should be interested. It’s what I think of as the “main product”, the Guidelines, that needs a marketing focus.)

Don’t wrinkle up your noses and make jokes about used car salesmen. I didn’t say selling; I said marketing. Do you know the difference? Marketing is what you do to reach and persuade prospects. Sales is what you do to get a signed agreement or contract.

Marketing is making your product appealing and approachable, and explaining what it is and who would, and who would not, be interested in using it. Sales is getting people to buy it. I distrust sales people and the sales process. (That may be why I run a SMALL business.) But I highly value marketing and the marketing perspective, because it is based on figuring out whose needs you want to address and then figuring out how to address those needs.

So, think about who you want to be the future users of the Guidelines. Think about how they will learn about the Guidelines and how they will learn the skills they need to use them. Try to look at the existing resources with fresh eyes — ideally the eyes of one of the users you would like to attract. Remember that the people you most want to join your community are likely to be busy, impatient, and intolerant of anything that confuses them or makes them feel foolish. Now, think about what can be changed to make these people comfortable, to inform and educate them, and to bring them into the fold. Every time you see your would-be user hit a barrier to entry think about how to remove that barrier. Don’t assume that you can change the future user; change yourselves, your information products, your presentation.

Now, get started on designing the next 20 years of the TEI!