Alex Cabal and the Standard Ebooks "saga"
Alex Cabal is the founder of Scribophile, one of the web's largest online writing communities, and Standard Ebooks, an open source project that creates commercial-quality ebooks for free distribution.
julia ferraioli: Hi everyone, my name is julia ferraioli. It is a lovely gray day in Seattle today and I am here with Alex Cabal, who is going to be sharing the details, the background, the story behind Standard Ebooks. Alex, would you like to introduce yourself?
Alex Cabal: Yes. Hello. Good to be here, julia, thanks for inviting me. My name is Alex, and as you mentioned, I run Standard Ebooks, which is a volunteer-oriented open source project that takes public domain books (that usually means books published before 1927, with some exceptions), takes those transcriptions, and turns them into really high-quality ebooks that are commercial quality or better. And then… That’s actually my hobby. My actual real-life job is running a writing community called Scribophile. I’ve been doing that for about 15 years now, and we’re one of the largest writing communities online. That doesn’t have much to do with open source, but that’s what I actually do in my real-life job.
[chuckle]
julia ferraioli: Very cool. So are you a writer yourself?
Alex Cabal: No, not especially. I always tell people I’m part of the life support industry for writers. There’s a gigantic industry of people helping writers succeed and that’s kind of what I’m part of. But I don’t really write myself, no.
julia ferraioli: Got you, got you. So we’d like to start off with a bit of a fun question. Do you have favorite background music to listen to while you work?
Alex Cabal: I listen to a lot of salsa music; I enjoy that a lot. I have a big collection of ’60s- and ’70s-era salsa that is always kind of a go-to for me. I can put it on in the background and kind of tune out. I listen to a lot of jazz. Yeah, those sorts of things. I have a hard time listening to music that has lyrics in it, especially when I’m working. Jazz obviously doesn’t have lyrics. Salsa has lyrics, but they’re often repetitive, so you can kind of tune them out, and I think that’s why I like it.
julia ferraioli: I hear you, I have the same kind of considerations. There has to be lyrics that I know really, really well or none at all, or in a different language that I can’t understand.
Alex Cabal: Right, right. Well, I speak Spanish, so that doesn’t help me there, but the repetitiveness is useful, and there’s only so much salsa from that era of the ’60s and ’70s. That era is over. So after a while, you recognize all the songs and get used to them, and it’s not a brand new thing anymore.
julia ferraioli: Right, exactly. I think a lot of people have that criterion, where it needs to occupy the back of your head, but not the front of your head…
Alex Cabal: Exactly right.
julia ferraioli: …so we can still think.
Filling a personal need in the world of ebooks
julia ferraioli: Okay. I’d love to understand a little bit about how Standard Ebooks came to be.
Alex Cabal: Sure. So I started working on ebooks probably about 10-ish years ago, maybe a little longer. The reason I did was that at the time I was living in Germany, in a small town. It was a lovely town, but there was not really a way to get English-language books there. I’d have to go to the nearest big town, and even then it was kind of a crapshoot. They didn’t have a great selection; you’d find a lot of copies of Harry Potter and The Girl with the Dragon Tattoo, which were the popular books in those days, and I was like, “I’m done reading those.”
So I decided to work on ebooks. I would go on Project Gutenberg, which I’m sure you and your listeners are familiar with, download these books, and often be disappointed in the quality of the… not the transcription, the transcription itself is usually pretty good, but the presentation of it. They were not great to read on E Ink devices, which is what I had.
A lot of those books are very, very old. Project Gutenberg is 50 years old, so a lot of its books were produced in an era when HTML wasn’t quite settled yet, before Unicode even existed. People forget that 50 years ago there was no Unicode, and so the ebooks often weren’t of great quality. I thought to myself, “Why don’t I try to improve the situation just for myself?” So I worked on ebooks privately, to bring them up to a standard I thought was good.
After a while, I realized I was developing this kind of gigantic tool chain to make everything work, and I could do a whole lot of interesting things like adding some nice typography and automating checks for typos, taking advantage of what the ebook format allows. It’s a pretty powerful format.
Bringing in contributors and building a community
Alex Cabal: So I kind of formalized it for myself in that way. Initially, I collaborated with a friend of mine, an artist, on an edition of Alice in Wonderland that he illustrated. Again, this was about 10 years ago, and the hot new thing was “Pay What You Want” models. People were experimenting with that; it was the thing Radiohead did, and it kick-started a lot of stuff. So we decided to do a Pay What You Want ebook, and we called the publisher Standard Ebooks. We thought that was kind of a cool name, and he designed the logo.
That book didn’t really go anywhere; we made about 100 bucks on it. It was more of an experiment than anything, but I liked the name and I liked the logo. Then a couple of years later, I thought to myself, “I’m doing all these ebooks for myself. Why don’t I put them online for people to enjoy? I can put the tool chain online, and maybe someone will end up using it.”
So I put together the website, reused the name and the logo, and it kind of took off. Everyone really liked those books, and people were really interested in contributing. That was probably five to seven years ago; I don’t really remember now. I’ve been doing it ever since, and now it’s gone from being a hobby in my spare time to almost being a full-time job in itself, because there are so many people involved now.
We have a big community of volunteers from all over the world, and I spend a ton of time just managing people: assigning tasks, reviewing people’s work, answering questions. People will have what seems like a really simple question about something and it takes an hour to research; my answer is maybe a yes or a no, but it took an hour to arrive at that yes or no. So it takes up a lot of time, but I enjoy doing it.
julia ferraioli: And this is on top of your full-time job?
Alex Cabal: Yeah.
julia ferraioli: So nights, weekends, I’m assuming?
Alex Cabal: Yeah. Right. Nights, weekends, I check in during the day fairly often. Nowadays, I have a team that I call our editors and they help a lot; they do a lot of the day-to-day work nowadays. I assign them books to manage and books to review and they kind of do it on their own, because they know how the system works. They have a good sense of typography and the flavor of the kinds of books we’re working on and so now I can trust them with a lot that I used to have to do myself, so they’re a huge, huge help.
Scaling Standard Ebooks and its contributors
julia ferraioli: So let’s talk a little bit about that. You said that there was a lot of interest from the community to both consume and help out. What was your first reaction to this influx of interest?
Alex Cabal: I guess I was pleasantly surprised. I didn’t expect it to go anywhere. Again, I was doing it for myself. But to see that people were really interested, it feels good, and it was a good validation of the idea. It seemed like there was a huge demand for high quality ebooks and people understood that Project Gutenberg could be improved upon. And so I was happy to do that.
julia ferraioli: Were there any challenges that you faced in scaling the volunteer base that you had?
Alex Cabal: I think the challenges I had were mostly challenges of my own time management. Once word of the project got around, it seemed to grow on its own. It’s not like I’m out there advertising it or making a huge effort in that sense, so attracting people didn’t seem to be the problem. The problem was that I attracted so many people that I was running out of time to help them all, because people have questions throughout the process: questions related to the technology and questions related to the book itself.
You have to keep on top of people to complete what they’re doing, because a lot of people start a project and abandon it, so they need to be prodded along from time to time.
All that took a huge amount of time, and it took a while for me to find the people who were helping consistently, whose work I could trust, and who were doing high-quality work, and then finally to offload some of that work to them. That was probably the biggest challenge. It was challenging in another way too, because I enjoyed doing it, and you can get sucked into something like that: you spend hours and hours on it and forget that you had an appointment somewhere else. So yeah, I would say that was the big challenge in this project.
julia ferraioli: For anybody listening or reading, I’m grinning because a lot of these comments hit home for me, especially the missing appointments one. Hyper-focus is a real thing; it’s a challenge. In your volunteer base, I imagine you have a bunch of people with very differing skill sets. You’ve got the folks working on the tool chain and then also the editorial side?
Alex Cabal: Yeah, you’d be surprised. The people who tend to stick around are talented in all those areas. The tool chain probably has the fewest contributors; it’s mostly me and a handful of other people. But as far as the ebooks themselves go, just to get started you need a pretty solid understanding of the command line and of basic HTML. It isn’t too complicated, but you have to have some knowledge of HTML and so on.
And then on top of that, you have to have a good eye for proofreading. You have to be able to read the book and catch typos, and catch things like, “Oh, maybe there should have been a comma here, but I don’t see one,” and then go check the page scans to see if there is one. A lot of people will contribute one ebook and that’s it, and that’s fine, but the people who contribute more than one often have all those skills lined up pretty well. So it’s interesting in that sense: a lot of skills are required, but the people who work with us have all of them. It’s not like we’re handing off parts to other people to finish. It’s not an assembly line.
Prioritizing satisfaction of contributors and historic integrity
julia ferraioli: So you’ve got one person responsible from start to finish?
Alex Cabal: Yeah, exactly. One person is in charge of their book from start to finish, and one of our editors manages them, so if this person has questions, the questions go to the editor and not to me. Then at the end we assign them a reviewer, a different person, who looks over the book once they’re finished.
julia ferraioli: That makes sense. And that’s also probably a really great way to give people the full experience of contributing to Standard Ebooks, ’cause they get to go from start to finish.
Alex Cabal: Yeah. They get to do it start to finish. I always say it’s like building your own lightsaber. It’s a satisfying thing to be able to take an entire thing from beginning to completion, and then you’re able to read and enjoy it as well, because part of the project is reading the book in order to catch any typos that might have slipped past previous transcribers.
julia ferraioli: So when people catch typos – and this is just me being super curious – how do you tell if it’s a typo in a previous transcription or if it was a typo in the original book?
Alex Cabal: Yeah. Nowadays you can find page scans of those books online: places like the Internet Archive, Google Books, and HathiTrust all have huge libraries of paper books that they’ve scanned, and you can almost always find the edition that was transcribed and compare the transcription to the page scans. That’s basically how we do it. What a lot of people don’t realize is that books, especially very popular books by authors like Jane Austen or Herman Melville, have gone through many, many editions over the past couple of centuries, and each one is different, because back in the day there were no computers: people set type by hand in lead blocks, and different editors would do their own thing to different editions.
They would change spelling, do their own punctuation, all kinds of things, and no one cared back then. It was not a concern or even thought about. So it’s very common to find editions that are totally different, and sometimes the transcription we receive is a blend of editions and there’s not really any way to tell. You just have to go in there and do your best. Part of the project involves a lot of critical thinking, in the sense that the history of these editions is sometimes very complicated, and you just have to do some research and do what you think is best. Sometimes there’s not necessarily a right answer, and the answer you pick is gonna make some people mad, but that’s just how it is; you gotta do it.
julia ferraioli: Do you ever get people trying to insert their own, not commentary, but stances maybe, on grammatical concepts… I’m totally not thinking about the Oxford comma here.
[chuckle]
Alex Cabal: No. I hope not, at least, because I don’t read all those books. [chuckle] People sometimes ask, “Oh, this sentence would sound better if the comma were here, but I don’t see one in the scans.” And I always tell them, “Look, that’s the author’s mistake, and it’s not really our business to be changing that.” We do change spelling, because I don’t think that’s a big deal, but with grammar, things like comma placement, or “there should have been a period,” or “there should have been a semicolon,” we should probably just leave it.
The other thing is that part of the project is to have a consistent style across all of our ebooks, so if we’re gonna do something to one ebook, we have to do it to all of them. For example, something that used to be common was using a colon where today you would expect a semicolon. A hundred years ago they would use a regular colon, and that’s unusual for us to read nowadays, so a common question we get is, “Well, should I change it to a semicolon?” And the answer is, “No.” Because then we would have to go back through all of our books and change them all, and who’s gonna sit and read all of them? Sometimes the colon is actually correct. It’s not within the scope of human endeavor to do that.
julia ferraioli: That makes perfect sense.
The surprise of success
julia ferraioli: Has there been anything super surprising in your journey with Standard Ebooks?
Alex Cabal: I guess the success of it surprises me in a way. I didn’t think so many people would be interested in these old books, but people really are. I don’t even think it’s just because they’re free. I think that’s part of it; the fact that they’re free is an attraction to some people, but I also think people are genuinely interested in the literature of the time, and that’s kind of surprising. I’ve always enjoyed it myself, but it’s nice to see that. In my real life, I’m not out there talking to my friends about A Voyage to Arcturus; that doesn’t happen. And when it doesn’t happen in your real life, you tend to think it doesn’t happen anywhere. But it turns out that people really do like this stuff, and that’s really cool to see.
julia ferraioli: Yeah, and if I may add some of my own opinion, Standard Ebooks also helps a lot with accessibility for some of these old books because in a lot of cases, what you do get without it is just images of pages, right?
Alex Cabal: You’re right, and we make a point of doing that, actually. Our style guide, which you can find online, defaults to that. Not explicitly; we don’t go out there and say “this has to be accessible,” but everything in that guide is designed around accessibility, because accessible books are not only accessible, which is a good thing in its own right, but they also turn out to be much simpler internally, much better structured, and much easier to machine-process, which is another important aspect of what we’re doing. We want all of our books to be internally consistent, because that makes it much easier to make changes across the entire corpus if we have to.
If you open a commercial ebook nowadays, say you download something from Amazon and open it up to see what’s inside the actual EPUB, more often than not it’s a complete disaster. It’s the kind of HTML you would see generated by JavaScript on the web, where you’re like, “It’s unreadable.” Those kinds of books are not possible to update, because every one is totally different. So, yeah, we’re trying to make accessible books that are simple on the inside, which makes them easier to produce, and there are a lot of benefits to that. So that’s what we do.
julia ferraioli: Very cool. In terms of open source, how do you feel open source has impacted the project in and of itself?
Alex Cabal: Well, the entire project is open source. We use Python for our toolset and we pull in a lot of libraries. Python is a “batteries included” language, so there are a lot of libraries ready for us to use. The books themselves are free of intellectual property restrictions, so that’s sort of a form of open source in the text itself. The EPUB format is an open standard and builds entirely on open technologies. It’s basically XHTML inside a ZIP file, that’s more or less all it is, and it includes various formats that are defined elsewhere, like a metadata format, that are all open. So yeah, the entire thing is open from top to bottom. Our toolset is released under the GPL-3, and the books themselves are already in the US public domain. However, our producers add things like metadata and descriptions, their own work, and all of that is released under the CC0 Public Domain Dedication. So yeah.
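To make “XHTML inside a ZIP file” concrete, here is a minimal, hypothetical sketch using only Python’s standard library; it is not Standard Ebooks’ toolset. It assumes the EPUB Open Container Format layout: an uncompressed `mimetype` entry stored first, and a `META-INF/container.xml` that points to the package document. The paths `OEBPS/content.opf` and `OEBPS/text/chapter-1.xhtml` are made-up placeholders, and a real EPUB would also need the package document itself and a navigation file, which are omitted here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in the EPUB Open Container Format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Hypothetical container.xml pointing at a placeholder package document path.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# Hypothetical chapter: ordinary XHTML, which is what most of an EPUB consists of.
CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Chapter 1</title></head>
<body><p>Alice was beginning to get very tired...</p></body></html>"""


def build_minimal_epub() -> bytes:
    """Build a toy EPUB-shaped ZIP in memory (package document omitted)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry must come first and be stored uncompressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/text/chapter-1.xhtml", CHAPTER_XHTML)
    return buf.getvalue()


def inspect_epub(data: bytes) -> tuple[str, list[str]]:
    """Return the package-document path and the XHTML entries in the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        first = zf.infolist()[0]
        if first.filename != "mimetype" or zf.read("mimetype") != b"application/epub+zip":
            raise ValueError("not an EPUB container")
        root = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        xhtml_files = [name for name in zf.namelist() if name.endswith(".xhtml")]
    return opf_path, xhtml_files


if __name__ == "__main__":
    opf, chapters = inspect_epub(build_minimal_epub())
    print(opf)       # OEBPS/content.opf
    print(chapters)  # ['OEBPS/text/chapter-1.xhtml']
```

A real tool would go on to parse the package document’s manifest and spine to find the reading order; the point here is only that the container is an ordinary ZIP archive holding plain XML and XHTML files.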
julia ferraioli: Very cool. Well, thank you. One last question.
Alex Cabal: Sure.
julia ferraioli: Do you have a favorite book?
Alex Cabal: Sure, the one I enjoyed the most was A Voyage to Arcturus. I think I mentioned it earlier. I liked it a lot because it was very surprising to me: you hear that title and you think of a ’50s rocket ship with ray guns, an Edgar Rice Burroughs kind of adventure, but it’s not that at all. It’s this really bizarre, philosophical book where this guy is transported to an alien land, and each part of the land he goes through has a different philosophy, a different way of life. He goes through the Nietzschean will-to-power land, then the communist utopia land, and all these different places, and it’s a super interesting book. It became a cult classic and is still republished today. I enjoyed that a lot. I also enjoyed Howards End a lot. That became a big movie a couple of years ago, but I read the book, and the book is very, very good. Very interesting.
I enjoyed The Magnificent Ambersons, which won a Pulitzer, I think in the ’20s. That’s a book about how modern life encroaches on people’s happiness, especially when it comes to cars and urban development, which is kind of a personal bugbear of mine in real life: how Americans are always driving everywhere. And I also enjoyed The Forsyte Saga, which is really big; it’s actually three books in one, and it’s similar in theme to The Magnificent Ambersons. It’s about how the middle class developed around the 1850s, and how their desire to acquire material wealth is something dangerous that can destroy a person’s soul. At the time it was written, the middle class was fairly new. Those of us born in this age tend to forget that the middle class has not been around forever in the Western world. It developed sometime around the 1850s and only really became bigger towards the end of that century.
People were very interested in writing about how that new social class was going to change things and whether it was good or bad, and that’s what the guy who wrote it saw. He saw that people were entering lives of working at banks, working as merchants, doing all these things where they felt unhappy and powerless, and the only reason they were doing it was to buy a bigger house and to own and control things. That concept of private ownership and acquisition of material wealth was new to the society at the time, and I think it was a really powerful book. He actually won the Nobel Prize for having written it, which is unusual, because most Nobel Prizes are awarded for a body of work. It’s very unusual for a prize, in literature at least, to be awarded for one book, but it’s understood that he won the prize for that book, so yeah.
julia ferraioli: That’s fascinating. Well, I just got a bunch of book recommendations and I think I have already downloaded at least one of them.
Alex Cabal: Okay, great. [chuckle]
julia ferraioli: Well, I think that’s probably all we have time for. These sessions always seem too short to really wrap up.
Alex Cabal: Yeah. It was very quick.
Go read more books
julia ferraioli: Any parting thoughts about ebooks about open source, about the project itself?
Alex Cabal: Well, in general, I would say go out and read more. Nowadays it’s very easy to get sucked into one’s phone. You’re scrolling Reddit, you’re scrolling Facebook, you’re scrolling Instagram, and you waste a lot of hours doing nothing, learning nothing, and probably feeling much worse. I recommend that people sit down and read more. And if you really wanna listen to my advice, please read more old books, because they have a lot of interesting insights that are very relevant today, just like the book I just talked about.
I think if people were to read The Forsyte Saga, they’d realize that much of what he’s talking about is the life they’re leading today, but we’re so far removed from that inflection point in history that we don’t think about it anymore. Being drawn back to say, “Hold on, people had concerns about this thing or that thing a long time ago, they expressed those concerns eloquently and thoughtfully, and what can I take from that in my own life?” is something that can really benefit people. So read more books.
julia ferraioli: I co-sign that.
[chuckle]
Read more books. Well, thank you so much, Alex, for joining us today, and I just… I really love the project, and I’m a typography nerd at this point, so…
Alex Cabal: Great, thank you. Well, thank you very much and thanks for inviting me.
The story was facilitated by julia ferraioli