Born digital
數碼人生
National libraries start to preserve the web, but cannot save everything
各國的圖書館開始保存網站,但不能面面俱到
Oct 21st 2010
The library of the future
IN THE digital realm, things seem always to happen the wrong way round. Whereas Google has hurried to scan books into its digital catalogue, a group of national libraries has begun saving what the online giant leaves behind. For although search engines such as Google index the web, they do not archive it. Many websites just disappear when their owner runs out of money or interest. Adam Farquhar, in charge of digital projects for the British Library, points out that the world has in some ways a better record of the beginning of the 20th century than of the beginning of the 21st.
在數碼界,出乎意料的事情屢見不鮮。當谷歌忙于將書籍掃描成數字文檔時,各國圖書館已經開始保存這個在線巨人漏掉的資料。盡管谷歌這類搜索引擎提供網站的索引服務,但并沒有把網站保存起來。很多網站因其所有人缺乏資金或失去興趣而曇花一現。Adam Farquhar負責大英圖書館的數字項目,他指出在某種程度上全世界對20世紀初期的記錄比21世紀初期的記錄要好。
In 1996 Brewster Kahle, a computer scientist and internet entrepreneur, founded the Internet Archive, a non-profit organisation dedicated to preserving websites. He also began gently harassing national libraries to worry about preserving the web. They started to pay attention when several elections produced interesting material that never touched paper.
在1996年,一個名為Brewster Kahle的計算機科學家和因特網企業家,成立了因特網檔案室,這是一個非營利性的組織,致力于網站的保存。他也開始委婉地敦促各國的圖書館關注保存網站的問題。幾輪選舉中出現了很多有意義的材料,卻從未以書面形式保留下來,此時各國圖書館才開始關注這一問題。
In 2003 eleven national libraries and the Internet Archive launched a project to preserve “born-digital” information: the kind that has never existed as anything but digitally. Called the International Internet Preservation Consortium (IIPC), it now includes 39 large institutional libraries. But the task is impossible. One reason is the sheer amount of data on the web. The groups have already collected several petabytes of data (a petabyte can hold roughly 10 trillion copies of this article).
2003年,11家各國圖書館和因特網檔案室啟動一個保護數碼信息的項目:此類信息沒有以數碼之外的任何其他形式存在過。這個稱為“國際因特網保護聯合體”的項目,現有39家大型機構圖書館。但是這一任務幾乎無法完成。理由之一是網絡上的數據量極其龐大。這些團體已經收集了幾拍(petabytes)的數據(一拍大約能裝下10萬億篇本文)。
Another issue is ensuring that the data is stored in a format that makes it available in centuries to come. Ancient manuscripts are still readable. But much digital media from the past is readable only on a handful of fragile and antique machines, if at all. The IIPC has set a single format, making it more likely that future historians will be able to find a machine to read the data. But a single solution cannot capture all content. Web publishers increasingly serve up content-rich pages based on complex data sets. Audio and video programmes based on proprietary formats such as Windows Media Player are another challenge. What happens if Microsoft is bankrupt and forgotten in 2210?
另一個問題是如何確保現在儲存數據的格式,在幾個世紀之后依然存在。古代的一些書稿人們到今天還能讀。但是很多過去的數字媒體,即使勉強能讀,也僅限于為數不多的幾臺老掉牙的機器。國際因特網保護聯合體已經單獨創立一種格式,讓未來的歷史學家更有可能找到讀取這些數據的機器。但一種解決方案不能抓取所有內容。網站發布工具越來越多地按照數據的復雜程度提供內容豐富的網頁。以各類專有格式(如Windows Media Player)儲存的音頻和視頻內容也是個大問題。萬一2210年微軟破產或無人知曉怎么辦?
The biggest problem, for now, is money. The British Library estimates that it costs half as much to store a digital document as it does a physical one. But there are a lot more digital ones. America’s Library of Congress enjoys a specific mandate, and budget, to save the web. The British Library is still seeking one.
現在面臨的最大問題是錢。大英圖書館估計儲存數字文件的花費是儲存物理文件的一半。但是數字內容要多很多。美國國會圖書館很幸運,因為國家有具體的保護網站的命令和預算。大英圖書館還在爭取這種命令。
So national libraries have decided to split the task. Each has taken responsibility for the digital works in its national top-level domain (web-address suffixes such as “.uk” or “.fr”). In countries with larger domains, such as Britain and America, curators cannot hope to save everything. They are concentrating on material of national interest, such as elections, news sites and citizen journalism or innovative uses of the web.
因此,各國的圖書館決定共同完成這個任務。它們分別承擔其頂級域名內的數字作品(如后綴為“.uk”或“.fr”的網址)。如果某些國家域名龐大,如英國和美國,館長們不要指望把什么都保存下來。他們需要重點處理關系國計民生的材料,如選舉、新聞網站和公民的報章雜志或創意使用網站的方法。
The daily death of countless websites has brought a new sense of urgency, and forced libraries to adapt culturally as well. Past practice was to tag every new document as it arrived. Now precision must be sacrificed to scale and speed. The task has started before standards, goals or budgets are set. And they may yet change. Just like many websites, libraries will be stuck in what is known as “permanent beta”.
每天,都有無數網站消失,更讓人感到時間緊迫,圖書館也不得不根據社會文化需要,靈活處理。過去習慣做法是凡是新文件,一一標注保存。現在則必須犧牲準確性換取規模和速度。標準、目標或預算都沒確定,就開始工作了。而標準、任務、預算還可能變更。同很多網站一樣,圖書館也將陷入所謂“永久試用”的境地。
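Translator's note:
This article is short and its language is relatively simple, but conveying its flavour in translation is still not easy. I think a few points deserve particular attention:
1. Grasping the overall meaning. There are two questions. What is being preserved? The content of websites, not some individual web page. Who does the preserving? The national libraries of many countries, since this is not a job that one country's library could manage alone. If these two points are misunderstood, “web” and “national libraries” in the text become hard to translate.
2. Grasping particular sentences. Besides conveying the meaning, the translation must also convey the emotional tone. Take the opening sentence, “things seem always to happen the wrong way round”: some have rendered it as “倒行逆施”, which shows that the translator did not realise that this Chinese idiom is pejorative. It describes unpopular actions that go against the tide of history, and it carries a strong emotional charge. The article has no such strong feeling, so I think renderings such as “出乎意料的事情屢見不鮮” (roughly, “unexpected things are commonplace”) or “事情的發展總是讓人摸不透” (roughly, “how things develop is always hard to fathom”) are more fitting. They may look “inaccurate”, but they get the meaning across.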