The term “Majestic documents” generally refers to thousands of pages of purportedly classified government documents that attest to the existence of a top-secret group of scientists and military personnel, Majestic 12, formed in 1947 under President Harry Truman and charged with investigating crashed extraterrestrial spacecraft and their occupants. Majestic 12 personnel allegedly included a number of noteworthy political, scientific, and military figures, including: Rear Admiral Roscoe Hillenkoetter, the first CIA Director; Dr. Vannevar Bush, wartime chair of the Office of Scientific Research; James Forrestal, Secretary of the Navy and first Secretary of Defense; General Nathan Twining, head of Air Materiel Command at Wright-Patterson Air Force Base and later Chairman of the Joint Chiefs of Staff; and Dr. Donald Menzel, an astronomer at Harvard University. More specifically, the Majestic documents are a series of allegedly classified documents leaked from 1981 to the present day by unidentified sources concerning Majestic 12 and the United States government’s knowledge of intelligent extraterrestrials and their technology.1 The documents are dated from 1942 to 1999.

Due to the explosive nature of their content, the Majestic documents are considered by many to be the core evidence for a genuine extraterrestrial reality and alien visitation of planet Earth in the 20th century. United States government personnel have denied their authenticity, primarily on the basis of an opinion rendered by AFOSI, the U.S. Air Force counterintelligence office. The AFOSI report focused on certain features of the documents it considered anachronistic, along with other historical inconsistencies (see Section 1.2 below). The charges of the AFOSI have been coherently rebutted, and so validation and debunking efforts alike have resulted in a stalemate.

This impasse notwithstanding, other documents discovered before and after the alleged leaking of the Majestic documents appear to validate the existence of the group Majestic-12. In 1985, a document referring to a joint National Security Council (NSC) MJ-12 “Special Studies Project” was discovered in the National Archives by Jaime Shandera.2 This document, a 1954 memo from Robert Cutler to General Nathan Twining, is known to UFO researchers as the “Cutler-Twining” memo. The Cutler-Twining memo shares certain stylistic features with a 1953 memo between Cutler and Twining discovered in 1981 among General Twining’s papers at the Library of Congress. Canadian documents discovered in 1978, three years before the first alleged leak of the Majestic documents, note the existence of a highly classified UFO study group operating within the Pentagon’s U.S. Research and Development Board and headed by Dr. Vannevar Bush. Although the name of the group is not given, these Canadian documents appear to support the existence of Majestic 12. Even if this is the case, proof of the existence of Majestic 12 does not logically translate into authentication of the Majestic documents themselves or of their content on other points.


The Majestic documents have undergone thorough forensic authentication with respect to non-linguistic issues and methods.3 The primary researchers who have put considerable effort into authenticating the documents are Stanton Friedman4 and the father-and-son team of Dr. Robert Wood and Ryan Wood.5 These researchers have tested the documents by the following means:6










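The Wood team was able to call on expert assistance in its authentication work. For the comparison of typewriter impressions and watermarks, James Black was their principal expert. Mr. Black is a member of the Questioned Documents section of the American Academy of Forensic Sciences and a past chairman of the Questioned Documents subcommittee of the American Society for Testing and Materials.7 For the examination of paper, ink, and watermarks, the Wood team sought the services of Speckin Forensic Laboratories. The Speckin website states that the laboratory is:
. . . [A]n international forensic firm specializing in consultation with both plaintiff and defense attorneys on questions involving: forgeries, sequencing of entries, alterations, additions, rewritings, ink dating and paper, typewriting, facsimiles, photocopies, fingerprints, narcotics and street drug analysis, forensic medicine, DNA, firearm and toolmark examination, shoe and tire impressions, handwriting, crime scene reconstruction, criminalistics, and computer forensics.8
In the course of the forensic authentication process and the publication of these efforts, a number of concerns have been raised, such as apparently anachronistic statements, possible inconsistencies in typewriter impressions, grammatical errors, deviations from standard style, printing flaws, and nearly identical signatures on different documents. The Wood team has catalogued examples of all these issues and offered answers to them.9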

To date such criticisms of the Majestic documents have failed to deliver conclusive evidence of forgery. However, Stanton Friedman has successfully detected several fakes among the cache. The forgeries were photocopies of authentic documents with certain content and vocabulary changes designed to steer the content toward a discussion of Majestic 12. These forgeries are explained and illustrated on Friedman’s website.10 The presence of these forgeries does raise the specter that all the Majestic documents may be contrived, especially since an estimated seventy percent of the documents are photocopies. However, it is important to note that no other fakes have been conclusively detected.

Notwithstanding the examinations noted above, the Majestic documents have never been subjected to scientific linguistic analysis to determine the validity of their authorship. While the Wood team and Mr. Friedman mention in several of the cited publications and websites that the Majestic documents have also undergone “linguistic” testing, neither those publications nor the online sources provide any evidence of such testing. The Wood team and Mr. Friedman fail to define what they mean by the terms “linguistic testing” or “linguistic analysis,” and offer no proof that genuine forensic linguistic analysis of the type conducted for this paper ever took place as part of their authentication efforts. Additionally, while the Speckin Forensic Laboratories website mentions that the company does work in “computer forensics” (see above), the Woods offer no evidence in their writings or on their website that Speckin ever tested the Majestic documents in this way.

Only Stanton Friedman makes any attempt to describe an effort to have the Majestic documents tested linguistically and, as his description makes clear, no modern forensic computational linguistic work was actually done:
At the suggestion of attorney Bob Bletchman, I obtained 27 examples of various Hillenkoetter writings from the Truman Library. Dr. Wescott reviewed these and the EBD (Eisenhower Briefing Document) and stated in a letter to Bob dated April 7, 1988 . . . ‘In my opinion there is no compelling reason to regard any of these communications as fraudulent or to believe that any of them were written by anyone other than Hillenkoetter. This statement holds for the controversial presidential briefing memorandum of November 18, 1952, as well as for the official and private letters.’11
The above account contains no information on what Dr. Wescott (now deceased) did with the documents given to him. Several considerations suggest that Dr. Wescott likely did little more than look at the documents, rather than conduct actual tests. First, the development of the field of computational linguistics and the use of computers for natural language processing of necessity followed the development of computers and processing power. In 1988 these research methods were known, but not widely available. Second, Dr. Wescott’s areas of expertise included neither authorship attribution research nor computational forensic linguistics. Rather, the focus of Dr. Wescott’s work was anthropological linguistics.12 Despite his distinguished academic career, a search of linguistics databases produces no evidence that Dr. Wescott ever did any work in these areas. This is no doubt because his teaching career ended at roughly the time these fields were beginning to blossom.


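It should also be noted that Dr. Wescott’s assessment lacks conviction. At best, his amateur opinion in this sub-discipline of linguistics amounts to the conclusion that he has no basis on which to draw a conclusion. As UFO researcher Paul Kimball points out, Wescott himself made it clear that he had given no conclusive answer or endorsement as to authenticity. In a letter to the International UFO Reporter, Wescott wrote: “I have no strong conviction favoring either of the polarized positions on this issue . . . I wrote that I thought its [the EBD’s] fraudulence unproved . . . I could equally well have maintained that its authenticity is unproved . . . inconclusiveness seems to me to be of its essence.”13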

This is all that is offered in terms of linguistic testing and evidence for the Majestic documents. The thoroughness and care with which Friedman and the Woods have addressed other forensic issues is sorely lacking with respect to modern methods of linguistic analysis specifically designed to determine (or rule out) the authorship of documents. The absence of demonstrable testing data in any form of publication puts the burden of proof on these and other researchers to show that they have indeed subjected the Majestic documents to linguistic analysis.


This study fills the research void created by the absence of strictly linguistic approaches to the problem of authenticating the Majestic documents. The goal of the research presented in this study was to determine whether the Majestic documents that carry a signature were indeed written by the people to whom authorship is attributed. Toward achieving this goal, the study employed state-of-the-art computational linguistic methods of authorship attribution. In some cases, these techniques have been pioneered by Dr. Carol Chaski, a recognized leader in this type of linguistic research.14 These methods have been employed, validated, and approved numerous times in various courts of law. It is the opinion of the authors that these methods are the most reliable and testable means of authenticating or refuting the authorship attribution of those Majestic documents that bear the name of an author.


The remainder of this paper details the application of computational linguistic methods to determine the authenticity of the authorship attributions of the Majestic documents. The paper is divided into the following sections:
• Description of the Majestic documents included in and excluded from the study.

• Overview of the linguistic testing methods used in the study.



• Suggestions for future linguistic research on the Majestic documents.



The Majestic documents tested were obtained online via www.majesticdocuments.com, the website repository for the Majestic documents maintained by Dr. Robert Wood and his son Ryan Wood. The Woods have made the Majestic documents freely available to the public for several years as part of their efforts to expose the public to this material.


For authorship attribution testing to be carried out, the document in question must already be attributed to some author. As such, only those documents among the Majestic documents that specifically bear the name of a signatory author were considered for testing. Famous Majestic documents such as the Eisenhower Briefing, for example, were not tested because the briefing makes no claim as to its author. Researchers and amateurs refer to the Eisenhower Briefing as though its authorship by Dwight D. Eisenhower were self-evident. Yet the document itself makes clear that Eisenhower was not the author, as the very first page informs the reader that the briefing was “prepared for President-elect Eisenhower.” The SOM1-01 manual, “Extraterrestrial Entities and Technology, Recovery and Disposal,” is another famous Majestic document that carries no author’s name and was therefore excluded from testing. In addition, the Einstein-Oppenheimer document represents overlapping authorship and so could not be tested.


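The third criterion was pragmatic, driven in part by cost. Among those documents that bear a signature and are of sufficient length (more than a sentence or two), priority for testing was given to documents that specifically mention the existence of extraterrestrial biological entities (EBEs) or that claim an extraterrestrial origin for salvaged wreckage. Any document important to validating the extraterrestrial hypothesis (ETH) as an explanation for UFOs was included in the testing. For example, documents that mention the retrieval or transport of the Roswell wreckage, or other events related to the UFO question, may have been deferred for testing where nothing in the document specifically points to the ETH or to EBEs. Merely mentioning “Roswell” or “Wright Patterson” was not sufficient to compel testing. In short, a document had to have a certain appeal to warrant testing.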

Fourth, some of the Majestic documents could not be tested because they contained no prose text. An example is the document entitled “Majestic Twelve Project, Purpose and Table of Contents (Summer 1952?).” This document is simply a table of contents. Even if a document of this nature had an attributed author, it could not be tested by linguistic means.

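Finally, documents that are by nature secondhand were not selected for testing. An example is the lengthy Bowen manuscript. Although the Wood team labels this document “high interest,” it was not written by someone “in the know” with the high-level security clearance necessary to be a primary witness to the evidence for the ETH and EBEs or to the deliberations of Majestic 12. The Wood team notes that “Bowen was personally connected with many high-level people,”15 but this argues incoherently: on the one hand, that Majestic-12 and its activities were so secret that evidence of its existence became available only in the 1980s, and on the other, that members of Majestic-12 were sharing the nation’s most closely guarded secrets with outsiders such as Mr. Bowen. The Wood team concedes the secondary nature of the Bowen manuscript when it notes its status as “a well-written snapshot of the public history of flying saucers from 1947 to 1954.”16 The operative word in that comment is “public,” which reveals the manuscript’s peripheral importance with respect to content.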


The Majestic documents tested by Dr. Chaski were typed and proofed by Dr. Michael S. Heiser, Amy C. Ward, and Joe E. (“Free”) Ward, of Roswell, NM. Only the prose content of the documents was typed out for testing, along with salutations and benedictions. Date formulas, stamps, handwritten annotations, military file numbers, memoranda headings, etc. were not typed out, since authorship attribution testing concerns the testing of written prose content for author-particular stylistics. Misspellings and grammatical errors in usage were preserved in the prose content reproduced for testing. Documents were saved as text (.txt) files.


The following spreadsheet chart (Chart 1) contains the seventeen documents, allegedly written by nine authors, that were tested by Dr. Chaski. Unknown to Dr. Chaski, I included several documents previously demonstrated as fraudulent by Stanton Friedman (see Section 2.7). I did so to test the independence of Dr. Chaski’s analysis. The identity of these fraudulent documents is revealed in the test results below.



Thirty documents known to have been composed by the nine authors to whom the Majestic documents were attributed served as the data pool for computational stylistic comparison.17 The following chart (Chart 2) shows these “known author” documents. The documents were selected with sensitivity to sameness of word and character count, genre, chronological era, and recipient. While the enterprise of authorship attribution by computational linguistic methods does not require sameness of subject matter for document comparison, several of the “known author” documents contain similar subject matter (e.g., space technology). In some cases, a “known author” document refers to an event mentioned in one of the unverified documents (e.g., the 1942 Los Angeles sighting).



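Dr. Chaski explains that, when it comes to document attribution in the legal arena, methods of determining authorship “must work in conjunction with the standard investigative and forensic techniques which are currently available.”19 The authorship of a typed document, whether originally or subsequently in electronic form, can be determined by three methods: “. . . biometric analysis of the computer user; qualitative analysis of ‘idiosyncrasies’ in the language in questioned and known documents; and quantitative, computational stylometric analysis of the language in questioned and known documents.”20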

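With respect to the Majestic documents, the first method is not possible: actual keystroke pattern dynamics cannot be analyzed. This method is, technically, non-linguistic. The second method “assesses errors and ‘idiosyncrasies’ based on the examiner’s experience.”21 This method also has the disadvantage of requiring the pre-existence of a stylistic database against which to measure presumed idiosyncrasies. Chaski elaborates:
As suggested by McMenamin (2001), this approach, known as forensic stylistics, could be quantified through databasing, but at present the databases that would be required have not been fully developed. Without databases to establish the significance of stylistic features, the examiner’s intuition about the significance of a stylistic feature can lead to methodological subjectivity and bias. Another approach to quantification is to count particular errors or idiosyncrasies and input these into a statistical classification procedure. When Koppel and Schler (2000) quantified the forensic stylistics approach in this way, using 100 “stylemarkers” in analyses with support vector machines (Vapnik 1995) and C4.5 (Quinlan 1993), the highest accuracy for author attribution was 72%.22
The third method, stylometry, “is quantitative and computational, focusing on readily computable and countable language features, e.g. word length, phrase length, sentence length, vocabulary frequency, distribution of words of different lengths.”23 Stylometric analysis can also include analysis of function word frequencies and punctuation.24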
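To make the kinds of “countable” features named above more concrete, the following is a minimal sketch in Python of how a few of them (word length, sentence length, the distribution of word lengths, function word frequency, and punctuation rate) might be computed for a single document. It is an illustration only and is not Dr. Chaski’s ALIAS software. The function name, the short function-word list, and the naive sentence-splitting rule are all assumptions made for the example.

```python
import re
from collections import Counter

# Small illustrative function-word list (an assumption for this sketch;
# published studies use much longer lists).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "for")


def stylometric_features(text):
    """Return a few simple, countable style features for one document."""
    # Naive sentence split on terminal punctuation (hypothetical rule).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        raise ValueError("text must contain at least one word")
    counts = Counter(words)
    length_dist = Counter(len(w) for w in words)
    return {
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "mean_sentence_length": len(words) / len(sentences),
        "word_length_distribution": dict(sorted(length_dist.items())),
        "function_word_rates": {fw: counts[fw] / len(words) for fw in FUNCTION_WORDS},
        "punctuation_per_word": len(re.findall(r"[,;:]", text)) / len(words),
    }
```

Feature sets of this kind are computed for both questioned and known documents and then compared statistically. The comparison step is not shown here.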

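As one of the leaders in the development of authorship attribution techniques that meet legal standards of evidence, Dr. Chaski has developed “a computational, stylometric method which has obtained 95% accuracy and has been successfully used in investigating and adjudicating several crimes involving digital evidence.”25 Chaski details her method (ALIAS26):
[My] syntactic analysis method (Chaski 1997, 2001, 2004) has obtained an accuracy rate of 95%. The primary difference between the syntactic analysis method and other computational stylometric methods is the syntactic method’s linguistic sophistication and foundation in linguistic theory. Typical stylometric features such as word length and sentence length are easy to compute even if not very interesting in terms of linguistic theory, but the more difficult to compute features such as phrasal type are also more theoretically grounded in linguistic science and experimental psycholinguistics.27
As noted above (Sec. 1.3), Dr. Chaski’s testing of the Majestic documents was, owing to expense, not as thorough as it could have been. Variations on the capabilities of ALIAS were employed to test the Majestic documents. The testing is therefore referred to as preliminary in this paper. Future testing will allow a full exploitation of the capabilities of ALIAS.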

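Specifically, the method Dr. Chaski employed in the first round of testing was the “n-gram” method. The n-gram method involves detecting patterns of a specified number (n) of part-of-speech tags or words in sequence. Once these sequences are found, they can be ranked according to similarity.28 Regarding her pioneering techniques in this area, which were used for testing the Majestic documents, Dr. Chaski noted: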
One final word on the testing enterprise is necessary. It is acknowledged that many of the Majestic documents were not handwritten or even typed by the author to whom they are attributed. The typical practice, especially for presidents, would be to dictate the content of correspondence to a secretary, who would type and reproduce the content. This reality is not at odds with Dr. Chaski’s testing methods, since dictated and self-typed memoranda and correspondence are not produced by distinct psycholinguistic processes. In other words, there is no significant linguistic difference between dictating a letter as one would desire it to be written and typing those thoughts oneself.
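The n-gram approach described earlier in this section can be illustrated with a short Python sketch. It counts sequences of n words in a questioned document and in each candidate author’s known writing, then ranks the candidates by how similar their n-gram profiles are to the questioned text. This is a simplified illustration under stated assumptions, not the ALIAS procedure: it uses word n-grams only (part-of-speech n-grams would require a tagger), cosine similarity is an assumed choice of similarity measure, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def word_ngrams(text, n=2):
    """Count overlapping word n-grams in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity of two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(questioned, known_by_author, n=2):
    """Rank candidate authors by n-gram similarity to the questioned text."""
    q = word_ngrams(questioned, n)
    scores = {
        author: cosine_similarity(q, word_ngrams(text, n))
        for author, text in known_by_author.items()
    }
    # Highest similarity first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In practice each candidate author would be represented by several known documents of comparable length, genre, and era, and a similarity ranking would be only one input to a statistical classification, not a verdict in itself.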



*Special thanks to Dr. Michael S. Heiser

1). See the chronological listing of the reception of the Majestic documents reconstructed by Dr. Robert Wood and Ryan Wood, http://www.majesticdocuments.com/sources.php, accessed June 5, 2007. Dr. Robert Wood, “Mounting Evidence for the Authenticity of MJ-12 Documents,” paper presented at the International MUFON Symposium, Irvine, California; July 21, 2001, 5. Accessed June 5, 2007.

2). Shandera was one of the early recipients of the Majestic documents.

3). See Wood, “Mounting Evidence,” 6-10.

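4). The biography section of Stanton Friedman’s website reads in part: “Stanton Friedman received BSc and MSc degrees in physics from the University of Chicago in 1955 and 1956. He was employed for 14 years as a nuclear physicist for such companies as GE, GM, Westinghouse, TRW Systems, Aerojet General Nucleonics, and McDonnell Douglas on such advanced, classified, eventually cancelled projects as nuclear aircraft, fission and fusion rockets, and nuclear power plants for space.” Accessed at www.v-j-enterprises.com/sfbio.html on June 5, 2007.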

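5). Dr. Robert Wood holds a B.S. in aeronautical engineering from the University of Colorado and a Ph.D. in physics from Cornell University. Before retiring in 1993, he spent 43 years in research and development at Douglas Aircraft and McDonnell Douglas. Ryan Wood holds a B.S. in mathematics and computer science from California Polytechnic State University, San Luis Obispo. He has held various positions in marketing, consulting, and sales at Intel Corporation, Digital Equipment, and Toshiba.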

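6). See Stanton T. Friedman, Top Secret/Majic (New York: Marlowe and Company, 1996); idem, “Final Report on Operation Majestic Twelve,” unpublished paper, 1990; Wood, “Mounting Evidence”; idem, “Validating the New Majestic Documents,” paper presented at the International MUFON Symposium, St. Louis, Missouri; July 15, 2000; Robert M. and Ryan Wood, “Another Look at Majestic,” MUFON UFO Journal, no. 371, March 1999.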

7). Wood, “Mounting Evidence,” 6-7.

8). Accessed at www.4n6.com on June 5, 2007.

9). Wood, “Mounting Evidence,” 9-10.

10). See www.v-j-enterprises.com/mj12_update3.html#bottom; accessed June 6, 2007.

11). See www.v-j-enterprises.com/mj12_update2.html#bottom; accessed June 6, 2007.

12). See www.utc.edu/Research/SunTrustChair/chair_previous_wescott_index.html; accessed June 6, 2007.

13). International UFO Reporter, vol. 13, no. 4 (July/August 1988), 19. Cited by Paul Kimball, “MJ-12 – The Wescott ‘Analysis’ Red Herring,” The Other Side of Truth, July 14, 2005; accessed at redstarfilms.blogspot.com/2005_07_01_archive.html on June 6, 2007.

14). Dr. Chaski holds an M.A. and Ph.D. in linguistics from Brown University. Computational linguistics is one of her specialties, and her work in this field has been recognized and validated through peer review, numerous legal cases, and scientific grant funding. See www.linguisticevidence.org/FLCV.htm.

15). See www.majesticdocuments.com/documents/1948-1959.php (bottom of page); accessed June 9, 2007.

16). Ibid.

17). The numbering and naming of these documents is a classification scheme devised by Dr. Heiser. The “known author” documents in the spreadsheet above were drawn from a large pool of possible documents.

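18). Carol E. Chaski, “Who’s at the Keyboard? Authorship Attribution in Digital Evidence Investigations,” International Journal of Digital Evidence 4:1 (Spring 2005). Accessed online, June 10, 2007.

19). Chaski, “Keyboard,” 1.

20). Ibid.

21). Ibid.

22). Ibid., 2. See the bibliography for the articles cited by Chaski.

23). Ibid., 2.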

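24). See Carol E. Chaski, “Empirical Evaluations of Language-Based Author Identification Techniques,” International Journal of Speech, Language and the Law 8:1 (2005): 5; John Olsson, “Using Groups of Common Textual Features for Authorship Attribution,” Forensic Linguistics Institute, Nebraska (n.d.), 1-10; accessed at www.thetext.co.uk/authorship/authorship.doc, June 12, 2007; Michael Gamon, “Linguistic Correlates of Style: Authorship Classification with Deep Linguistic Analysis Features,” Microsoft Research, Microsoft Corporation; accessed at research.microsoft.com/nlp/publications/coling2004_authorship.pdf, June 11, 2007; Shlomo Argamon and Shlomo Levitan, “Measuring the Usefulness of Function Words for Authorship Attribution,” Illinois Institute of Technology; accessed at lingcog.iit.edu/doc/paper_162_argamon.pdf, June 11, 2007.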

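25). Ibid., 1.

26). ALIAS is a computer program written by Dr. Chaski. Dr. Chaski’s method is currently patent-pending with the U.S. Patent Office.

27). Ibid., 2. See the bibliography for the works cited.

28). See Chaski, “Keyboard,” 5.

29). Personal communication with the author via email, June 12, 2007.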

