<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
    <meta charset="utf-8"/>
    <title>Data + Design</title>
    <link rel="stylesheet" type="text/css" href="theme/html/html.css"/>
<script src="js/retina.min.js" type="text/javascript"> </script>
<script src="js/jquery.min.js" type="text/javascript"> </script>
<script src="js/data-design.js" type="text/javascript"> </script>
    <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
  </head>
  <body data-type="book">
    <span class="btn open">Open</span>
    <div class="navbar">
      <div class="title">
        <span class="btn close">Close</span>
        <h1>Data + Design</h1>
        <h2>A Simple Introduction to Preparing and Visualizing Information</h2>
      </div>
      <nav data-type="toc" id="idp97216">
  <ol>
    <li data-type="part">
      <a href="titlepage01.html">Introduction</a>
      <ol>
        <li data-type="copyright-page">
          <a href="copyright-page01.html">Copyright and License</a>
        </li>
        <li data-type="preface">
          <a href="preface01.html">Preface</a>
        </li>
        <li data-type="foreword">
          <a href="foreword01.html">Foreword</a>
        </li>
        <li data-type="introduction">
          <a href="introduction01.html">How to Use This Book</a>
        </li>
      </ol>
    </li>

    <li data-type="part">
      <a href="part01.html">Data Fundamentals</a>
      <ol>
        <li data-type="chapter">
          <a href="ch01.html">Basic Data Types</a>
        </li>
        <li data-type="chapter">
          <a href="ch02.html">About Data Aggregation</a>
        </li>
      </ol>
    </li>

    <li data-type="part">
      <a href="part02.html">Collecting Data</a>
      <ol>
        <li data-type="chapter">
          <a href="ch03.html">Introduction to Survey Data</a>
        </li>

        <li data-type="chapter">
          <a href="ch04.html">Types of Survey Questions</a>
        </li>

        <li data-type="chapter">
          <a href="ch05.html">Additional Data Collection Methods</a>
        </li>

        <li data-type="chapter">
          <a href="ch06.html">Finding External Data</a>
        </li>
      </ol>
    </li>

    <li data-type="part">
      <a href="part03.html">Preparing Data</a>
      <ol>
        <li data-type="chapter">
          <a href="ch07.html">Data Preparation</a>
        </li>

        <li data-type="chapter">
          <a href="ch08.html">Data Cleaning</a>
        </li>

        <li data-type="chapter">
          <a href="ch09.html">Types of Data Checks</a>
        </li>

        <li data-type="chapter">
          <a href="ch10.html">What Data Cleaning Can and Can’t Do</a>
        </li>

        <li data-type="chapter">
          <a href="ch11.html">Data Transformations</a>
        </li>
      </ol>
    </li>

    <li data-type="part">
      <a href="part04.html">Visualizing Data</a>
      <ol>
        <li data-type="chapter">
          <a href="ch12.html">Deciding Which and How Much Data to Present</a>
        </li>

        <li data-type="chapter">
          <a href="ch13.html">Graphing Survey Responses</a>
        </li>

        <li data-type="chapter">
          <a href="ch14.html">Anatomy of a Graphic</a>
        </li>

        <li data-type="chapter">
          <a href="ch15.html">The Importance of Color, Font, and Icons</a>
        </li>

        <li data-type="chapter">
          <a href="ch16.html">Print Vs. Web, Static Vs. Interactive</a>
        </li>
      </ol>
    </li>

    <li data-type="part">
      <a href="part05.html">What Not to Do</a>
      <ol>
        <li data-type="chapter">
          <a href="ch17.html">Perception Deception</a>
        </li>

        <li data-type="chapter"><a href="ch18.html">Common Visualization Mistakes</a>
        </li>
      </ol>
    </li>

    <li data-type="part">
      <a href="#">Conclusion</a>
      <ol>
        <li data-type="chapter">
          <a href="app01.html">Resources</a>
        </li>
        <li data-type="chapter">
          <a href="glossary01.html">Glossary</a>
        </li>
        <li data-type="chapter">
          <a href="acknowledgments01.html">Contributors / Acknowledgments</a>
        </li>
      </ol>
    </li>
  </ol>
</nav>

    </div>
    <section class="blue" data-type="chapter" data-pdf-bookmark="Chapter 19. Basic Data Types" id="idp4389472">
<header>
  <div class="icon"><img src="images/sections/02/abacus.png"/></div>
  <p>Chapter 1</p>
  <h1 id="basic-data-types">Basic Data Types</h1>
  <p data-type="author">By Michael Castro</p>
</header>

<section data-type="sect1" id="idp4385824">
<p>There are several different basic data types and it’s important to know what you can do with each of them so you can collect your data in the most appropriate form for your needs. People describe data types in many ways, but we’ll primarily be using the levels of measurement known as nominal, ordinal, interval, and ratio.</p>

<h2>Levels of Measurement</h2>

<p>Let’s say you’re on a trip to the grocery store. You move between sections of the store, placing items into your basket as you go. You grab some fresh produce, dairy, frozen foods, and canned goods. If you were to make a list that included what section of the store each item came from, this data would fall into the nominal type. The term nominal comes from the Latin word “nomen,” meaning “name”; we call this data nominal because it consists of named categories into which the data fall. Nominal data is inherently unordered; produce as a general category isn’t mathematically greater or less than dairy.</p>

<h3>Nominal</h3>

<p>Nominal data can be counted and used to calculate percents, but you can’t take the average of nominal data. It makes sense to talk about how many items in your basket are from the dairy section or what percent is produce, but you can’t calculate the average grocery section of your basket.</p>
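<p>As a minimal sketch of the point above, the following Python snippet counts a made-up basket by store section and converts the counts to percentages. The basket contents and the helper name <code>section_summary</code> are assumptions for illustration only. The most common category can be reported, but no average is computed because none is defined for named categories.</p>

```python
from collections import Counter

# Hypothetical basket: the store section each item came from (nominal data).
basket_sections = [
    "Produce", "Produce", "Dairy", "Frozen",
    "Canned", "Dairy", "Produce", "Canned",
]


def section_summary(sections):
    """Return {section: (count, percent of basket)} for a list of category labels."""
    counts = Counter(sections)
    total = len(sections)
    return {name: (n, 100 * n / total) for name, n in counts.items()}


summary = section_summary(basket_sections)
for name, (n, pct) in summary.items():
    print(f"{name}: {n} items ({pct:.1f}%)")

# The most common category (the mode) is defined for nominal data; a mean is not.
most_common_section = Counter(basket_sections).most_common(1)[0][0]
print("Most common section:", most_common_section)
```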

<p>When there are only two categories available, the data is referred to as dichotomous. The answers to yes/no questions are dichotomous data. If, while shopping, you collected data about whether an item was on sale or not, it would be dichotomous.</p>

<figure><img alt="Percentage of basket by section" src="images/sections/02/ch02-01-percent-basket.png"/></figure>

<h3>Ordinal</h3>

<p>At last, you get to the checkout and try to decide which line will get you out of the store the quickest. Without actually counting how many people are in each queue, you roughly break them down in your mind into short lines, medium lines, and long lines. Because data like these have a natural ordering to the categories, it’s called ordinal data. Survey questions that have answer scales like “strongly disagree,” “disagree,” “neutral,” “agree,” “strongly agree” are collecting ordinal data. No category on an ordinal scale has a true mathematical value. Numbers are often assigned to the categories to make data entry or analysis easier (e.g. 1 = strongly disagree, 5 = strongly agree), but these assignments are arbitrary and you could choose any set of ordered numbers to represent the groups. For instance, you could just as easily decide to have 5 represent “strongly disagree” and 1 represent “strongly agree.”</p>
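<p>To illustrate that the numeric codes are arbitrary labels for ordered categories, here is a small Python sketch. The responses and the helper <code>median_category</code> are invented for the example. It codes the same answers three ways (1 to 5, 0 to 4, and reversed) and shows that the middle response comes out as the same category each time.</p>

```python
from statistics import median_low

SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Three arbitrary codings of the same ordered categories.
codes_1_to_5 = {label: i + 1 for i, label in enumerate(SCALE)}
codes_0_to_4 = {label: i for i, label in enumerate(SCALE)}
codes_reversed = {label: len(SCALE) - i for i, label in enumerate(SCALE)}

# Hypothetical survey responses.
responses = ["agree", "neutral", "strongly agree", "agree", "disagree"]


def median_category(answers, codes):
    """Code the answers, take the (low) median code, and map it back to its label."""
    by_code = {code: label for label, code in codes.items()}
    return by_code[median_low(codes[a] for a in answers)]


for coding in (codes_1_to_5, codes_0_to_4, codes_reversed):
    print(median_category(responses, coding))
```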

<aside data-type="sidebar" id="idp4405568">
  <p>The numbers you select to represent ordinal categories do change the way you interpret your end analysis, but you can choose any set you wish as long as you keep the numbers in order.</p>

  <p>It is most common to use either 0 or 1 as the starting point.</p>
</aside>


<figure><img alt="Number order" src="images/sections/02/number-order.png"/></figure>

<p>Like nominal data, you can count ordinal data and use them to calculate percents, but there is some disagreement about whether you can average ordinal data. On the one hand, you can’t average named categories like “strongly agree” and even if you assign numeric values, they don’t have a true mathematical meaning. Each numeric value represents a particular category, rather than a count of something.</p>

<p>On the other hand, if the difference in degree between consecutive categories on the scale is assumed to be approximately equal (e.g. the difference between strongly disagree and disagree is the same as between disagree and neutral, and so on) and consecutive numbers are used to represent the categories, then the average of the responses can also be interpreted with regard to that same scale.</p>
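<p>Under that equal-spacing assumption, averaging might look like the sketch below. The responses are made up, and the 1 to 5 coding is one conventional choice rather than the only valid one. The mean is read back against the same scale by naming the two categories it falls between.</p>

```python
import math
from statistics import mean

# Consecutive codes, assuming roughly equal steps between neighbouring categories.
CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
LABELS = {code: label for label, code in CODES.items()}

# Hypothetical survey responses.
responses = ["agree", "agree", "neutral", "strongly agree", "disagree", "agree"]

average_code = mean(CODES[r] for r in responses)

# Interpret the average on the original scale: which categories does it sit between?
lower_label = LABELS[math.floor(average_code)]
upper_label = LABELS[math.ceil(average_code)]
print(f"Average {average_code:.2f}: between '{lower_label}' and '{upper_label}'")
```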

<aside data-type="sidebar" id="idp4413712">
Some fields strongly discourage the use of ordinal data to do calculations like this, while others consider it common practice. You should look at other work in your field to see what usual procedures are.
</aside>

<h3>Interval</h3>

<p>Enough ordinal data for the moment… back to the store! You’ve been waiting in line for what seems like a while now, and you check your watch for the time. You got in line at 11:15am and it’s now 11:30. Time of day falls into the class of data called interval data, so named because the interval between each consecutive point of measurement is equal to every other. Because every minute is sixty seconds, the difference between 11:15 and 11:30 has the exact same value as the difference between 12:00 and 12:15.</p>

<p>Interval data is numeric and you can do mathematical operations on it, but it doesn’t have a “meaningful” zero point: the value of zero doesn’t indicate the absence of the thing you’re measuring. Midnight (0:00) isn’t the absence of time; it just marks the start of a new day. Other interval data that you encounter in everyday life are calendar years and temperature. A value of zero for years doesn’t mean that time didn’t exist before that, and a temperature of zero (when measured in Celsius or Fahrenheit) doesn’t mean there’s no heat.</p>
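<p>One consequence of an arbitrary zero point can be sketched in Python. Differences between interval values are meaningful and stay equal when the unit changes, while ratios of the raw values are not stable. The helper names <code>minutes_between</code> and <code>c_to_f</code> and the example temperatures are assumptions for illustration.</p>

```python
from datetime import datetime

TIME_FORMAT = "%H:%M"


def minutes_between(start, end):
    """Minutes elapsed between two clock times given as 'HH:MM' strings."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds() / 60


# Equal intervals on the clock are equal amounts of time.
wait_a = minutes_between("11:15", "11:30")
wait_b = minutes_between("12:00", "12:15")


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Equal differences stay equal after changing the unit...
step_low = c_to_f(20) - c_to_f(10)
step_high = c_to_f(30) - c_to_f(20)

# ...but the ratio of two readings depends on where zero happens to sit.
ratio_celsius = 20 / 10
ratio_fahrenheit = c_to_f(20) / c_to_f(10)

print(wait_a, wait_b)
print(step_low, step_high)
print(ratio_celsius, ratio_fahrenheit)
```

<p>Because the two ratios disagree, a statement such as “20 degrees is twice as hot as 10 degrees” has no fixed meaning on these scales.</p>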

<h3>Ratio</h3>

<p>Seeing that the time is 11:30, you think to yourself, “I’ve been in line for fifteen minutes already?!” When you start thinking about time this way, as an elapsed amount, it’s considered ratio data. Ratio data is numeric and a lot like interval data, except it <em>does</em> have a meaningful zero point. In ratio data, a value of zero indicates an absence of whatever you’re measuring: zero minutes, zero people in line, zero dairy products in your basket. In all these cases, zero actually means you don’t have any of that thing, which differs from the data we discussed in the interval section. Some other frequently encountered variables that are often recorded as ratio data are height, weight, age, and money.</p>
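<p>A short Python sketch of why a true zero matters: elapsed time can be compared as a ratio, and zero really does mean no waiting at all. The second wait of thirty minutes is a made-up value for comparison.</p>

```python
from datetime import timedelta

first_wait = timedelta(minutes=15)   # the wait described in the text
second_wait = timedelta(minutes=30)  # hypothetical longer wait
no_wait = timedelta(0)               # zero means no time spent in line

# Dividing one duration by another gives a plain number: a meaningful ratio.
times_longer = second_wait / first_wait
print(f"The second wait was {times_longer:g} times as long as the first.")
print("Seconds spent with no wait:", no_wait.total_seconds())
```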

<p>Interval and ratio data can be either discrete or continuous. Discrete means that you can only have specific amounts of the thing you are measuring (typically integers) and no values in between those amounts. There has to be a whole number of people in line; there can’t be a third of a person. You can have an <em>average</em> of, say, 4.25 people per line, but the actual count of people has to be a whole number. Continuous means that the data can be any value along the scale. You can buy 1.25 pounds of cheese or be in line for 7.75 minutes. This doesn’t mean that the data have to be able to take all possible numerical values; they only need to be able to take all the values within the bounds of the scale. You can’t be in line for a negative amount of time and you can’t buy a negative number of pounds of cheese, but these are still continuous.</p>
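<p>The distinction can be sketched in Python with invented numbers. Counts of people are whole numbers even though their average is not, and a continuous measurement stays continuous even when it is rounded for display.</p>

```python
from statistics import mean

# Discrete: hypothetical counts of people in four checkout lines.
people_per_line = [4, 5, 3, 5]
average_per_line = mean(people_per_line)  # the average need not be a whole number

# Continuous: a hypothetical weight of cheese in pounds, as read from a scale.
cheese_pounds = 1.2537
displayed_pounds = round(cheese_pounds, 2)  # rounded only for presentation

print("Counts:", people_per_line, "average:", average_per_line)
print("Cheese:", displayed_pounds, "lb (underlying value:", cheese_pounds, ")")
```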

<aside data-type="sidebar" id="idp4415344">
For simplicity in presentation, we often round continuous data to a certain number of digits. These data are still continuous, not discrete.
</aside>

<p>To review, let’s take a look at a receipt from the store. Can you identify which pieces of information are measured at each level (nominal, ordinal, interval, and ratio)?</p>

<table>
	<tbody>
		<tr>
			<th colspan="5">Date: 06/01/2014 Time: 11:32am</th>
		</tr>
		<tr>
			<th>Item</th>
			<th>Section</th>
			<th>Aisle</th>
			<th>Quantity</th>
			<th>Cost (US$)</th>
		</tr>
		<tr>
			<td>Oranges—Lbs</td>
			<td>Produce</td>
			<td>4</td>
			<td>2</td>
			<td>2.58</td>
		</tr>
		<tr>
			<td>Apples—Lbs</td>
			<td>Produce</td>
			<td>4</td>
			<td>1</td>
			<td>1.29</td>
		</tr>
		<tr>
			<td>Mozzarella—Lbs</td>
			<td>Dairy</td>
			<td>7</td>
			<td>1</td>
			<td>3.49</td>
		</tr>
		<tr>
			<td>Milk—Skim—Gallon</td>
			<td>Dairy</td>
			<td>8</td>
			<td>1</td>
			<td>4.29</td>
		</tr>
		<tr>
			<td>Peas—Bag</td>
			<td>Frozen</td>
			<td>15</td>
			<td>1</td>
			<td>0.99</td>
		</tr>
		<tr>
			<td>Green Beans—Bag</td>
			<td>Frozen</td>
			<td>15</td>
			<td>3</td>
			<td>1.77</td>
		</tr>
		<tr>
			<td>Tomatoes</td>
			<td>Canned</td>
			<td>2</td>
			<td>4</td>
			<td>3.92</td>
		</tr>
		<tr>
			<td>Potatoes</td>
			<td>Canned</td>
			<td>3</td>
			<td>2</td>
			<td>2.38</td>
		</tr>
		<tr>
			<td>Mushrooms</td>
			<td>Canned</td>
			<td>2</td>
			<td>5</td>
			<td>2.95</td>
		</tr>
	</tbody>
</table>
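<p>As a rough sketch, the receipt can be transcribed into Python to try out which summaries make sense for which column. The tuple layout and the choice to store cost in integer cents are assumptions made for this example. Counting line items per section treats Section as named categories, while summing Cost relies on it being a numeric amount with a true zero.</p>

```python
from collections import Counter

# Rows transcribed from the receipt: (item, section, aisle, quantity, cost in cents).
# Costs are stored as integer cents to avoid floating-point rounding.
RECEIPT = [
    ("Oranges", "Produce", 4, 2, 258),
    ("Apples", "Produce", 4, 1, 129),
    ("Mozzarella", "Dairy", 7, 1, 349),
    ("Milk, skim", "Dairy", 8, 1, 429),
    ("Peas", "Frozen", 15, 1, 99),
    ("Green Beans", "Frozen", 15, 3, 177),
    ("Tomatoes", "Canned", 2, 4, 392),
    ("Potatoes", "Canned", 3, 2, 238),
    ("Mushrooms", "Canned", 2, 5, 295),
]

# Counting categories: how many line items came from each section.
lines_per_section = Counter(section for _, section, _, _, _ in RECEIPT)

# Arithmetic on amounts: the total cost of the trip.
total_cents = sum(cost for _, _, _, _, cost in RECEIPT)

print(dict(lines_per_section))
print(f"Total: ${total_cents / 100:.2f}")
```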

<h2>Variable Type Vs. Data Type</h2>

<p>If you look around the internet or in textbooks for info about data, you’ll often find variables described as being one of the data types listed above. Be aware that many variables aren’t exclusively one data type or another. What often determines the data type is how the data are collected.</p>

<p>Consider the variable age. Age is frequently collected as ratio data, but can also be collected as ordinal data. This happens on surveys when they ask, “What age group do you fall in?” There, you wouldn’t have data on your respondents’ individual ages; you’d only know how many were between 18-24, 25-34, etc. Similarly, you might collect actual cholesterol measurements from participants for a health study, or you may simply ask if their cholesterol is high. Again, this is a single variable with two different data collection methods and two different data types.</p>

<p>The general rule is that you can go down in level of measurement but not up. If it’s possible to collect the variable as interval or ratio data, you can also collect it as nominal or ordinal data, but if the variable is inherently only nominal in nature, like grocery store section, you can’t capture it as ordinal, interval or ratio data. Variables that are naturally ordinal can’t be captured as interval or ratio data, but can be captured as nominal. However, many variables that get captured as ordinal have a similar variable that can be captured as interval or ratio data, if you so choose.</p>

<table>
	<tbody>
		<tr>
			<th>Ordinal Level Type</th>
			<th>Corresponding Interval/Ratio Level Measure</th>
			<th>Example</th>
		</tr>
		<tr>
			<td>Ranking</td>
			<td>Measurement that ranking is based on</td>
			<td>Record runners’ marathon times instead of what place they finish</td>
		</tr>
		<tr>
			<td>Grouped scale</td>
			<td>Measurement itself</td>
			<td>Record exact age instead of age category</td>
		</tr>
		<tr>
			<td>Substitute scale</td>
			<td>Original measurement the scale was created from</td>
			<td>Record exact test score instead of letter grade</td>
		</tr>
	</tbody>
</table>
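<p>Two rows of the table can be sketched in Python to show the direction in which conversion works. The runner names, finish times, and grade cutoffs below are all invented for illustration. The ordinal version is always derivable from the underlying measurement, but not the other way round.</p>

```python
# Hypothetical marathon finish times in minutes (ratio data).
finish_minutes = {"Ana": 212.5, "Ben": 198.0, "Chi": 245.25}

# Going down a level: finishing places (ordinal) follow directly from the times.
ordered_names = sorted(finish_minutes, key=finish_minutes.get)
places = {name: place for place, name in enumerate(ordered_names, start=1)}
print(places)


def letter_grade(score):
    """Map a 0-100 test score to a letter grade using assumed cutoffs."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"


# Different exact scores collapse into the same grade, so a grade alone
# cannot tell you which score produced it.
print(letter_grade(81), letter_grade(89))
```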

<p>It’s important to remember that the general rule of “you can go down, but not up” also applies during analysis and visualization of your data. If you collect a variable as ratio data, you can always decide later to group the data for display if that makes sense for your work. If you collect it as a lower level of measurement, you can’t go back up later on without collecting more data. For example, if you do decide to collect age as ordinal data, you can’t calculate the average age later on and your visualization will be limited to displaying age by groups; you won’t have the option to display it as continuous data.</p>
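<p>The age example can be sketched in Python as follows. The ages are made up, and only the 18-24 and 25-34 brackets come from the text; the remaining brackets are assumed. With exact ages you can compute both the average and the grouped counts, whereas a survey that recorded only the groups would stop at the counts.</p>

```python
from collections import Counter
from statistics import mean

# Hypothetical exact ages collected as ratio data.
ages = [19, 23, 31, 34, 42, 58]


def age_group(age):
    """Assign an age to a bracket; brackets beyond 18-24 and 25-34 are assumed."""
    if age >= 55:
        return "55+"
    if age >= 45:
        return "45-54"
    if age >= 35:
        return "35-44"
    if age >= 25:
        return "25-34"
    if age >= 18:
        return "18-24"
    return "under 18"


# With exact ages, both summaries are available.
average_age = mean(ages)
group_counts = Counter(age_group(a) for a in ages)

print("Average age:", average_age)
print("Counts by group:", dict(group_counts))
# If only group_counts had been collected, average_age could not be computed.
```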

<p>When it doesn’t increase the burden of data collection, you should collect the data at the highest level of measurement that you think you might want available later on. Few things in data work are as disappointing as sitting down to make a graph or run a calculation, only to realize you didn’t collect the data in a way that lets you generate what you need!</p>

<h2>Other Important Terms</h2>

<p>There are some other terms that are frequently used to talk about types of data. We are choosing not to use them here because there is some disagreement about their meanings, but you should be aware of them and what their possible definitions are in case you encounter them in other resources.</p>

<h3>Categorical Data</h3>

<p>We talked about both nominal and ordinal data above as splitting data into categories. Some texts consider both to be types of categorical data, with nominal being unordered categorical data and ordinal being ordered categorical data. Others only call nominal data categorical, and use the terms “nominal data” and “categorical data” interchangeably. These texts just call ordinal data “ordinal data” and consider it to be a separate group altogether.</p>

<h3>Qualitative and Quantitative Data</h3>

<p>Qualitative data, roughly speaking, refers to non-numeric data, while quantitative data is typically data that is numeric and hence quantifiable. There is some consensus with regard to these terms. Certain data are always considered qualitative, as they require pre-processing or different methods than quantitative data to analyze. Examples are recordings of direct observation or transcripts of interviews. In a similar way, interval and ratio data are always considered to be quantitative, as they are only ever numeric. The disagreement comes in with the nominal and ordinal data types. Some consider them to be qualitative, since their categories are descriptive and not truly numeric. However, since these data can be counted and used to calculate percentages, some consider them to be quantitative, since they are in that way quantifiable.</p>

<p>To avoid confusion, we’ll be sticking with the level of measurement terms above throughout the rest of this book, except in our discussion of long-form qualitative data in the survey design chapter. If you come across the terms “categorical,” “qualitative data,” or “quantitative data” in other resources or in your work, make sure you know which definition is being used and don’t just assume!</p>

</section>
</section>
    <div class="navigation">
      <ul>
        <li id="next_page"><a href="ch02.html">Next</a></li>
        <li id="previous_page"><a href="part01.html">Previous</a></li>
      </ul>
    </div>

    <script>
      var _hmt = _hmt || [];
      (function() {
        var hm = document.createElement("script");
        hm.src = "//hm.baidu.com/hm.js?27111432badf47d1f6260dcd3c815289";
        var s = document.getElementsByTagName("script")[0];
        s.parentNode.insertBefore(hm, s);
      })();
    </script>
  </body>
</html>