
Author: lemonbit

from Unsplash by @Mike Enerio

Translation | Lemon

Source | Machine Learning Plus

23 Density Curves with Histogram

A density curve with a histogram combines the information conveyed by the two plots, so you can show it in a single figure instead of two.

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv')
# Draw Plot
plt.figure(figsize=(13,10), dpi= 80)
sns.distplot(df.loc[df['class'] == 'compact', 'cty'], color='dodgerblue', label='Compact', hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
sns.distplot(df.loc[df['class'] == 'suv', 'cty'], color='orange', label='SUV', hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
sns.distplot(df.loc[df['class'] == 'minivan', 'cty'], color='g', label='minivan', hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
plt.ylim(0, 0.35)
# Decoration
plt.title('Density Plot of City Mileage by Vehicle Type', fontsize=22)
plt.legend()
plt.show()

Figure 23
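To see why a histogram and a density curve can share one axis, here is a minimal sketch that rescales a histogram to a density and evaluates a Gaussian kernel density estimate by hand with NumPy. The synthetic sample, the rule-of-thumb bandwidth, and the helper name `gaussian_kde_1d` are assumptions made for illustration; seaborn's own estimator may choose its bandwidth differently.

```python
import numpy as np

def gaussian_kde_1d(sample, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `sample` on `grid`."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption; other rules exist)
        bandwidth = 1.06 * sample.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per observation, averaged over all observations
    z = (grid[:, None] - sample[None, :]) / bandwidth
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
sample = rng.normal(loc=20, scale=3, size=200)   # synthetic "city mileage"

# density=True rescales the bar heights so that the bars integrate to 1
heights, edges = np.histogram(sample, bins=15, density=True)
hist_area = (heights * np.diff(edges)).sum()

grid = np.linspace(sample.min() - 10, sample.max() + 10, 500)
density = gaussian_kde_1d(sample, grid)
# Riemann-sum approximation of the area under the density curve
kde_area = (density * (grid[1] - grid[0])).sum()
```

Both areas come out close to 1, which is why the bars and the curve can be drawn against the same y axis.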

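24 Joy Plot

A Joy Plot lets the density curves of different groups overlap, which is a good way to visualize the distribution of a large number of groups relative to one another. It is pleasing to the eye and conveys the right information clearly. It can be built easily with the joypy package, which is based on matplotlib. (Note from Python數據之道: the joypy library needs to be installed.)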

# !pip install joypy
# Note added by Python數據之道
import joypy

# Import Data
mpg = pd.read_csv('https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv')

# Draw Plot
plt.figure(figsize=(16,10), dpi= 80)
fig, axes = joypy.joyplot(mpg, column=['hwy', 'cty'], by='class', ylim='own', figsize=(14,10))

# Decoration
plt.title('Joy Plot of City and Highway Mileage by Class', fontsize=22)
plt.show()

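Figure 24

25 Distributed Dot Plot

A distributed dot plot shows the univariate distribution of points, segmented by group. The darker the points, the higher the concentration of data points in that region. Coloring the median differently makes the real positioning of each group apparent at a glance.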
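The idea can be sketched with plain matplotlib as follows. The three groups and their values are synthetic and the group names are hypothetical; the `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data (assumption): city mileage for three hypothetical manufacturers
groups = {
    "maker_a": rng.normal(18, 2, 30),
    "maker_b": rng.normal(22, 3, 40),
    "maker_c": rng.normal(15, 1.5, 20),
}
# Order the groups by median so the ranking can be read off the y axis
ordered = sorted(groups, key=lambda g: np.median(groups[g]))
medians = [float(np.median(groups[g])) for g in ordered]

fig, ax = plt.subplots(figsize=(10, 4), dpi=80)
for row, name in enumerate(ordered):
    values = groups[name]
    # Semi-transparent points: regions where points overlap look darker
    ax.scatter(values, np.full(values.size, row), s=60, color="tab:blue", alpha=0.3)
    # The median gets its own color so the group's position stands out
    ax.scatter(medians[row], row, s=90, color="firebrick", zorder=3)

ax.set_yticks(range(len(ordered)))
ax.set_yticklabels(ordered)
ax.set_xlabel("City mileage (synthetic)")
ax.set_title("Distributed Dot Plot (sketch)")
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result.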

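Figure 25

26 Box Plot

Box plots are a great way to visualize a distribution while keeping the median, the 25th and 75th percentiles (the first and third quartiles), and the outliers in view. However, you need to be careful when interpreting the size of a box, because it can give a distorted impression of how many points the group contains. Manually providing the number of observations in each box helps overcome this drawback.

For example, the first two boxes on the left are the same size even though they contain 5 and 47 observations respectively. Writing the number of observations in each group is therefore necessary.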

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv')
# Draw Plot
plt.figure(figsize=(13,10), dpi= 80)
sns.boxplot(x='class', y='hwy', data=df, notch=False)
# Add N Obs inside boxplot (optional)
def add_n_obs(df, group_col, y):
    # Look up the median and the count by tick label so each annotation matches its own box
    medians = df.groupby(group_col)[y].median()
    n_obs = df.groupby(group_col)[y].size()
    xticklabels = [x.get_text() for x in plt.gca().get_xticklabels()]
    for x, xticklabel in enumerate(xticklabels):
        plt.text(x, medians[xticklabel]*1.01, '#obs : '+str(n_obs[xticklabel]), horizontalalignment='center', fontdict={'size':14}, color='white')

add_n_obs(df, group_col='class', y='hwy')
# Decoration
plt.title('Box Plot of Highway Mileage by Vehicle Class', fontsize=22)
plt.ylim(10, 40)
plt.show()

Figure 26
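The point about box size hiding the group size can be checked numerically. The sketch below computes the numbers a conventional box plot draws (quartiles, whiskers at 1.5 times the interquartile range, outliers) for two made-up groups; the helper name `box_stats` and the data are assumptions for illustration.

```python
import numpy as np

def box_stats(values):
    """Return the numbers a conventional box plot draws for one group."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return {
        "n_obs": int(values.size),
        "q1": float(q1), "median": float(median), "q3": float(q3),
        # Whiskers end at the most extreme observations inside the fences
        "whisker_low": float(inside.min()), "whisker_high": float(inside.max()),
        "outliers": values[(values < low_fence) | (values > high_fence)].tolist(),
    }

small = box_stats([20, 22, 24, 26, 28])              # 5 observations
large = box_stats(list(range(20, 29)) * 5 + [60])    # 46 observations, one extreme value
```

Both groups produce the same box (quartiles 22, 24, 26) although one has 5 observations and the other 46, so only an explicit count tells them apart.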

27 Dot + Box Plot

A Dot + Box plot conveys information similar to a grouped box plot. In addition, the dots give a sense of how many data points lie in each group.

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv')

# Draw Plot
plt.figure(figsize=(13,10), dpi= 80)
sns.boxplot(x='class', y='hwy', data=df, hue='cyl')
sns.stripplot(x='class', y='hwy', data=df, color='black', size=3, jitter=1)

for i in range(len(df['class'].unique())-1):
    plt.vlines(i+.5, 10, 45, linestyles='solid', colors='gray', alpha=0.2)

# Decoration
plt.title('Box Plot of Highway Mileage by Vehicle Class', fontsize=22)
plt.legend(title='Cylinders')
plt.show()

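Figure 27

28 Violin Plot

A violin plot is a visually pleasing alternative to a box plot. The shape or area of the violin depends on the number of observations it holds. However, violin plots can be harder to read and are not commonly used in professional settings.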

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv')
# Draw Plot
plt.figure(figsize=(13,10), dpi= 80)
sns.violinplot(x='class', y='hwy', data=df, scale='width', inner='quartile')
# Decoration
plt.title('Violin Plot of Highway Mileage by Vehicle Class', fontsize=22)
plt.show()

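Figure 28

29 Population Pyramid

A population pyramid can be used to show the distribution of groups ordered by volume. It can also show the stage-by-stage filtering of a population, as it is used below to show how many people pass through each stage of a marketing funnel.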

# Read data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/email_campaign_funnel.csv')

# Draw Plot
plt.figure(figsize=(13,10), dpi= 80)
group_col = 'Gender'
order_of_bars = df.Stage.unique()[::-1]
colors = [plt.cm.Spectral(i/float(len(df[group_col].unique())-1)) for i in range(len(df[group_col].unique()))]

for c, group in zip(colors, df[group_col].unique()):
    sns.barplot(x='Users', y='Stage', data=df.loc[df[group_col]==group, :], order=order_of_bars, color=c, label=group)

# Decorations    
plt.xlabel('$Users$')
plt.ylabel('Stage of Purchase')
plt.yticks(fontsize=12)
plt.title('Population Pyramid of the Marketing Funnel', fontsize=22)
plt.legend()
plt.show()

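Figure 29

30 Categorical Plots

The categorical plots provided by the seaborn library can be used to visualize the count distribution of two or more categorical variables in relation to each other.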

# Load Dataset
titanic = sns.load_dataset('titanic')
# Plot
g = sns.catplot(x='alive', col='deck', col_wrap=4,
                data=titanic[titanic.deck.notnull()],
                kind='count', height=3.5, aspect=.8,
                palette='tab20')
# catplot returns a FacetGrid; its figure is available as g.fig
g.fig.suptitle('sf')
plt.show()

Figure 30

# Load Dataset
titanic = sns.load_dataset('titanic')

# Plot
sns.catplot(x='age', y='embark_town',
            hue='sex', col='class',
            data=titanic[titanic.embark_town.notnull()],
            orient='h', height=5, aspect=1, palette='tab10',
            kind='violin', dodge=True, cut=0, bw=.2)

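Figure 30-2

V. Composition

31 Waffle Chart

Waffle charts can be created with the pywaffle package and are used to show the composition of groups within a larger population.

(Note from Python數據之道: the pywaffle library needs to be installed.)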

#! pip install pywaffle
# Reference: https://stackoverflow.com/questions/41400136/how-to-do-waffle-charts-in-python-square-piechart
from pywaffle import Waffle
# Import
df_raw = pd.read_csv('https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv')
# Prepare Data
df = df_raw.groupby('class').size().reset_index(name='counts')
n_categories = df.shape[0]
colors = [plt.cm.inferno_r(i/float(n_categories)) for i in range(n_categories)]
# Draw Plot and Decorate
fig = plt.figure(
    FigureClass=Waffle,
    plots={
        '111': {
            'values': df['counts'],
            # index=False keeps the row index out of the tuple, so n[0] is the class and n[1] the count
            'labels': ['{0} ({1})'.format(n[0], n[1]) for n in df[['class', 'counts']].itertuples(index=False)],
            'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12},
            'title': {'label': '# Vehicles by Class', 'loc': 'center', 'fontsize':18}
        },
    },
    rows=7,
    colors=colors,
    figsize=(16, 9)
)

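Figure 31

Figure 31-2

32 Pie Chart

A pie chart is a classic way to show the composition of groups. However, it is generally not recommended nowadays because the areas of the pie slices can be misleading. If you do use a pie chart, it is highly recommended to write down the percentage or count for each slice explicitly.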

# Import
df_raw = pd.read_csv('https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv')

# Prepare Data
df = df_raw.groupby('class').size()

# Make the plot with pandas
df.plot(kind='pie', subplots=True, figsize=(8, 8))
plt.title('Pie Chart of Vehicle Class - Bad')
plt.ylabel('')
plt.show()

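Figure 32

Figure 32-2

33 Treemap

A treemap is similar to a pie chart, but it does the job better without misleading the reader about the contribution of each group.

(Note from Python數據之道: the squarify library needs to be installed.)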

# pip install squarify
import squarify
# Import Data
df_raw = pd.read_csv('https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv')
# Prepare Data
df = df_raw.groupby('class').size().reset_index(name='counts')
labels = df.apply(lambda x: str(x['class']) + ' (' + str(x['counts']) + ')', axis=1)
sizes = df['counts'].values.tolist()
colors = [plt.cm.Spectral(i/float(len(labels))) for i in range(len(labels))]
# Draw Plot
plt.figure(figsize=(12,8), dpi= 80)
squarify.plot(sizes=sizes, label=labels, color=colors, alpha=.8)
# Decorate
plt.title('Treemap of Vehicle Class')
plt.axis('off')
plt.show()

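Figure 33

34 Bar Chart

A bar chart is a classic way of visualizing items based on counts or any given metric. In the chart below, I have used a different color for each item, but you would typically want one color for all items unless you are coloring them by group. The color names are stored in all_colors in the code below. You can change the color of the bars by setting the color parameter in plt.bar().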

import random

# Import Data
df_raw = pd.read_csv('https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv')

# Prepare Data
df = df_raw.groupby('manufacturer').size().reset_index(name='counts')
n = df['manufacturer'].unique().__len__()+1
all_colors = list(plt.cm.colors.cnames.keys())
random.seed(100)
c = random.choices(all_colors, k=n)

# Plot Bars
plt.figure(figsize=(16,10), dpi= 80)
plt.bar(df['manufacturer'], df['counts'], color=c, width=.5)
for i, val in enumerate(df['counts'].values):
    plt.text(i, val, float(val), horizontalalignment='center', verticalalignment='bottom', fontdict={'fontweight':500, 'size':12})

# Decoration
plt.xticks(rotation=60, horizontalalignment='right')
plt.title('Number of Vehicles by Manufacturers', fontsize=22)
plt.ylabel('# Vehicles')
plt.ylim(0, 45)
plt.show()

Figure 34

VI. Change

35 Time Series Plot

A time series plot shows how a given metric changes over time. Here you can see how air passenger traffic changed between 1949 and 1960.

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')
# Draw Plot
plt.figure(figsize=(16,10), dpi= 80)
plt.plot('date', 'traffic', data=df, color='tab:red')
# Decoration
plt.ylim(50, 750)
xtick_location = df.index.tolist()[::12]
xtick_labels = [x[-4:] for x in df.date.tolist()[::12]]
plt.xticks(ticks=xtick_location, labels=xtick_labels, rotation=0, fontsize=12, horizontalalignment='center', alpha=.7)
plt.yticks(fontsize=12, alpha=.7)
plt.title('Air Passengers Traffic (1949 - 1960)', fontsize=22)
plt.grid(axis='both', alpha=.3)
# Remove borders
plt.gca().spines['top'].set_alpha(0.0)
plt.gca().spines['bottom'].set_alpha(0.3)
plt.gca().spines['right'].set_alpha(0.0)
plt.gca().spines['left'].set_alpha(0.3)
plt.show()

Figure 35

36 Time Series with Peaks and Troughs Annotated

The time series below plots all the peaks and troughs and annotates the occurrence of selected special events.
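One simple way to locate the peaks and troughs before annotating them is to look for sign changes in the slope of the series. The following is a minimal sketch on a made-up series; the helper name `peaks_and_troughs` is hypothetical, and flat stretches (plateaus) are deliberately ignored.

```python
import numpy as np

def peaks_and_troughs(y):
    """Indices of strict local maxima and minima of a 1-D series."""
    y = np.asarray(y, dtype=float)
    # Sign of the slope between consecutive points: +1 rising, -1 falling
    slope_sign = np.sign(np.diff(y))
    # A jump in the slope sign marks a turning point
    turn = np.diff(slope_sign)
    peaks = np.where(turn == -2)[0] + 1    # rising, then falling
    troughs = np.where(turn == 2)[0] + 1   # falling, then rising
    return peaks, troughs

series = np.array([1, 3, 7, 4, 2, 5, 9, 6, 8, 3])
peaks, troughs = peaks_and_troughs(series)
# peaks   -> indices 2, 6, 8 (values 7, 9, 8)
# troughs -> indices 4, 7    (values 2, 6)
```

The resulting indices can then be passed to `plt.scatter` to mark the turning points, or to `plt.text` to label a chosen subset of them.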

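Figure 36

37 Autocorrelation (ACF) and Partial Autocorrelation (PACF) Plot

The autocorrelation (ACF) plot shows the correlation of a time series with its own lags. Each vertical line on the autocorrelation plot represents the correlation between the series and one of its lags, starting from lag 0. The blue shaded region in the plot is the significance level. Lags that extend beyond the blue region are the significant ones.

So how should you interpret this?

For AirPassengers, up to 14 lags cross the blue line and are therefore significant. Because the data is monthly, this means that the passenger traffic seen up to 14 months back has an influence on the traffic seen today.

The PACF, on the other hand, shows the autocorrelation of any given lag with the current series, with the contributions of the lags in between removed.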

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')

# Draw Plot
fig, (ax1, ax2) = plt.subplots(1, 2,figsize=(16,6), dpi= 80)
plot_acf(df.traffic.tolist(), ax=ax1, lags=50)
plot_pacf(df.traffic.tolist(), ax=ax2, lags=20)

# Decorate
# lighten the borders
ax1.spines['top'].set_alpha(.3); ax2.spines['top'].set_alpha(.3)
ax1.spines['bottom'].set_alpha(.3); ax2.spines['bottom'].set_alpha(.3)
ax1.spines['right'].set_alpha(.3); ax2.spines['right'].set_alpha(.3)
ax1.spines['left'].set_alpha(.3); ax2.spines['left'].set_alpha(.3)

# font size of tick labels
ax1.tick_params(axis='both', labelsize=12)
ax2.tick_params(axis='both', labelsize=12)
plt.show()

Figure 37
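To make the heights of the ACF bars concrete, the sketch below computes the sample autocorrelation by hand and flags the lags that fall outside a simple significance band. The series is synthetic, the helper name `acf` is hypothetical, and the band uses the basic white-noise approximation of 1.96 divided by the square root of n; statsmodels may draw a different, lag-dependent interval.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of `x` for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = (centered ** 2).sum()
    return np.array([
        1.0 if k == 0 else (centered[k:] * centered[:-k]).sum() / denom
        for k in range(max_lag + 1)
    ])

# Synthetic monthly series (assumption): upward trend plus a 12-month cycle
months = np.arange(144)
series = 100 + 2 * months + 30 * np.sin(2 * np.pi * months / 12)

r = acf(series, max_lag=24)
# Approximate 95% band for white noise; lags outside it count as significant
band = 1.96 / np.sqrt(series.size)
significant_lags = [k for k in range(1, 25) if abs(r[k]) > band]
```

With a strong trend, the autocorrelation decays slowly, so many consecutive lags end up outside the band, which is the pattern described above.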

38 Cross Correlation Plot

A cross correlation plot shows how two time series correlate with each other at different lags.
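As a minimal sketch of what such a plot contains, the code below correlates one series with lagged copies of another. The two series are synthetic, the helper name `cross_correlation` is hypothetical, and the lag convention (x at time t against y at time t + lag) is an assumption; library implementations may define the direction or the normalization differently.

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Correlation between x[t] and y[t + lag] for lag = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    result = []
    for lag in range(max_lag + 1):
        a = x[: x.size - lag]   # drop the last `lag` points of x
        b = y[lag:]             # drop the first `lag` points of y
        result.append(np.corrcoef(a, b)[0, 1])
    return np.array(result)

rng = np.random.default_rng(42)
x = rng.normal(size=300)
# y repeats x three steps later, plus a little noise (synthetic assumption)
y = np.roll(x, 3) + 0.1 * rng.normal(size=300)

ccf = cross_correlation(x, y, max_lag=10)
best_lag = int(np.argmax(ccf))
```

Here the correlation peaks at lag 3, the delay that was built into `y`. Plotting `ccf` as vertical bars against the lag gives the cross correlation plot.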

Figure 38

39 Time Series Decomposition Plot

A time series decomposition plot shows the breakdown of a time series into trend, seasonal, and residual components.

from statsmodels.tsa.seasonal import seasonal_decompose
from dateutil.parser import parse
# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')
dates = pd.DatetimeIndex([parse(d).strftime('%Y-%m-01') for d in df['date']])
df.set_index(dates, inplace=True)
# Decompose
result = seasonal_decompose(df['traffic'], model='multiplicative')
# Plot
plt.rcParams.update({'figure.figsize': (10,10)})
result.plot().suptitle('Time Series Decomposition of Air Passengers')
plt.show()

Figure 39

40 Multiple Time Series

You can plot multiple time series that measure the same value on the same chart, as shown below.

Figure 40

41 Plotting with Different Scales Using a Secondary Y Axis

If you want to show two time series that measure two different quantities at the same points in time, you can plot the second series against a secondary Y axis on the right.
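A minimal sketch of this setup uses matplotlib's `Axes.twinx()`, which creates a second axes object that shares the x axis but keeps its own y scale. The two series and their labels below are synthetic assumptions; the `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic series on very different scales (assumption)
t = np.arange(60)
savings_rate = 8 + 2 * np.sin(t / 8)    # single digits
unemployed = 9000 + 40 * t              # thousands

fig, ax1 = plt.subplots(figsize=(10, 5), dpi=80)
ax1.plot(t, savings_rate, color="tab:red")
ax1.set_ylabel("Savings rate (%)", color="tab:red")
ax1.tick_params(axis="y", labelcolor="tab:red")

# Second Axes sharing the same x axis, with an independent y axis on the right
ax2 = ax1.twinx()
ax2.plot(t, unemployed, color="tab:blue")
ax2.set_ylabel("Unemployed (thousands)", color="tab:blue")
ax2.tick_params(axis="y", labelcolor="tab:blue")

ax1.set_title("Two series, two y axes (sketch)")
fig.tight_layout()
```

Coloring each axis label to match its line helps the reader tell which scale belongs to which series.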

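Figure 41

42 Time Series with Error Bands

A time series with error bands can be constructed if you have a time series dataset with multiple observations for each time point (date or timestamp). Below you can see an example based on the orders coming in at different times of the day, and another based on the number of orders arriving over a period of 45 days.

In this approach, the mean number of orders is drawn as a white line, and a 95% confidence interval is computed and drawn around the mean.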
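The band itself can be computed in a few lines. The sketch below uses synthetic order counts and the normal approximation (mean plus or minus 1.96 standard errors); the data shape and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic data (assumption): 24 hourly time points, 50 observations each
hours = np.arange(24)
true_mean = 20 + 10 * np.sin((hours - 6) * np.pi / 12)
orders = rng.normal(loc=true_mean[:, None], scale=4.0, size=(24, 50))

mean = orders.mean(axis=1)
# Standard error of the mean at each time point
sem = orders.std(axis=1, ddof=1) / np.sqrt(orders.shape[1])
# Normal-approximation 95% confidence interval around the mean
lower = mean - 1.96 * sem
upper = mean + 1.96 * sem
# With matplotlib: plt.plot(hours, mean, color="white")
#                  plt.fill_between(hours, lower, upper)
```

The more observations there are per time point, the smaller the standard error and the narrower the band.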

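Figure 42

Figure 42-2

43 Stacked Area Chart

A stacked area chart gives a visual representation of how much each of several time series contributes to the total, which makes the series easy to compare with one another.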
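What "stacked" means can be shown with a running total: each layer is drawn between the cumulative sum of the layers below it and that sum plus its own values. The three series below are synthetic and their names hypothetical.

```python
import numpy as np

# Synthetic monthly counts for three hypothetical sources (assumption)
months = np.arange(12)
layers = np.array([
    10 + months,          # source_a
    5 + 0.5 * months,     # source_b
    np.full(12, 8.0),     # source_c
])

# Upper edge of each band is the running total of all layers up to it
upper = np.cumsum(layers, axis=0)
# Lower edge of each band is the upper edge of the band beneath it
lower = upper - layers
# Contribution of the top layer to the overall total at each time point
share_of_top_layer = layers[-1] / upper[-1]
# With matplotlib: plt.stackplot(months, layers) draws these bands,
# as does plt.fill_between(months, lower[i], upper[i]) for each layer i.
```

The top edge of the last band equals the total of all series, so the chart shows both the total and each series' share of it.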

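Figure 43

44 Area Chart Unstacked

An unstacked area chart is used to visualize the progress (ups and downs) of two or more series with respect to each other. In the chart below, you can clearly see that the personal savings rate comes down as the median duration of unemployment increases. The unstacked area chart brings out this phenomenon nicely.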

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/economics.csv')

# Prepare Data
x = df['date'].values.tolist()
y1 = df['psavert'].values.tolist()
y2 = df['uempmed'].values.tolist()
mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:brown', 'tab:grey', 'tab:pink', 'tab:olive']      
columns = ['psavert', 'uempmed']

# Draw Plot
fig, ax = plt.subplots(1, 1, figsize=(16,9), dpi= 80)
# y1 holds psavert (columns[0]) and y2 holds uempmed (columns[1]); label each area accordingly
ax.fill_between(x, y1=y1, y2=0, label=columns[0], alpha=0.5, color=mycolors[1], linewidth=2)
ax.fill_between(x, y1=y2, y2=0, label=columns[1], alpha=0.5, color=mycolors[0], linewidth=2)

# Decorations
ax.set_title('Personal Savings Rate vs Median Duration of Unemployment', fontsize=18)
ax.set(ylim=[0, 30])
ax.legend(loc='best', fontsize=12)
plt.xticks(x[::50], fontsize=10, horizontalalignment='center')
plt.yticks(np.arange(2.5, 30.0, 2.5), fontsize=10)
plt.xlim(-10, x[-1])

# Draw Tick lines  
for y in np.arange(2.5, 30.0, 2.5):    
    plt.hlines(y, xmin=0, xmax=len(x), colors='black', alpha=0.3, linestyles='--', lw=0.5)

# Lighten borders
plt.gca().spines['top'].set_alpha(0)
plt.gca().spines['bottom'].set_alpha(.3)
plt.gca().spines['right'].set_alpha(0)
plt.gca().spines['left'].set_alpha(.3)
plt.show()

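Figure 44

45 Calendar Heat Map

A calendar map is an alternative, and less preferred, option for visualizing time-based data compared to a time series. Though it can be visually appealing, the numeric values are not very evident. It is, however, effective at showing extreme values and holiday effects.

(Note from Python數據之道: the calmap library needs to be installed.)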

import matplotlib as mpl
# pip install calmap
# Note added by Python數據之道
import calmap
# Import Data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/yahoo.csv', parse_dates=['date'])
df.set_index('date', inplace=True)
# Plot
plt.figure(figsize=(16,10), dpi= 80)
# Select the 2014 rows with .loc (partial-string indexing on the DatetimeIndex)
calmap.calendarplot(df.loc['2014', 'VIX.Close'], fig_kws={'figsize': (16,10)}, yearlabel_kws={'color':'black', 'fontsize':14}, subplot_kws={'title':'Yahoo Stock Prices'})
plt.show()

Figure 45

46 Seasonal Plot

A seasonal plot can be used to compare how the time series behaved at the same point (day, week, month, and so on) in previous seasons.
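The data preparation behind such a plot amounts to reshaping the series so that each season becomes one row. A minimal sketch on a synthetic monthly series (an assumption, not the article's data):

```python
import numpy as np

# Synthetic monthly series (assumption): 4 years, rising level, yearly cycle
n_years, period = 4, 12
t = np.arange(n_years * period)
values = 100 + 1.5 * t + 20 * np.sin(2 * np.pi * t / period)

# One row per year, one column per month: each row becomes one line in the plot
by_year = values.reshape(n_years, period)
# Column position of each year's maximum
peak_month = by_year.argmax(axis=1)
# With matplotlib: for row in by_year: plt.plot(range(1, 13), row)
```

Drawing one line per row over the same 12 month positions makes it easy to see that the peak falls in the same month every year while the overall level rises.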

Figure 46

VII. Groups

47 Dendrogram

A dendrogram groups similar points together based on a given distance metric and organizes them in tree-like links according to their similarity.

import scipy.cluster.hierarchy as shc

# Import Data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/USArrests.csv')

# Plot
plt.figure(figsize=(16, 10), dpi= 80)  
plt.title('USArrests Dendrograms', fontsize=22)
dend = shc.dendrogram(shc.linkage(df[['Murder', 'Assault', 'UrbanPop', 'Rape']], method='ward'), labels=df.State.values, color_threshold=100)  
plt.xticks(fontsize=12)
plt.show()

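Figure 47

48 Cluster Plot

A cluster plot can be used to demarcate points that belong to the same cluster. Below is a representative example that groups the US states into 5 groups based on the USArrests dataset. This plot uses the 'Murder' and 'Assault' columns as the X and Y axes. Alternatively, you can use the first two principal components as the X and Y axes.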

from sklearn.cluster import AgglomerativeClustering
from scipy.spatial import ConvexHull
# Import Data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/USArrests.csv')
# Agglomerative Clustering
# Ward linkage always uses Euclidean distance, so no distance argument is needed
cluster = AgglomerativeClustering(n_clusters=5, linkage='ward')
cluster.fit_predict(df[['Murder', 'Assault', 'UrbanPop', 'Rape']])
# Plot
plt.figure(figsize=(14, 10), dpi= 80)
plt.scatter(df.iloc[:,0], df.iloc[:,1], c=cluster.labels_, cmap='tab10')
# Encircle
def encircle(x, y, ax=None, **kw):
    if not ax: ax = plt.gca()
    p = np.c_[x, y]
    hull = ConvexHull(p)
    poly = plt.Polygon(p[hull.vertices, :], **kw)
    ax.add_patch(poly)
# Draw polygon surrounding vertices
encircle(df.loc[cluster.labels_ == 0, 'Murder'], df.loc[cluster.labels_ == 0, 'Assault'], ec='k', fc='gold', alpha=0.2, linewidth=0)
encircle(df.loc[cluster.labels_ == 1, 'Murder'], df.loc[cluster.labels_ == 1, 'Assault'], ec='k', fc='tab:blue', alpha=0.2, linewidth=0)
encircle(df.loc[cluster.labels_ == 2, 'Murder'], df.loc[cluster.labels_ == 2, 'Assault'], ec='k', fc='tab:red', alpha=0.2, linewidth=0)
encircle(df.loc[cluster.labels_ == 3, 'Murder'], df.loc[cluster.labels_ == 3, 'Assault'], ec='k', fc='tab:green', alpha=0.2, linewidth=0)
encircle(df.loc[cluster.labels_ == 4, 'Murder'], df.loc[cluster.labels_ == 4, 'Assault'], ec='k', fc='tab:orange', alpha=0.2, linewidth=0)
# Decorations
plt.xlabel('Murder'); plt.xticks(fontsize=12)
plt.ylabel('Assault'); plt.yticks(fontsize=12)
plt.title('Agglomerative Clustering of USArrests (5 Groups)', fontsize=22)
plt.show()

Figure 48
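For the alternative mentioned above, plotting against the first two principal components, here is a minimal NumPy sketch. The data is random and merely stands in for the four numeric USArrests columns; the helper name `first_two_components` and the choice to standardize the columns first are assumptions.

```python
import numpy as np

def first_two_components(X):
    """Project the rows of X onto its first two principal components."""
    X = np.asarray(X, dtype=float)
    # Standardize so that no column dominates because of its units
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions, ordered by explained variance
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:2].T
    explained = (S ** 2) / (S ** 2).sum()
    return scores, explained[:2]

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))     # stand-in for four numeric columns
X[:, 1] += 2 * X[:, 0]           # make two columns strongly correlated
scores, explained = first_two_components(X)
```

`scores[:, 0]` and `scores[:, 1]` can then replace the 'Murder' and 'Assault' columns as the x and y coordinates of the scatter plot and of the hulls.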

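49 Andrews Curve

Andrews curves help visualize whether the numerical features contain inherent groupings that follow a given grouping variable. If the features (the columns in the dataset) do not help discriminate the group (cyl), the lines will not be well segregated, as you can see below.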

from pandas.plotting import andrews_curves

# Import
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/mtcars.csv')
df.drop(['cars', 'carname'], axis=1, inplace=True)

# Plot
plt.figure(figsize=(12,9), dpi= 80)
andrews_curves(df, 'cyl', colormap='Set1')

# Lighten borders
plt.gca().spines['top'].set_alpha(0)
plt.gca().spines['bottom'].set_alpha(.3)
plt.gca().spines['right'].set_alpha(0)
plt.gca().spines['left'].set_alpha(.3)

plt.title('Andrews Curves of mtcars', fontsize=22)
plt.xlim(-3,3)
plt.grid(alpha=0.3)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()

Figure 49
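Each line in an Andrews plot is one observation turned into a finite Fourier series, f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..., evaluated for t between -pi and pi. The sketch below implements that formula directly; the helper name `andrews_curve` and the three example rows are made up for illustration.

```python
import numpy as np

def andrews_curve(row, t):
    """Evaluate the Andrews curve of one observation `row` at angles `t`."""
    row = np.asarray(row, dtype=float)
    curve = np.full_like(t, row[0] / np.sqrt(2), dtype=float)
    for i, coef in enumerate(row[1:], start=1):
        harmonic = (i + 1) // 2   # 1, 1, 2, 2, 3, 3, ...
        # Odd positions take a sine term, even positions a cosine term
        term = np.sin(harmonic * t) if i % 2 == 1 else np.cos(harmonic * t)
        curve += coef * term
    return curve

t = np.linspace(-np.pi, np.pi, 200)
a = andrews_curve([1.0, 2.0, 0.5, 0.1], t)
b = andrews_curve([1.1, 2.1, 0.4, 0.1], t)    # an observation similar to a
c = andrews_curve([5.0, -2.0, 3.0, 1.0], t)   # a very different observation
gap_similar = np.abs(a - b).max()
gap_different = np.abs(a - c).max()
```

Similar observations give curves that stay close together, while dissimilar ones diverge, which is why well-separated groups show up as distinct bundles of lines.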

50 Parallel Coordinates

Parallel coordinates help visualize whether a feature segregates the groups effectively. If it does, that feature is likely to be very useful for predicting the group.

from pandas.plotting import parallel_coordinates
# Import Data
df_final = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/diamonds_filter.csv')
# Plot
plt.figure(figsize=(12,9), dpi= 80)
parallel_coordinates(df_final, 'cut', colormap='Dark2')
# Lighten borders
plt.gca().spines['top'].set_alpha(0)
plt.gca().spines['bottom'].set_alpha(.3)
plt.gca().spines['right'].set_alpha(0)
plt.gca().spines['left'].set_alpha(.3)
plt.title('Parallel Coordinates of Diamonds', fontsize=22)
plt.grid(alpha=0.3)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()

Figure 50
