在python中pandas工具包具有基本的數(shù)據(jù)管理功能,自然就有一些查詢pandas中數(shù)據(jù)結(jié)構(gòu)dataframe的常用函數(shù)。下面就看看,與常用的SQL語(yǔ)句對(duì)應(yīng)的pandas函數(shù)有哪些?
數(shù)據(jù)從SQL Server數(shù)據(jù)庫(kù)中獲取,使用符合SQL語(yǔ)法規(guī)則的語(yǔ)句查詢并獲取。在python中如何連接數(shù)據(jù)庫(kù)的方法,請(qǐng)點(diǎn)擊閱讀 這篇文章 。
import pyodbcimport pandas as pdimport numpy as npconnection_string = ('Driver={SQL Server Native Client 11.0};' 'Server=Your_Server_Name;' 'Database=My_Database_Name;' 'UID=Your_User_ID;' 'PWD=Your_Password;')connection = pyodbc.connect(connection_string)
population = pd.read_sql('SELECT TOP(10) * FROM State_Population', connection)
或者寫成:
query = 'SELECT * FROM STATE_AREAS WHERE [area (sq. mi)] > 100000'area = pd.read_sql(query, connection)
以上方法中,pd.read_sql 函數(shù)的功能與 SQL語(yǔ)句'Select * from ....'相對(duì)應(yīng),并且以dataframe格式查詢輸出的結(jié)果area, 由此即可在python用pandas函數(shù)對(duì)area進(jìn)行更多的操作。
SQL查詢語(yǔ)句一般包括以下組成部分:
各組成部分在pandas中都有對(duì)應(yīng)的函數(shù)。從My_Database_Name 數(shù)據(jù)庫(kù)中的三個(gè)數(shù)據(jù)表,讀取其中兩個(gè)數(shù)據(jù)表(State_Population和STATE_AREAS)的部分內(nèi)容,都是dataframe格式,命名為:population 和 area。下文中將使用population,演示pandas中與SQL查詢有相同功能的函數(shù)。先看看population包括的數(shù)據(jù)內(nèi)容:
population.head()
輸出結(jié)果如下圖:
population
結(jié)果如下圖:
pd.DataFrame(population.year)
輸出結(jié)果如下圖:
population[['population', 'year']]
輸出結(jié)果:
該方法一般格式為:df.loc([row names], [column names]);獲取給定列的所有行數(shù)據(jù),可以用“:”替代 [row names] ,如:df.loc(: , [column names])。
population.loc[:,['population', 'year']]
輸出結(jié)果與方法(3)相同:
(二)在python中與SELECT * FROM State_Population WHERE year = 2010;語(yǔ)句相對(duì)應(yīng)的方法(5--8)。
population[population.year == 2010]
輸出結(jié)果如下圖:
population.loc[population.year == 2010,:]
輸出結(jié)果與(5)相同,略。
population.query('year == 2010')
輸出結(jié)果與(5)相同,略。
population[population.apply(lambda x: x['year'] == 2010, axis=1)]
輸出結(jié)果與(5)相同,略。
(三)在python中,與SELECT state/region, population, year FROM State_Population WHERE year = 2010 or 2012 and ages = under18; 相對(duì)應(yīng)的方法(9--12)。
population[(population.year.isin([2010, 2012])) & (population.ages == 'under18')][['state/region', 'population', 'year']]
輸出結(jié)果如下圖:
population.loc[(population.year.isin([2010, 2012])) & (population.ages == 'under18'),['state/region', 'population', 'year']]
輸出結(jié)果與(9)相同,略。
population.query('(year==2010 | year==2012) & ages == 'under18'')[['state/region', 'population', 'year']]
輸出結(jié)果與(9)相同,略。
population[population.apply(lambda x: (x['year'] in [2010, 2012]) & (x['ages'] == 'under18'), axis=1)]
輸出結(jié)果與(9)相同,略。
(四)在python中,與SELECT * FROM State_Population WHERE ages = total GROUP BY state/region HAVING AVG(population) > 10000000; 語(yǔ)句相對(duì)應(yīng)的方法。
population.groupby(by = 'state/region')
輸出結(jié)果如下圖:
population.groupby(by = ['state/region', 'year'])
輸出結(jié)果如下圖:
population.groupby(by = ['state/region', 'year']).count()
輸出結(jié)果如下圖:
還可以把分組結(jié)果賦給一個(gè)變量:grouped
grouped = population.groupby(by = ['state/region', 'year'])
df = pd.DataFrame(population.loc[population.ages == 'total', :].groupby(by = 'state/region').aggregate('population').mean())df.loc[df.population > 10000000, :]
輸出結(jié)果如下圖:
python中與SELECT * FROM State_Population WHERE ages = total GROUP BY state/region HAVING AVG(population) > 10000000 ORDER BY population; 對(duì)應(yīng)的方法
df = pd.DataFrame(population.loc[population.ages == 'total', :].groupby(by = 'state/region').aggregate('population').mean())df.loc[df.population > 10000000, :].sort_values(by = 'population')
輸出結(jié)果如下圖:
使用 “ascending = False” 參數(shù)的結(jié)果:
grouped = population.groupby(by = ['state/region', 'year']).mean()grouped
輸出結(jié)果如下圖:
對(duì)分類后的數(shù)據(jù)表進(jìn)行排序,
grouped.sort_values(by = ['year', 'population'], ascending=[True, False])
聯(lián)系客服