Financial data analysis II python preview -- crawling a province's university rankings with bs4

August 1, 2022 · 1 Minutes to read

For Freedom

Case (I) python warm-up

Project 3: Crawling the university ranking of a province

Given the name of a province, crawl the "SoftTech China's Best Universities Ranking 2020" (http://www.zuihaodaxue.cn/zuihaodaxuepaiming2020.html), developed by Shanghai Jiao Tong University, and output the 2020 rankings of the universities in that province. Example input: Guangdong.

import requests
from bs4 import BeautifulSoup
import bs4


def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:  # network failure or HTTP error status
        return ""


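def fillUnivList(ulist, html):
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find('tbody')
    if tbody is None:  # download failed or the page layout changed
        return
    for tr in tbody.children:
        if isinstance(tr, bs4.element.Tag):
            tds = tr('td')
            if len(tds) >= 3:  # rank, university name, province
                ulist.append([tds[0].string, tds[1].string, tds[2].string])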


def printUnivList(ulist, num, place):
    # chr(12288) is the full-width space, used as the fill character so Chinese names align
    tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
    print("{:^10}\t{:^6}\t{:^10}".format("排名", "学校名称", "省市"))
    for u in ulist[:num]:  # slicing avoids an IndexError when fewer than num rows exist
        if u[2] == place:
            print(tplt.format(u[0], u[1], u[2], chr(12288)))

def main():
    uinfo = []
    # Note: the task statement refers to the 2020 ranking; this URL fetches the 2019 page
    url = 'http://www.zuihaodaxue.cn/zuihaodaxuepaiming2019.html'
    html = getHTMLText(url)
    fillUnivList(uinfo, html)
    printUnivList(uinfo, 549, "广东")  # 20 univs


main()

Results of the run.
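
The template used in printUnivList relies on a nested format field, which is easy to misread. The following minimal sketch isolates it; the helper name format_row and the sample row are assumptions made for illustration, not part of the original program.

```python
# chr(12288) is the full-width (ideographic) space, as wide as one Chinese character.
FULLWIDTH_SPACE = chr(12288)


def format_row(rank, name, province):
    # The {3} inside the spec of field 1 is replaced by the 4th argument,
    # which then acts as the fill character for centring the name.
    tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
    return tplt.format(rank, name, province, FULLWIDTH_SPACE)


row = format_row("1", "中山大学", "广东")
cells = row.split("\t")
print([len(c) for c in cells])  # every cell is padded to 10 characters
```

Padding the name column with full-width spaces keeps columns of Chinese text visually aligned, which ordinary ASCII spaces generally do not.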

Financial data analysis I python preview -- dictionary sorting, calculating annual growth rate

August 1, 2022 · 6 Minutes to read

Allen Ma


I recently started an internship in an internet-finance position and found that the work overlaps with what I had studied, so I decided to take the time to work through a few relevant cases.

Case (I) python warm-up

Most of the data acquisition and processing in the project uses the python programming language, so let's first review some common functions and writing rules.

Project 1: Counting and sorting with a dictionary

The dictionary d stores the 42 Double First-Class universities in China and the provinces where they are located. Use this dictionary as the data and complete the Python code to count the number of universities in each province.

Output the province with the most universities, together with its count.


Output format: Province：Number (separated by a full-width Chinese colon)

d = {"北京大学":"北京", "中国人民大学":"北京", "清华大学":"北京",
     "北京航空航天大学":"北京", "北京理工大学":"北京", "中国农业大学":"北京",
     "北京师范大学":"北京", "中央民族大学":"北京", "南开大学":"天津",
     "天津大学":"天津", "大连理工大学":"辽宁", "吉林大学":"吉林",
     "哈尔滨工业大学":"黑龙江", "复旦大学":"上海", "同济大学":"上海",
     "上海交通大学":"上海","华东师范大学":"上海", "南京大学":"江苏",
     "东南大学":"江苏", "浙江大学":"浙江", "中国科学技术大学":"安徽",
     "厦门大学":"福建", "山东大学":"山东", "中国海洋大学":"山东",
     "武汉大学":"湖北", "华中科技大学":"湖北", "中南大学":"湖南",
     "中山大学":"广东", "华南理工大学":"广东", "四川大学":"四川",
     "电子科技大学":"四川", "重庆大学":"重庆", "西安交通大学":"陕西",
     "西北工业大学":"陕西", "兰州大学":"甘肃", "国防科技大学":"湖南",
     "东北大学":"辽宁","郑州大学":"河南", "湖南大学":"湖南", "云南大学":"云南",
     "西北农林科技大学":"陕西", "新疆大学":"新疆"}
counts = {}
for place in d.values():
    counts[place] = counts.get(place, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
print(items[0][0]+'：'+str(items[0][1]))

Results of the run: 北京：8
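
As an alternative to the manual counting loop, the same result can be sketched with collections.Counter from the standard library. The small sample dictionary and the helper name top_province are assumptions for illustration; the full dictionary d works the same way.

```python
from collections import Counter

# Hypothetical sample; the full dictionary d above can be passed in unchanged.
sample = {"北京大学": "北京", "清华大学": "北京", "南开大学": "天津", "复旦大学": "上海"}


def top_province(mapping):
    """Return 'Province：Count' for the province with the most universities."""
    place, n = Counter(mapping.values()).most_common(1)[0]
    return place + "：" + str(n)


print(top_province(sample))
```

Counter.most_common(1) returns the single most frequent value with its count, so the explicit sort is no longer needed.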

Project 2: Calculating the annual growth rate of mobile phone sales

The file smartPhone.txt holds annual mobile phone sales data for several companies. Each row contains one company's annual sales (in millions), with tabs separating the values.
To open the file, specify the file encoding: with open("smartPhone.txt",encoding="gbk") as f:

The contents of smartPhone.txt are as follows:

公司	2014年	2015年	2016年	2017年
Samsung	311	322.9	310.3	318.7
Apple	192.9	231.6	215.2	15.8
Huawei	73.6	104.8	139.1	153.1
OPPO	29.9	50.1	92.9	121.1
Vivo	19.5	40.5	74.3	100.7
ZTE	43.8	56.2	60.1	44.9
LG	59.2	59.7	55.1	55.9
Lenovo	70.1	74.1	50.7	49.7
Xiaomi	61.1	70.7	61.5	96.1

Write a function isBigGrowth(L, rate). The parameter L is a list of numerical data (one company's sales in each year) and rate is the annual growth rate threshold. The function determines whether sales grew rapidly: it returns True if the sales growth rate in every year exceeds the given rate, and False otherwise.
The main program reads the data in smartPhone.txt, converts each line into numerical data, and uses isBigGrowth(L, rate) to determine and print whether each company's annual sales grew rapidly (in this exercise, rapid growth means annual sales growth of more than 30%), with tabs separating the output columns.
The expected output of the program is shown below.

Mobile phone company	Is there rapid growth?
Samsung	No
Apple	No
Huawei	No
OPPO	Fast
Vivo	Fast
ZTE	No
LG	No
Lenovo	No
Xiaomi	No
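
The screening rule can be checked by hand for two of the companies. This is a minimal sketch; the helper names growth_rates and grows_fast are assumptions, and the sales figures are copied from the file listing above.

```python
def growth_rates(sales):
    """Year-on-year growth rate for each consecutive pair of years."""
    return [(cur - prev) / prev for prev, cur in zip(sales, sales[1:])]


def grows_fast(sales, rate):
    # Rapid growth here means every yearly rate exceeds the threshold.
    return all(r > rate for r in growth_rates(sales))


oppo = [29.9, 50.1, 92.9, 121.1]      # OPPO row from smartPhone.txt
huawei = [73.6, 104.8, 139.1, 153.1]  # Huawei row from smartPhone.txt
print([round(r, 3) for r in growth_rates(oppo)])
print(grows_fast(oppo, 0.3), grows_fast(huawei, 0.3))
```

OPPO stays above 30% in all three year-to-year steps, while Huawei's last step is only about 10%, which is why the two companies end up in different groups.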

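def isBigGrowth(L, rate):
    """Return True if sales grew by more than `rate` in every consecutive year."""
    return len(L) > 1 and all(cur > prev * (1 + rate) for prev, cur in zip(L, L[1:]))


# Adjust the path if smartPhone.txt is not in the working directory
with open("smartPhone.txt", encoding="gbk") as f:
    lines = f.read().strip().split("\n")  # one company per line
del lines[0]  # drop the header row
print("Mobile phone company\tIs there rapid growth?")
for s in lines:
    try:
        fields = s.split('\t')
        sales = [float(x) for x in fields[1:]]  # convert the sales figures to numbers
        print(fields[0], "Fast" if isBigGrowth(sales, 0.3) else "No", sep="\t")
    except ValueError:
        print('Run failed')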

Results of the run.

Tushare Financial Data Interface V Case Study - Stock pool creation for quality fundamentals

August 1, 2022 · 6 Minutes to read

Allen Ma


# Quality fundamentals for stock pool creation

Fundamental data of listed companies is important evidence of a company's historical operating performance and an important basis on which investors judge its future prospects. Financial analysts and stock investors analyse the quality of a company's fundamentals to assess the investment value of its stock. The fundamental data available from the Tushare platform mainly consists of regularly published reports on operating results, profitability, operating capacity, growth capacity, solvency and cash flow, each reflecting the company's operating condition from a different angle.

With nearly 3,600 normally traded stocks in China's stock market (Shanghai Stock Exchange and Shenzhen Stock Exchange), it is impossible for the average investor to have enough time and energy to analyse the fundamental data of all listed companies when choosing the ideal investment target. Therefore, it is necessary for investors to automate the screening of stocks in the market using some key indicators with empirical guidance, i.e. relying on a computer programme to screen out stocks with high quality fundamentals from the full range of listed company stocks, thereby significantly improving the efficiency of investors in finding stocks of high quality companies. For example, a company's profitability, growth and cash flow are some of the indicators used to identify quality stocks as candidates for investment.

Screening quality stocks involves the following 4 basic steps.

① Use the fundamental data interface function built into the Tushare package to obtain the fundamental data of all stocks.

② Determine the key indicator items reflecting the quality of fundamentals based on experience and extract the data series corresponding to the key indicator items from the fundamentals data of all stocks.

③ Use Pandas built-in functions to merge the multiple fundamental data series into one DataFrame.

④ Determine the sorting parameters of the data series according to the importance of the key indicator items and sort the merged data. From the ranking results, the top rows of each indicator are selected, and the stocks corresponding to these rows are the set of stocks with relatively high fundamental quality (i.e. the quality stock pool).

The interface functions for obtaining company profitability, growth and cash flow fundamentals are get_profit_data(), get_growth_data() and get_cashflow_data(). These functions return a rich variety of data items, and many combinations of them can be used to reflect a company's investment value. For example, from the data returned by these three functions we can select the net profit margin, return on net assets, earnings per share, net profit growth rate, net asset growth rate, the ratio of net operating cash flow to net profit, and the cash flow ratio as key evaluation indicators, and use these seven indicators together to reflect the quality of the company's fundamentals. Generally, the higher the value of these indicators, the higher the fundamental quality of the company and the greater the investment value of its shares. The following program demonstrates one way of using financial indicator data to screen for quality company stocks.

import tushare as ts
import pandas as pd
import datetime

# Get the latest financial statement data. A-share listed companies in China must disclose the first
# quarterly report by April 30, the half-yearly report by August 31, the third quarterly report by
# October 30, and the annual report by April 30 of the following year.
this_year = datetime.datetime.today().year
this_month = datetime.datetime.today().month
if this_month >= 11: # This year's third quarterly report has been published
    fin_year = this_year
    fin_sea = 3
elif this_month >= 5: # The previous year's annual report is published by the end of April (this year's first quarterly report would also be an option)
    fin_year = this_year-1
    fin_sea = 4
else:
    fin_year = this_year-1
    fin_sea = 3
print("%s year %s quarter" %(fin_year,fin_sea))

Printed output: 2019 year 4 quarter
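
The month-based rule above is easier to test when it is wrapped in a function that takes the date as an argument. This is a minimal sketch of the same logic; the function name latest_report_period is an assumption.

```python
import datetime


def latest_report_period(today):
    """Return (year, quarter) of the most recent report assumed to be published."""
    if today.month >= 11:       # this year's third quarterly report is out
        return today.year, 3
    if today.month >= 5:        # the previous year's annual report is out
        return today.year - 1, 4
    return today.year - 1, 3    # fall back to last year's third quarterly report


print(latest_report_period(datetime.date(2020, 6, 1)))
```

Passing the date in, rather than reading datetime.today() inside the function, makes it possible to check the boundary months without waiting for them.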

df1 = ts.get_profit_data(fin_year, fin_sea)
df2 = ts.get_growth_data(fin_year, fin_sea)
df3 = ts.get_cashflow_data(fin_year, fin_sea)

If you have saved the financial data to a file before running the program, you can read it directly from the local file and avoid downloading the same data on every run.

#code, code; name, name; net_profit_ratio, net profit margin (%); roe, return on net assets (%); eps, earnings per share.
#nprg, net profit growth rate (%); nav, net asset growth rate.
df_merge = pd.merge(df1[['code','name', 'net_profit_ratio', 'roe', 'eps']],
                    df2[['code', 'nprg', 'nav']], on='code', how='left')
#left outer join, left table unrestricted, keep data from left table, match right table, columns in rows not matched by right table are shown as NaN
#cf_nm, ratio of net operating cash flow to net profit; cashflowratio, cash flow ratio.
df_merge = pd.merge(df_merge, df3[['code', 'cf_nm', 'cashflowratio']],
                    on='code', how='left').dropna() # Delete rows containing NaN

focus_df = df_merge.sort_values(['nprg', 'net_profit_ratio', 'cf_nm', 'nav',
                                 'roe', 'eps', 'cashflowratio'], ascending=False)#nprg is the first keyword

Regarding the order of the key columns for sorting the consolidated statement, the interested reader can make more order adjustments, compare the set of stocks and their sorting in the final retained data table select_df, examine how the results of the sorting operation differ for different indicator orders, and find the corresponding stocks to understand the price changes of the stocks over the last 3 years.
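
To see why the order of the sort keys matters, the effect can be reproduced on a tiny table in plain Python. This is only a sketch of the idea without pandas; the three stocks and their indicator values are made up, and the helper name rank is an assumption.

```python
# Hypothetical indicator values for three made-up stocks.
rows = [
    {"code": "A", "nprg": 50.0, "roe": 10.0},
    {"code": "B", "nprg": 50.0, "roe": 20.0},
    {"code": "C", "nprg": 30.0, "roe": 40.0},
]


def rank(rows, keys):
    """Sort descending by the given keys; earlier keys take priority."""
    ordered = sorted(rows, key=lambda r: tuple(r[k] for k in keys), reverse=True)
    return [r["code"] for r in ordered]


print(rank(rows, ["nprg", "roe"]))  # nprg decides first, roe only breaks ties
print(rank(rows, ["roe", "nprg"]))  # roe decides first
```

With nprg as the first key, stock C comes last despite having the highest roe; swapping the keys moves it to the top. Later keys only matter when earlier keys are tied.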

focus_df['code']='\t'+ focus_df['code']# ensure that the code is entered into the csv file in the form of characters, \t is a tab
cols = ['code', 'name', 'nprg', 'net_profit_ratio', 'cf_nm', 'nav',
        'roe', 'eps', 'cashflowratio']
select_df = focus_df[cols].head(100)  # head(100) keeps at most 100 rows, or all rows if there are fewer
select_df.to_csv('focus'+str(fin_year)+str(fin_sea)+'.csv',encoding='cp936',index=False)

The disclosure deadlines for financial reports of A-share listed companies in China are as follows: the first quarterly report by 30 April, the half-yearly report by 31 August, the third quarterly report by 30 October, and the annual report by 30 April of the following year. The program uses the date functions of the datetime module to determine which report is the latest available.

In practice, there is no uniform standard for choosing the key indicators that identify stocks with quality fundamentals, partly because different financial indicators emphasise different aspects of a company, and partly because financial indicators are of limited comparability across industries. This example only demonstrates a way to make screening more efficient with a Python program; it does not provide a basis for investment decisions.

Tushare Financial Data Interface IV Case Study - Plotting stock k-line charts

August 1, 2022 · 5 Minutes to read

Allen Ma


Tushare financial data interface

Plotting stock k-line charts

The structure and changing characteristics of different types of financial data vary, so the chart forms suited to describing them vary as well. Line and point charts are the most common two-dimensional charts used by financial analysts, as they show the changing characteristics of financial data clearly and are simple to draw. This section introduces the basic methods of visualising time series data in Python, using the drawing of stock k-line charts (also known as candlestick charts) as an example.

The process of plotting stock k-line charts using Tushare platform data in the Python runtime environment consists of three main steps as follows.

Step 1: Determine the data source. In this section, we choose to obtain stock ticker data from the Tushare platform and use the built-in interface functions of the Tushare package to obtain stock ticker data, so we first import the Tushare package using the command import tushare.

Step 2: Determine the form of the visualisation and the tools to implement it. This section uses the candlestick_ochl() function to draw k-line charts of stock prices. The function originally shipped in the mpl_finance package, a financial charting package separate from Matplotlib; the program below imports it from the successor package mplfinance with "from mplfinance.original_flavor import candlestick_ochl" (the command "pip install mplfinance" installs the package). Because the pro interface functions of the Tushare package return a different data structure from the normal interface functions, the data format must be matched when calling the plotting function. The program in this section uses the pro interface's pro.daily() function to obtain daily stock data and adapts the quote data structure to the parameter format required by candlestick_ochl(). In addition, the chart should not contain too many k-lines, because crowding them together makes the chart harder to read.

Step 3: Determine the output tool for drawing the chart. This section uses the charting package Matplotlib as the output tool for the k-line chart, because Matplotlib provides a wealth of chart output functions that make it relatively easy to set the chart layout, colours, axis formats and many other aspects, making the chart more attractive and easier to understand. The program therefore imports Matplotlib's pyplot module with the command "import matplotlib.pyplot as plt".

import tushare as ts
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker
# mpl_finance is deprecated; its successor mplfinance keeps the old API in original_flavor
# (install with: pip install mplfinance)
from mplfinance.original_flavor import candlestick_ochl
plt.rcParams['font.sans-serif'] = ['SimHei'] # Used to display Chinese labels normally

# Users with sufficient permissions can use the pro interface to get the data
pro = ts.pro_api()
code = '600004.SH'
df = pro.daily(ts_code=code, start_date='20191201')
df.shape
#stock_daily = pro.daily(ts_code=code, start_date='20181201')
# stock_daily.to_excel('stock_daily.xlsx') # save as spreadsheet

# Users without sufficient permissions for the pro interface can run the following code to read the data directly from the xlsx file
df = pd.read_excel('stock_daily.xlsx', dtype={'code': 'str','trade_date': 'str'}) 
df.drop(df.columns[0], axis=1, inplace=True)
df.shape

df2 = df.query('trade_date >= "20171001"').reset_index() # select data after Oct 1, 2017
df2 = df2.sort_values(by='trade_date', ascending=True) # sort the data in ascending order by date
df2['dates'] = np.arange(0, len(df2)) # integer x positions 0..len(df2)-1; len(df2) is the number of records
fig, ax = plt.subplots(figsize=(20, 9))
fig.subplots_adjust(bottom=0.2) # control subplots
### arguments to the candlestick_ochl() function
# ax        the Axes instance to plot on
# quotes    sequence of (time, open, close, high, low); time must be a float, so dates must be converted
# width     width of each candle body, as a fraction of a day
# colorup   colour of the candle when the close is greater than or equal to the open
# colordown colour of the candle when the close is lower than the open
# alpha     transparency of the candle body
candlestick_ochl(ax, quotes=df2[['dates', 'open', 'close', 'high', 'low']].values,
                 width=0.55, colorup='r', colordown='g', alpha=0.95)
date_tickers = df2['trade_date'].values  
def format_date(x, pos):
    if (x < 0) or (x > len(date_tickers)-1):
        return ''
    return date_tickers[int(x)]
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date)) # select and display the time scale on the horizontal axis according to certain rules
plt.xticks(rotation=30) # set the angle of rotation of the date scale
ax.set_ylabel('transaction_price')
plt.title(code)
plt.grid(True) # add grid, optional, just makes the image look better
plt.xlabel('trade date')
plt.show()
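
The data preparation in the program above (sort by date, replace dates with integer positions, map positions back to date labels) can be tried without any plotting library. This is a minimal sketch on made-up rows; the price values are hypothetical and only the standard library is used.

```python
# Hypothetical daily rows, newest first: (trade_date, open, close, high, low)
raw = [
    ("20191204", 10.2, 10.5, 10.6, 10.1),
    ("20191203", 10.0, 10.2, 10.3, 9.9),
    ("20191202", 9.8, 10.0, 10.1, 9.7),
]

rows = sorted(raw, key=lambda r: r[0])  # oldest first, like sort_values(ascending=True)
# Replace each date by its position so that non-trading days leave no gaps on the x axis.
quotes = [(float(i),) + r[1:] for i, r in enumerate(rows)]
date_labels = [r[0] for r in rows]


def format_date(x, pos=None):
    """Map an x position back to its date label; blank outside the data range."""
    if x < 0 or x > len(date_labels) - 1:
        return ""
    return date_labels[int(x)]


print(quotes[0], format_date(0), repr(format_date(5)))
```

Using positions instead of real dates is a design choice: weekends and holidays would otherwise appear as empty gaps between candles.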


Tushare Financial Data Interface III Case - Stock Fundamental Statistics

August 1, 2022 · 5 Minutes to read

Allen Ma


Tushare Financial Data Interface

# Stock Fundamental Statistics

Use the get_stock_basics() function to download the fundamental data of all stocks at once. This is useful for getting an overview of the whole stock market.

import  tushare  as  ts
import  pandas as pd
import numpy as np
import  matplotlib.pyplot  as  plt

stock = ts.get_stock_basics()     # Download Stock Fundamental Data
stock.to_excel('stock.xlsx')     # Save as spreadsheet
stock.shape                      # Out: (3678, 22)

The dataset has 22 columns, and each row holds the basic data for one stock; the number of rows depends on when the data is downloaded (3678 in the run above). See the Tushare website (http://tushare.org) for details of the fields. The data columns used in this section are: code, stock code; name, name; industry, industry; area, region; pe, price-to-earnings ratio; totals, total equity (100 million); esp, earnings per share; timeToMarket, date of listing.

Below, the data is read back from the spreadsheet file; note how the stock code column is handled. pandas tries to convert data to a numeric type automatically when reading it. If a Shenzhen stock code such as '002522' is read as a number, the leading '00' is lost and it becomes the integer 2522, so the code field is explicitly read as a string.

df = pd.read_excel('stock.xlsx', dtype={'code': 'str'})   # code string type
df.set_index('code', inplace=True)  # Set code as index column
df.loc['002522']                        # Showing the fundamentals of a stock
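
The leading-zero problem is not specific to pandas. The following plain-Python sketch reproduces it with the standard csv module; the two-row file and the company names are hypothetical.

```python
import csv
import io

# Hypothetical two-row file; 002522 is the code used in the text.
text = "code,name\n002522,SampleA\n600004,SampleB\n"

as_numbers = [int(row["code"]) for row in csv.DictReader(io.StringIO(text))]
as_strings = [row["code"] for row in csv.DictReader(io.StringIO(text))]

print(as_numbers)                   # the leading zeros are gone
print(as_strings)                   # the codes are intact
print(str(as_numbers[0]).zfill(6))  # zero-padding to 6 digits restores the code
```

Keeping the code as a string from the start is simpler than repairing it afterwards with zfill, which is why the program above sets the dtype when reading the file.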


len(df.industry.unique())   # Number of distinct industries


len(df.area.unique())  # Showing the number of regions (i.e. the provinces to which the shares belong)


# Number of listed companies by region, reflecting regional economic strength
df.groupby('area').size().sort_values(ascending=False)

As can be seen from the above statistics, the more economically developed and dynamic the region, the greater the number of listed companies. The reader can perform similar statistics by industry. The timeToMarket field in the DataFrame is the listing date, stored as an integer in the form 20190315. We can extract the year from it to count the number of stocks listed each year.

year = df.timeToMarket.astype('str').str[:4]  # Convert to a string and extract the first 4 digits of the year
yearnum = df.groupby(year).size()    # Group by year to count the number of stocks listed each year
yearnum
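
The year extraction can be illustrated on a handful of values without pandas. This is a sketch with hypothetical listing dates; a value of 0 stands for a missing date, as described for the dataset.

```python
from collections import Counter

# Hypothetical listing dates in the integer format described above; 0 means "missing".
time_to_market = [20190315, 20190722, 20150610, 0, 20071105]

# Convert to a string and keep the first four characters, i.e. the year.
per_year = Counter(str(d)[:4] for d in time_to_market)
per_year.pop("0", None)  # drop entries without a listing date
print(sorted(per_year.items()))
```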


plt.rcParams['font.sans-serif'] = ['SimHei'] # Specify Chinese bold font
# False below fixes a problem with the negative '-' sign on the axis being displayed as a square
plt.rcParams['axes.unicode_minus'] = False 
# There are a few stocks in the dataset that do not have a year of issue (year 0), exclude year 0 from the graph
yearnum[yearnum.index!='0'].plot(fontsize=14, title='年IPO数量')

The chart shows that the peaks in the number of IPOs per year correspond to bull markets in the domestic stock market, while the number falls to a low during bear markets. The following calculates the market's average price-to-earnings ratio, pe, an important parameter for measuring stock market valuation.

df.pe.mean()            # Simple arithmetic average pe

Inspecting the dataset reveals that loss-making stocks have a pe of 0, so we exclude them from the average.

df[df.pe > 0].pe.mean()     # Calculating pe averages after excluding loss-making stocks

The pe above is a simple arithmetic average; a pe weighted by market capitalisation may reflect market conditions more accurately. Total market capitalisation and stock prices are not available in the downloaded dataset, so the market capitalisation has to be derived from the available fields. Calculating new column values from existing columns is a common data processing operation. Here the total market capitalisation is estimated as follows:

Stock unit price = 4 × esp (earnings per share) × pe (price-to-earnings ratio)
Total market capitalisation = stock unit price × totals (total equity, in units of 100 million)

The earnings per share esp in the dataset is for a single quarter, so it is multiplied by 4 to approximate full-year earnings.

df['tvalue'] = 4 * df.esp * df.pe * df.totals   # Calculate total market value, add new column tvalue
np.sum(df.pe * df.tvalue) / df.tvalue.sum()   # Calculation of weighted pe with market capitalisation as the weighting

The above calculation gives the market-weighted pe after one particular quarterly report, and the result differs from the true market value. This is because a company's earnings differ from quarter to quarter, so full-year earnings cannot simply be calculated as 4 × single-quarter earnings.
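
The difference between the simple and the weighted average is easiest to see on a few numbers. This is a worked sketch in plain Python; the three (esp, pe, totals) triples are made up for illustration.

```python
# Hypothetical values: (esp, pe, totals) for three made-up stocks.
stocks = [(0.50, 10.0, 20.0), (0.10, 40.0, 5.0), (0.25, 20.0, 10.0)]


def market_value(esp, pe, totals):
    # unit price = 4 * esp * pe; market value = unit price * totals
    return 4 * esp * pe * totals


values = [market_value(*s) for s in stocks]
simple_pe = sum(pe for _, pe, _ in stocks) / len(stocks)
weighted_pe = sum(pe * v for (_, pe, _), v in zip(stocks, values)) / sum(values)
print(round(simple_pe, 2), round(weighted_pe, 2))
```

In this sample the largest company has the lowest pe, so the weighted average comes out well below the simple average. Small companies with very high pe values pull the simple average up but carry little weight in the weighted one.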

China's stock market is divided into the Shanghai Main Board (stock codes beginning with 60), the Shenzhen Main Board (stock codes beginning with 00), the GEM (stock codes beginning with 30) and the newly launched STAR Market (stock codes beginning with 68). The following code calculates the average pe and the number of stocks for each board.

df['board'] = df.index.str[:2] # take first 2 characters of code, add new board column
# count pe averages by board type, count
df.groupby('board').pe.agg([('pe均值', 'mean'), ('股票数', 'count')])
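
The grouping key used above is just the first two characters of the code. A small lookup table makes the mapping explicit; this is a sketch in which the board labels are illustrative names, the prefixes follow the text, and the code 688001 is a hypothetical example.

```python
# Prefixes follow the description above; the labels are illustrative.
BOARD_BY_PREFIX = {
    "60": "Shanghai Main Board",
    "00": "Shenzhen Main Board",
    "30": "GEM",
    "68": "STAR Market",
}


def board_of(code):
    """Classify a 6-digit stock code by its first two characters."""
    return BOARD_BY_PREFIX.get(code[:2], "other")


print(board_of("002522"), board_of("600004"), board_of("688001"))
```

Codes with any other prefix fall into "other" instead of raising an error, so unexpected codes stay visible in the grouped result.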


Tushare Financial Data Interface I Installation

August 1, 2022 · 4 Minutes to read

Allen Ma


The main content of this article is excerpted from Chapters 8 and 10 of the textbook Fundamentals of python Programming, edited by Xueling Zhong and Li Li and published in December 2019 by Electronic Industry Press.

Tushare Financial Data Interface

Installation

The Tushare website is a free financial data platform well suited to Python developers. The platform provides financial data in many categories, such as China's macroeconomic data, domestic stock market indices, trading data of domestic listed companies, their regular financial reports, and domestic financial news. Tushare official website: http://tushare.org

pip install tushare -i https://pypi.tuna.tsinghua.edu.cn/simple  # Installation
pip install BeautifulSoup4 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install openpyxl -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install mplfinance -i https://pypi.tuna.tsinghua.edu.cn/simple

The data returned by Tushare's built-in functions are all Pandas DataFrame objects, so they are easy to manipulate with the tools provided by Pandas, NumPy, Matplotlib and other packages.

Platform common interface call testing

Tushare is divided into ordinary interface and pro user interface. The ordinary interface can be used directly without registration, for example, using the ts.get_gdp_year() function of the ordinary interface to obtain Gross Domestic Product (GDP) data.

import tushare as  ts
df=ts.get_gdp_year()
df.head()

Return value description:

Parameter	Explanation
year	Year
gdp	Gross domestic product (100 million yuan)
pc_gdp	Gross domestic product per capita (yuan)
gnp	Gross national product (100 million yuan)
pi	Primary industry (100 million yuan)
si	Secondary industry (100 million yuan)
industry	Industry (100 million yuan)
cons_industry	Construction (100 million yuan)
ti	Tertiary industry (100 million yuan)
trans_industry	Transportation, storage, post and telecommunications (100 million yuan)
lbdy	Wholesale, retail trade and restaurants (100 million yuan)

Registering the pro interface

Tushare's normal interface can be used directly without registration but offers less data. The more advanced pro interface requires the user to register with the platform at https://tushare.pro/register and set up user credentials in the runtime environment before it can be used and more data can be downloaded. The process consists of the following 7 steps.

(1) Go to the webpage https://tushare.pro/register and register as a Tushare community user.

(2) Log in to the Tushare community at https://tushare.pro/login and retrieve the user credentials in three steps. First, after logging in, move the mouse to the user name in the top right corner of the page and click the "Personal Home" option in the drop-down menu to open the "User Centre". Next, click the "Interface TOKEN" tab on the "User Centre" page. Finally, click the copy icon on the right-hand side to copy the entire contents of the text box.

(3) Use the command pip install tushare to install the Tushare package locally.

(4) Import the Tushare package in the IPython console with the command import tushare as ts.

(5) Use the built-in function set_token() of the Tushare package to set the local user's token, passing the token as a string: ts.set_token("user tushare token")

(6) Initialise the pro interface with the command pro = ts.pro_api(). If set_token("user tushare token") does not take effect, or you do not want to save the token locally, you can pass the token directly when initialising the interface: pro = ts.pro_api("user tushare token")

(7) Data retrieval. After completing the first 6 steps, the user can call the pro interface functions to get the corresponding data.
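
If you prefer not to paste the token into source files, one option is to read it from an environment variable and pass it to pro_api() as in step (6). This is only a sketch: the variable name TUSHARE_TOKEN and the helper load_token are assumptions, not something Tushare defines.

```python
import os


def load_token(env_var="TUSHARE_TOKEN"):
    """Read the token from an environment variable (the variable name is an assumption)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError("set the %s environment variable first" % env_var)
    return token


# Usage, assuming tushare is installed and the variable is set:
#   import tushare as ts
#   pro = ts.pro_api(load_token())
```

Failing early with a clear message is friendlier than letting the first data request fail with an authentication error.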

pro interface call test

The operation of the pro interface function for obtaining daily stock quotes is as follows.

import tushare as  ts
pro = ts.pro_api('user tushare token')
df = pro.daily(ts_code = '600104.SH', start_date = '20000501', end_date = '20200917')
df.head()
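
The call above passes the stock as a code with an exchange suffix and the dates as 'YYYYMMDD' strings. The two helpers below sketch how such arguments could be built; the function names are assumptions, and the suffix rule is a simplification that only covers codes starting with 6 (Shanghai) or with 0 or 3 (Shenzhen).

```python
import datetime


def to_ts_code(code):
    """Append the exchange suffix: 6xxxxx is treated as Shanghai, 0xxxxx or 3xxxxx as Shenzhen."""
    if code.startswith("6"):
        return code + ".SH"
    if code.startswith(("0", "3")):
        return code + ".SZ"
    raise ValueError("unrecognised stock code: " + code)


def to_date_str(d):
    """Format a date as the 'YYYYMMDD' string used for start_date and end_date."""
    return d.strftime("%Y%m%d")


print(to_ts_code("600104"), to_date_str(datetime.date(2020, 9, 17)))
```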


Tushare Financial Data Interfaces II Introduction to Common Data Interfaces

August 1, 2022 · 7 Minutes to read

Allen Ma


Macroeconomic data

A wide range of domestic macroeconomic data can be obtained from the Tushare platform using Tushare's built-in functions, such as money supply, reserve requirement ratio, deposit and loan rates, GDP, consumer price index and ex-factory industrial price index for multiple periods.


Money supply

The ts.get_money_supply() function returns China's money supply data for the last 30 years.

import tushare as  ts
df = ts.get_money_supply()
df.head()
df.columns

Return value description:

Parameter	Explanation
month	Statistics period
m2	Money and quasi-money (broad money M2) (100 million yuan)
m2_yoy	Money and quasi-money (broad money M2) year-on-year growth (%)
m1	Money (narrow money M1) (100 million yuan)
m1_yoy	Money (narrow money M1) year-on-year growth (%)
m0	Cash in circulation (M0) (100 million yuan)
m0_yoy	Cash in circulation (M0) year-on-year growth (%)
cd	Demand deposits (100 million yuan)
cd_yoy	Year-on-year growth in demand deposits (%)
qm	Quasi-money (100 million yuan)
qm_yoy	Year-on-year growth in quasi-money (%)
ftd	Time deposits (100 million yuan)
ftd_yoy	Year-on-year growth in time deposits (%)
sd	Savings deposits (100 million yuan)
sd_yoy	Year-on-year growth in savings deposits (%)
rests	Other deposits (100 million yuan)
rests_yoy	Year-on-year growth in other deposits (%)
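
The *_yoy columns are year-on-year growth rates in percent. As a reminder of the arithmetic, here is a minimal sketch; the function name and the two readings are hypothetical.

```python
def yoy_growth(current, year_ago):
    """Year-on-year growth in percent."""
    return (current - year_ago) / year_ago * 100


# Hypothetical readings of the same series taken twelve months apart.
print(round(yoy_growth(218.0, 200.0), 2))
```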

Deposit rates and loan rates

Tushare provides corresponding interface functions for different types of interest rates in order for developers to obtain the required interest rate data. The deposit rate function get_deposit_rate() and the loan rate function get_loan_rate() can be used to obtain the deposit rate and loan rate data issued by the People's Bank of China since 1989 respectively.

import tushare as  ts
df = ts.get_loan_rate()
df.head()

在这里插入图片描述

Shibor Rate

The Shanghai Interbank Offered Rate (Shibor) is a reference rate computed as the arithmetic average of the RMB interbank offered rates quoted by a panel of banks with high credit ratings. It is calculated, published and named using the National Interbank Funding Center in Shanghai as the technical platform.

import tushare as ts
pro = ts.pro_api('your tushare token')
df = pro.shibor()
df.head()

Other interest rate pro interface functions:

(1) pro.shibor_quote(): Shibor quote data
(2) pro.shibor_lpr(): LPR (Loan Prime Rate)
(3) pro.libor(): LIBOR
(4) pro.hibor(): HIBOR (Hong Kong Interbank Offered Rate)

Stock Quote Data

From the Tushare platform, users can access trading data for all companies listed on the Shanghai Stock Exchange and the Shenzhen Stock Exchange, as well as data for the various stock indices of these two markets (e.g. the SSE Composite Index, SZSE Component Index, GEM Index, CSI 300 Index and SME Board Index). The following commands, executed in sequence, obtain historical daily quote data for stock code 600848.

import tushare as ts
df = ts.get_hist_data('600848', ktype='D')
df.head()

Return value description:

Field	Explanation
date	Date
open	Opening price
high	Highest price
close	Closing price
low	Lowest price
volume	Volume
price_change	Price change
p_change	Percentage change
ma5	5-day average price
ma10	10-day average price
ma20	20-day average price
v_ma5	5-day average volume
v_ma10	10-day average volume
v_ma20	20-day average volume
turnover	Turnover rate
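
To make the derived columns more concrete, here is a small pandas sketch that rebuilds a price change, a percentage change and a 5-day average from closing prices. The prices are invented, and the calculation is only an assumption about how such columns are typically defined, not a statement of how Tushare computes them.

```python
import pandas as pd

# Hypothetical closing prices, oldest first (not real market data).
close = pd.Series(
    [10.0, 10.5, 10.2, 10.8, 11.0, 11.5],
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
    name="close",
)

frame = pd.DataFrame({"close": close})
# Price change versus the previous row, and the same change in percent.
frame["price_change"] = frame["close"].diff()
frame["p_change"] = frame["close"].pct_change() * 100
# 5-day simple moving average; the first four rows have too little history.
frame["ma5"] = frame["close"].rolling(window=5).mean()

print(frame.round(2))
```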

The Tushare package also provides a number of pro interface functions that return historical stock data. Registered users of the platform can use them to obtain stock quotes, but most pro interface functions can only be called once the user has accumulated a certain number of points. Daily stock quotes are obtained with a pro interface function as follows.

import tushare as ts
pro = ts.pro_api()
df = pro.daily(ts_code='600008.SH', start_date='20000501', end_date='20190808')
df.head()

There are some structural differences between the quote data returned by the pro interface function pro.daily() and the data returned by get_hist_data(). pro.daily() returns data with the default index and the trade date as an ordinary field (column), whereas get_hist_data() uses the trade date as the index. Because the structures returned by different interface functions differ to a greater or lesser extent, programs that process this data should either choose the processing method according to the structure returned, or reshape the returned data to meet the formatting requirements of the later processing statements.
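
One way to bridge the two layouts is to move the date column into the index. The sketch below does this on a hand-made frame instead of a live API call. The column names `ts_code`, `trade_date` and `close`, the 'YYYYMMDD' string format and the prices are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows shaped like a pro.daily() result: default integer index,
# trade date stored as a 'YYYYMMDD' string column (column names assumed).
daily = pd.DataFrame({
    "ts_code": ["600008.SH"] * 3,
    "trade_date": ["20190808", "20190807", "20190806"],
    "close": [3.30, 3.28, 3.31],
})

# Parse the date strings and move them into the index.
by_date = daily.copy()
by_date["trade_date"] = pd.to_datetime(by_date["trade_date"], format="%Y%m%d")
by_date = by_date.set_index("trade_date").sort_index()
by_date.index.name = "date"

print(by_date)
```

After this step, date-based operations such as slicing by period work on the frame in the same way they would on a date-indexed result.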

Tushare also provides real-time quote data, i.e. the prices of stocks being traded on the current day. For example, the get_realtime_quotes() function returns more than 30 items of information for a stock at the current moment, including the latest quote and transaction prices and the five best bid and five best ask levels. The operation is as follows.

import tushare as ts
df = ts.get_realtime_quotes('300274')
df[['code', 'name', 'price', 'bid', 'ask', 'volume', 'amount', 'time']]

Partial return value description:

Field	Explanation
code	Stock code
name	Name of the stock
price	Current price
high	Today's high price
low	Today's low price
bid	Bid price, i.e. the "buy one" quote
ask	Ask price, i.e. the "sell one" quote
volume	Volume traded (you may need to divide by 100)
amount	Amount traded (yuan, CNY)
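
The note on `volume` suggests dividing by 100, which converts a share count into board lots of 100 shares. A minimal sketch of that conversion follows. The quote values are hypothetical, and the sketch assumes the fields arrive as strings.

```python
def shares_to_lots(volume):
    """Convert a traded-share count to board lots of 100 shares."""
    return int(float(volume)) // 100


# Hypothetical quote row (not real data); values assumed to arrive as strings.
quote = {"code": "300274", "volume": "1234500", "amount": "15432100.00"}

lots = shares_to_lots(quote["volume"])
# Average traded price per share, derived from amount and volume.
avg_price = float(quote["amount"]) / float(quote["volume"])
print(lots, round(avg_price, 2))
```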

Fundamental data of listed companies

The Tushare platform provides fundamental data on listed companies, including financial position, profitability, market share, management structure, talent composition and more. In addition to stock price data, financial analysts often need this fundamental data to assess a company's investment value. Tushare provides the interface functions for listed companies' fundamental data as shown in the table.

Stock Index Data

A stock index is a composite price value of stocks compiled by a stock exchange or financial services institution that reflects the price movements of a particular group (class) of stocks.

Stock exchanges and some financial services institutions have compiled and publicly released dozens of stock price indices, and stock investors are accustomed to using stock indices as an observational indicator of stock market price movements.

The pro.index_basic() function can be used to obtain the various stock indices published by the SSE.

import tushare as ts
pro = ts.pro_api('your tushare token')
df = pro.index_basic(market='SSE')
df.head()

The following two tables list the input and output parameters of the Tushare platform pro version interface function index_basic().

matplotlib plotting with time as horizontal coordinate, using dataframe data

August 1, 2022 · 2 Minutes to read

Allen Ma

For Freedom

Origins

When I was using a SQL database as a data source, I found that I was getting mostly time series data, and I really wanted to draw line graphs with time as the horizontal coordinate. I wanted the freedom to set the time interval, and the canvas size.

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import pymssql
import warnings

warnings.filterwarnings('ignore')

# Connect to SQL Server; replace the four placeholders with your own settings
connect = pymssql.connect('server address', 'username', 'password', 'database name')
print("Connected")
data = pd.read_sql("select TRADEDATE,cast(TCLOSE as int) from TQ_QT_SKDAILYPRICE where SECODE='2010000512'", con=connect)
#print(data.head())  # View the results of the reading
data.columns = ['day', 'close']
#print(data.head())  # View the results of the reading
data['day'] = pd.to_datetime(data['day'])    # Convert to dates, otherwise the date settings below will not work

# The matplotlib.pyplot approach
# Use a font with Chinese glyphs and keep the minus sign displaying correctly
plt.rcParams['font.family'] = ['sans-serif']
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

fig = plt.figure(figsize=(20, 5))
ax = fig.add_subplot(1, 1, 1)

plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))  # Display format of the x-axis major ticks (date)
plt.gca().xaxis.set_major_locator(mdates.MonthLocator(interval=15))  # Spacing of the x-axis major ticks, in months

plt.xlabel('日期')      # "Date"
plt.ylabel('收盘价')    # "Closing price"
plt.title('2010000512收盘价折线图')  # "2010000512 closing price line chart"
plt.plot(data['day'], data['close'])
plt.show()

Results of the run.

Adjust the canvas size

fig = plt.figure(figsize=(20, 5))

20 and 5 are the width and height of the canvas respectively, in inches.
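
Because figsize is measured in inches, the pixel size of the saved or displayed figure also depends on the dpi setting. The sketch below works this out for the same (20, 5) canvas. The dpi of 100 is chosen for the example, and the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(20, 5), dpi=100)
width_in, height_in = fig.get_size_inches()
# Pixel dimensions are inches multiplied by dots per inch.
width_px = int(width_in * fig.dpi)
height_px = int(height_in * fig.dpi)
print(width_in, height_in, width_px, height_px)
plt.close(fig)
```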

Control the normal display of Chinese characters and plus and minus signs

plt.rcParams['font.family'] = ['sans-serif']
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

Set the x-axis major tick display format (date)

import matplotlib.dates as mdates
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))  

The format string '%Y-%m' can be replaced with any combination of the directives in '%y-%m-%d %H:%M', which stand for year, month, day, hour and minute respectively (%Y is the four-digit year, %y the two-digit year).
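
DateFormatter takes a strftime-style format string, so the effect of a format can be previewed with the standard library before it is put on an axis. The timestamp below is arbitrary.

```python
from datetime import datetime

# An arbitrary moment used only to preview the formats.
moment = datetime(2020, 9, 29, 14, 30)

for fmt in ("%Y-%m", "%Y-%m-%d", "%y-%m-%d %H:%M"):
    print(fmt, "->", moment.strftime(fmt))
```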

Set the x-axis major tick spacing

import matplotlib.dates as mdates
plt.gca().xaxis.set_major_locator(mdates.MonthLocator(interval=15))

Change the density of the horizontal labels by changing the value of interval. For MonthLocator the interval is counted in months, so the larger the value, the sparser the labels.
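
The effect of interval can be checked by counting the major ticks that matplotlib places on the axis. The sketch below uses an invented four-year daily series in place of the database query, and the helper name `count_major_ticks` is made up for this example.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical daily series covering about four years.
days = [datetime(2016, 1, 1) + timedelta(days=i) for i in range(4 * 365)]
values = list(range(len(days)))


def count_major_ticks(interval):
    """Plot the series and return how many major x ticks the locator produces."""
    fig, ax = plt.subplots(figsize=(20, 5))
    ax.plot(days, values)
    ax.xaxis.set_major_locator(mdates.MonthLocator(interval=interval))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
    fig.canvas.draw()
    n_ticks = len(ax.get_xticks())
    plt.close(fig)
    return n_ticks


dense = count_major_ticks(1)
sparse = count_major_ticks(6)
print(dense, sparse)
```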

IndexError list index out of range error principle and solution -python

August 1, 2022 · 3 Minutes to read

Allen Ma

For Freedom

Problems identified

When I was writing the article "Financial Data Analysis (I) python preview", the following error was reported for Project 2: Calculating the annual growth rate of mobile phone sales. The results appear to have been printed, but the long traceback in front of them is unpleasant to look at. Together with the message "Process finished with exit code 1", it convinced me that something was wrong.

As shown in the diagram, the error message is

Traceback (most recent call last):
IndexError: list index out of range

Find out why

After searching for information on the subject I learned that there are two main reasons why list index out of range errors occur.

(1) The subscript is out of range.
(2) The list is empty, without a single element.

Let me illustrate this with an example. Open IDLE and enter the following code:

>>> li = [1,2,3,4,5,6,7,8,9,10]
>>> #Index[0,1,2,3,4,5,6,7,8,9 ]
>>> li[8]
9
>>> li[10]
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    li[10]
IndexError: list index out of range
>>> # Here the index value is outside the valid range, also called out of bounds
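
The second cause, an empty list, can be reproduced in the same way. The sketch below triggers the error on an empty list and then shows one possible guard. The helper name `first_or_default` is made up for this example.

```python
empty = []

try:
    first = empty[0]
except IndexError as exc:
    # An empty list has no index 0, so even the first position is out of range.
    message = str(exc)
    print("IndexError:", message)


def first_or_default(items, default=None):
    """Return items[0], or `default` when the list has no elements."""
    return items[0] if items else default


print(first_or_default(empty), first_or_default([7, 8, 9]))
```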

Trying to solve

First attempt to solve. Add a try...except block.

for s in linestr:
    try:
        L = s.split('\t')
        print(L[0], end="    ")
        print(isBigGrowth(L, 0.3))
    except IndexError:
        print('Run failed')

Results of the run: the annoying red error message is gone, and the run ends with "Process finished with exit code 0". The problem seems to be solved, but this is only superficial: the error still occurs, and we have merely chosen not to display it. This treats the symptom, not the cause!

Second attempt to resolve. I noticed that the program threw the exception at the very last entry, after all the mobile phone companies had been analysed and judged correctly. Could it be the second cause of the error, an empty list? I checked the source file and found that there was indeed an empty line at the end. After removing the blank line at the end and running the program again, the error was gone. Now that's perfect! Comfortable =。=
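
Instead of editing the data file by hand, the program can also skip blank lines itself. The sketch below shows one way to do that. The sample text and the function name `parse_rows` are hypothetical and are not taken from the original project.

```python
def parse_rows(text):
    """Split tab-separated text into rows, ignoring blank lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # a blank line would split into [''] and break row[1]
        rows.append(line.split("\t"))
    return rows


# Hypothetical file content ending with an empty line (not the article's data).
sample = "CompanyA\t100\t150\nCompanyB\t200\t210\n\n"
rows = parse_rows(sample)
print(rows)
```

A blank line still splits into a list with one empty string, so any code that reads a second field from it fails. Filtering such lines out before splitting removes the cause of the error.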

Summary

It is very important to check the format of the data being analysed before processing it!

Common drawing styles for matplotlib

August 1, 2022 · 1 Minutes to read

Allen Ma

For Freedom

Visualization is an important part of data analysis, and matplotlib is the module most often used for it.

Recently plotly, a new-generation plotting module, has appeared and has a clear advantage in interactivity. However, simple charts are still easier to draw with matplotlib.

In 2017 matplotlib 2.0 was released, offering six plotting styles for users to choose from.

'bmh','dark_background','fivethirtyeight','ggplot','grayscale','default'

Depending on the environment and the installed matplotlib version, the number of available plotting styles can exceed 20. These styles can be applied directly to pandas plotting statements.
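
Which styles are actually installed can be checked through plt.style.available. The sketch below looks up five of the styles named above and applies one of them to a single figure. 'default' is left out of the lookup because it may not appear in that list, and the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

named = ["bmh", "dark_background", "fivethirtyeight", "ggplot", "grayscale"]
installed = [name for name in named if name in plt.style.available]
print(len(plt.style.available), installed)

# Apply one style to a single figure only; rcParams are restored afterwards.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 3])
    plt.close(fig)
```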

The image is plotted using the pandas data analysis module's built-in plot command, with the following code.

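# coding=utf-8
'''
Created on 2020.09.29
'''
import os

import matplotlib.pyplot as plt
import pandas as pd


def dr_xtyp(_dat):
    # xtyp = ['bmh', 'dark_background', 'fivethirtyeight', 'ggplot', 'grayscale', 'default']
    os.makedirs('tmp', exist_ok=True)  # make sure the output directory exists
    for xss in plt.style.available:
        print(xss)
        # Apply the style to this figure only, so styles do not pile up across iterations
        with plt.style.context(xss):
            _dat['Open'].plot()
            _dat['Close'].plot()
            _dat['High'].plot()
            _dat['Low'].plot()
            fss = os.path.join('tmp', 'stk001_' + xss + '_pd.png')
            plt.savefig(fss)
            plt.show()
            plt.close()  # start the next style on a fresh figure


# =======================
df = pd.read_csv(os.path.join('dat', 'appl2014.csv'), index_col=0, parse_dates=[0], encoding='gbk')

d30 = df[:30]
dr_xtyp(d30)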

Results of the run.

A variety of different styles of images will be produced in the tmp directory.
