

Case (5) Analysis of specific financial data

Project 1

Case details

Mr. Wang, a senior engineer at a high-tech company, plans to buy a home in Guangzhou with a total price of RMB 10 million. Unable to pay in full from his own limited funds, he intends to apply for a home mortgage loan from the local Bank C. Assume that you are the account manager at Bank C responsible for growing the housing mortgage business. After assessing Mr. Wang's repayment ability, you draw up the following loan proposal: a principal of RMB 6 million, a term of 30 years, and an interest rate 5 basis points above the loan prime rate (LPR) for terms of five years or more, i.e. a loan rate of 4.9%. For this loan, Mr. Wang can choose between two repayment options. (1) Equal principal and interest: the sum of principal and interest Mr. Wang repays each month stays the same, as long as the loan's interest rate is unchanged. (2) Equal principal: Mr. Wang repays a fixed amount of principal each month while the interest paid decreases month by month, again assuming the loan's interest rate is unchanged. To give Mr. Wang a clear understanding of the differences between the two repayment methods, and to illustrate the repayments graphically, you need to complete three programming tasks in Python.
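Before writing any code, it helps to pin down the arithmetic behind the two options. A minimal sketch using the standard annuity formulas, with the case's numbers plugged in:

# Equal principal and interest: a constant monthly payment M for principal P,
# monthly rate r and n monthly instalments:  M = P * r / (1 - (1 + r) ** -n)
P, r, n = 6_000_000, 0.049 / 12, 30 * 12
M = P * r / (1 - (1 + r) ** -n)
print(round(M))  # ≈ 31843 yuan per month

# Equal principal: a fixed P / n of principal each month plus interest on the
# outstanding balance, so instalment k (from 0) costs P / n + (P - k * P / n) * r
print(round(P / n + P * r))  # ≈ 41167 yuan in month 1, declining thereafter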

Programming tasks

(1) Assuming equal principal and interest repayment, calculate Mr. Wang's monthly repayment amount, along with the principal and interest components of each month's repayment, and visualise the results. (2) To show Mr. Wang how a change in the interest rate affects the monthly repayment under the equal principal and interest rule, run a sensitivity analysis: model and visualise how Mr. Wang's monthly repayment changes as the loan rate rises from 2% per year to 8% per year. (3) Assuming equal principal repayment with the loan rate held at 4.9% per year, calculate the principal and interest components of Mr. Wang's monthly repayments separately and visualise the results.

Start programming.

# -*- coding: utf-8 -*-
"""
Created on Tue Sept 22 8:47:37 2020

@author: mly
"""
import numpy_financial as npf  # np.pmt was removed from NumPy; numpy_financial provides it
import matplotlib.pyplot as plt

# (1) Equal principal and interest
dp_rate = 0.049  # loan interest rate (annual)
loan_pv = 6000000  # loan principal, unit: RMB
loan_year = 30  # loan term (years)
repay_mon = -round(npf.pmt(dp_rate / 12, loan_year * 12, loan_pv))  # monthly repayment amount
interestList = []  # interest paid in each instalment
capitalList = []  # principal repaid in each instalment
monthList = [x for x in range(loan_year * 12)]  # repayment periods
rest = loan_pv  # remaining principal balance
for i in range(loan_year * 12):
    interest = round(rest * (dp_rate / 12))  # interest for this instalment
    interestList.append(interest)
    repay_capital = repay_mon - interest  # principal repaid in this instalment
    capitalList.append(repay_capital)
    rest = round(rest - repay_capital)  # update the remaining principal balance
# Plot a stacked bar chart
plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels properly
plt.rcParams['axes.unicode_minus'] = False  # display the minus sign properly
plt.bar(monthList, capitalList, align="center", color="#EE9A49", label='每月本金额')
plt.bar(monthList, interestList, align="center", bottom=capitalList, color="#000000", label='每月利息额')
plt.xlabel('还款期限(月)')
plt.ylabel('每月还款额(元)')
plt.title('等额本息还款图')
plt.legend()
plt.show()

# (2) Sensitivity analysis: annual rate from 2% to 8%
rates = [x / 100 for x in range(2, 9, 1)]  # candidate annual loan rates
repaymentNL = []
for n in range(len(rates)):
    repaymentN = -round(npf.pmt(rates[n] / 12, loan_year * 12, loan_pv))  # monthly repayment at this rate
    repaymentNL.append(repaymentN)
plt.plot(rates, repaymentNL, lw=6, color="#EE9A49", label="每月还款额")
plt.fill_between(rates, 0, repaymentNL, facecolor="#000000", alpha=1)
plt.xlabel('年利率')
plt.ylabel('每月还款额(元)')
plt.title('年利率在2%~8%变化时每月还款额变化趋势')
plt.legend()
plt.show()

# (3) Equal principal
repay_capitalX = loan_pv / (loan_year * 12)  # fixed principal repaid in each instalment
repay_capitalXL = [repay_capitalX for i in range(loan_year * 12)]
restX = loan_pv  # initial principal balance
interestXList = []  # interest paid in each instalment
repaymentXL = []  # total repayment in each instalment
for i in range(loan_year * 12):
    interestX = round(restX * (dp_rate / 12))  # interest for this instalment
    interestXList.append(interestX)
    restX = round(restX - repay_capitalX)  # update the remaining principal balance
    repaymentX = interestX + repay_capitalX  # total repayment for this instalment
    repaymentXL.append(repaymentX)
# Plot a stacked bar chart
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
plt.plot(monthList, repaymentXL)
plt.bar(monthList, repay_capitalXL, align="center", color="#000000", label='每期本金额')
plt.bar(monthList, interestXList, align="center", bottom=repay_capitalXL, color="#EE9A49", label='每期利息额')
plt.xlabel('还款期限(月)')
plt.ylabel('每期还款额(元)')
plt.title('等额本金还款图')
plt.legend()
plt.show()

Presentation of results

Figures (in order): the equal principal and interest repayment chart; the trend in the monthly repayment as the annual rate varies from 2% to 8%; and the equal principal repayment chart.


Case (4) Macro-financial data analysis

Project 2: Analysis of changes in access to basic public health services for rural versus urban residents in China over the past 25 years (chart output)

Using World Bank public data

# -*- coding: utf-8 -*-
"""
Created on Mon Sept 21 8:04:59 2020

@author: mly
"""
import matplotlib.pyplot as plt
import matplotlib as mpl
import pandas as pd

plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels properly
mpl.rcParams["axes.unicode_minus"] = False

df = pd.read_csv('basicsanit_china2000to2017.csv')

y = df['rural_sanit']
y1 = df['urban_sanit']
y2 = df['peopl_sanit']
x = df['year']

plt.figure()
ax = plt.gca()
plt.grid(axis="y")
plt.title('农村居民与城市居民享受基本公共卫生服务的变化情况')
plt.ylabel('服务数值')
plt.xlabel('年份')
ax.plot(x, y, '-rp', lw=1.5, label='rural_sanit')
ax.plot(x, y1, '-gp', lw=1.5, label='urban_sanit')
ax.plot(x, y2, '-bp', lw=1.5, label='peopl_sanit')
ax.legend(loc='upper right')

plt.show()

Results of the run.



Case (4) Macro-financial data analysis

Project 1: Comparison of GDP per capita growth rates between country A and country B over the last 40 years using macroeconomic data provided by the World Bank Open Data Platform (graphical output)

The data is available via the download link on this page: https://data.worldbank.org.cn/?locations=CN-US


# -*- coding: utf-8 -*-
"""
Created on Mon Sept 22 9:11:59 2020

@author: mly
"""
import matplotlib.pyplot as plt
import matplotlib as mpl
import pandas as pd
from matplotlib import ticker

plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels properly
mpl.rcParams["axes.unicode_minus"] = False

df = pd.read_csv('gdpchinaseries.csv')
df2 = pd.read_csv('gdpusaseries.csv')
y = df['gdp']
y1 = df2['gdp']
x = [x for x in range(1961, 2020)]

ymajorFormatter = ticker.FormatStrFormatter('%.2f%%')  # format for the y-axis tick labels

plt.figure()
ax = plt.gca()
plt.grid(axis="y")
plt.title('人均 GDP增长比较')
plt.ylabel('人均 GDP增长(年增长率)')
plt.xlabel('年份')
ax.yaxis.set_major_formatter(ymajorFormatter)  # show values as percentages
ax.plot(x, y, '-rp', lw=1.5, label='A国')
ax.plot(x, y1, '-gp', lw=1.5, label='B国')
ax.legend(loc='upper right')

plt.show()

Results of the run.



Case (3) Simple financial data analysis

Project 3: Calculate the return generated by trading stocks with the MACD indicator buy-sell signal over a one-year period

This program calculates the return generated by trading a stock on MACD buy and sell signals over a one-year period. The MACD trading signals are: the fast line crossing the slow line from below is a buy signal for that day, and the fast line crossing the slow line from above is a sell signal for that day. Assume the buy and sell prices are the closing prices on the day of the trading signal.
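For reference, the quantities behind those signals follow the standard recursions: an N-period EMA obeys EMA_t = (2·price_t + (N−1)·EMA_{t−1}) / (N+1); DIF (the fast line) is the 12-period EMA minus the 26-period EMA; DEA (the slow line) is a 9-period EMA of DIF; and MACD = 2·(DIF − DEA). A minimal pandas sketch of the same quantities (an equivalent formulation, not the code used below; ewm(span=N, adjust=False) implements exactly this recursion):

import pandas as pd

close = pd.Series([10.0, 10.2, 10.1, 10.4, 10.6, 10.5])  # hypothetical prices
fast = close.ewm(span=12, adjust=False).mean()  # 12-period EMA
slow = close.ewm(span=26, adjust=False).mean()  # 26-period EMA
dif = fast - slow                               # the "fast line"
dea = dif.ewm(span=9, adjust=False).mean()      # the "slow line"
macd = 2 * (dif - dea)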

# -*- coding: utf-8 -*-
"""
Created on Sun Sept 20 9:04:59 2020

@author: mly
"""
import numpy as np
import datetime
import pandas_datareader.data as web

start = datetime.datetime(2018, 6, 1)
end = datetime.datetime.today()
stock_name = '601318.ss'
df = web.DataReader(stock_name, 'yahoo', start, end)


def df_EMA(prices, N):
    """Recursive N-period exponential moving average of a price sequence."""
    ema = []
    k = len(prices)
    if k > 0:
        for i in range(k):
            if i == 0:
                ema.append(prices[i])
            else:
                ema.append((2 * prices[i] + (N - 1) * ema[i - 1]) / (N + 1))
    return ema


def df_MACD(df, short=12, long=26, M=9):
    """Add the MACD columns (Fast, Slow, DIF, DEA, MACD) and the daily ratio tim."""
    fast = df_EMA(df['Adj Close'].values, short)
    slow = df_EMA(df['Adj Close'].values, long)
    if len(fast) > 0:
        df['Fast'] = np.round(np.array(fast), 2)
        df['Slow'] = np.round(np.array(slow), 2)
        df['DIF'] = df['Fast'] - df['Slow']
        df['DEA'] = np.round(np.array(df_EMA(df['DIF'].values, M)), 2)
        df['MACD'] = 2 * (df['DIF'] - df['DEA'])
        df['tim'] = df['Close'] / df['Open']  # intraday return factor
        return df
    else:
        print('no data, no MACD')


# Accumulate the intraday return factor while holding the stock:
# enter when DIF turns positive, exit when it drops back to zero or below.
times0 = 1
marker = 0  # 0 = out of the market, 1 = holding
df_MACD(df, 12, 26, 9)
for i in df.itertuples(index=True, name='df'):
    if getattr(i, 'DIF') > 0 and marker == 0:  # buy signal: enter
        times0 = times0 * getattr(i, 'tim')
        marker = 1
    elif getattr(i, 'DIF') > 0 and marker == 1:  # still holding
        times0 = times0 * getattr(i, 'tim')
        marker = 1
    elif getattr(i, 'DIF') == 0 and marker == 1:  # sell signal: exit
        times0 = times0 * getattr(i, 'tim')
        marker = 0
    elif getattr(i, 'DIF') < 0 and marker == 1:  # sell signal: exit
        times0 = times0 * getattr(i, 'tim')
        marker = 0
    else:
        continue

print(times0)

Results of the run.

It seems that, historically at least, you would not have lost money buying shares this way.


Case (3) Simple financial data analysis

Project 2: Calculating the excess return of a stock

Design a program to calculate a stock's quarterly and annual returns, and its excess return (i.e. relative return) relative to the average market return over the same period.

This project uses tushare's Python SDK to obtain the data; the details of that methodology are the subject of a separate article.
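One detail worth spelling out: the monthly returns below are log returns, which is what makes the quarterly and annual aggregation a simple sum. A quick illustration with made-up prices:

import numpy as np

prices = np.array([100.0, 105.0, 102.0, 110.0])  # hypothetical month-end closes
monthly = np.log(prices[1:] / prices[:-1])       # monthly log returns
# Summing log returns over a window equals the log return over the whole window
print(round(monthly.sum(), 6) == round(np.log(prices[-1] / prices[0]), 6))  # True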

# -*- coding: utf-8 -*-
"""
Created on Sat Sept 19 9:30:36 2020

@author: mly
"""
import tushare as ts
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker

plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels properly

startday = '2015-01-01'
endday = '2020-04-01'
tscode = '600519'
tsindx = 'sh'
df1 = ts.get_k_data(tscode, start=startday, end=endday, ktype='M')
df2 = ts.get_k_data(tsindx, start=startday, end=endday, ktype='M')
df1.to_excel('600519.xlsx', index=False)
df2.to_excel('sh.xlsx', index=False)
df1 = pd.read_excel('600519.xlsx', dtype={'code': 'str'})
df2 = pd.read_excel('sh.xlsx', dtype={'code': 'str'})
df = df2[['date', 'close']].copy()
df.rename(columns={'close': 'indclose'}, inplace=True)
df = pd.merge(df[['date', 'indclose']],
              df1[['date', 'close']], on='date', how='left')
df.fillna(method='ffill', inplace=True)  # forward fill
df.fillna(method='bfill', inplace=True)  # then backward fill
# Calculate monthly log returns for the stock and the index
df['stk_log_ret'] = np.round(np.log(df['close'] / df['close'].shift(1)), 4)
df['ind_log_ret'] = np.round(np.log(df['indclose'] / df['indclose'].shift(1)), 4)
df['stk_log_ret'].fillna(method='bfill', inplace=True)  # backward fill
df['ind_log_ret'].fillna(method='bfill', inplace=True)  # backward fill
df['xd_ret'] = df['stk_log_ret'] - df['ind_log_ret']  # excess (relative) return
df['xd_ret'].fillna(method='bfill', inplace=True)  # backward fill

df_list = list(df['stk_log_ret'].values)
print(df['stk_log_ret'].values)

# Quarterly returns: sum each block of three monthly log returns
ret_year = []
ret_quarter = []
for i in range(len(df_list) // 3):
    ret_quarter.append(np.round(df_list[3 * i] + df_list[3 * i + 1] + df_list[3 * i + 2], 4))
ret_quarter1 = pd.Series(ret_quarter)

# Annual returns: sum each block of four quarterly returns
for n in range(len(ret_quarter) // 4):
    ret_year.append(np.round(ret_quarter[4 * n] + ret_quarter[4 * n + 1]
                             + ret_quarter[4 * n + 2] + ret_quarter[4 * n + 3], 4))
ret_year1 = pd.Series(ret_year)

# Build quarter and year labels from the dates
quarter_list = []
year = []
df_index = list(df.date)
for value in df_index:
    tempvalue = value.split("-")
    if tempvalue[1] in ['01', '02', '03']:
        quarter_list.append(tempvalue[0] + "Q1")
        year.append(tempvalue[0])
    elif tempvalue[1] in ['04', '05', '06']:
        quarter_list.append(tempvalue[0] + "Q2")
        year.append(tempvalue[0])
    elif tempvalue[1] in ['07', '08', '09']:
        quarter_list.append(tempvalue[0] + "Q3")
        year.append(tempvalue[0])
    elif tempvalue[1] in ['10', '11', '12']:
        quarter_list.append(tempvalue[0] + "Q4")
        year.append(tempvalue[0])

quarter_set = set(quarter_list)
quarter_list = list(quarter_set)
quarter_list.sort()

year_set = set(year)
year = list(year_set)
year.sort()
year.pop()  # drop 2020, which is not yet complete

ymajorFormatter = ticker.FormatStrFormatter('%.2f%%')  # format for the y-axis tick labels

fig = plt.figure(figsize=(14, 24))
ax1 = fig.add_subplot(3, 1, 1)
fig.subplots_adjust(bottom=0.2)
plt.ylabel('季度收益率')
plt.xticks(rotation=60)
ax1.yaxis.set_major_formatter(ymajorFormatter)  # show values as percentages
ax1.plot(quarter_list, ret_quarter1 * 100, '-cs', lw=1.5, label=tscode + ' 季度收益率')
ax1.legend(loc='upper left')

ax2 = fig.add_subplot(3, 1, 2)
plt.ylabel('年收益率')
plt.xticks(rotation=60)
ax2.yaxis.set_major_formatter(ymajorFormatter)  # show values as percentages
ax2.plot(year, ret_year1 * 100, '-gp', lw=1.5, label=tscode + ' 年收益率')
ax2.legend(loc='upper left')

ax3 = fig.add_subplot(3, 1, 3)
plt.ylabel('相对收益率')
plt.xticks(rotation=60)
ax3.yaxis.set_major_formatter(ymajorFormatter)  # show values as percentages
ax3.plot(df.date, df['xd_ret'] * 100, '-rp', lw=1.5, label=tscode + ' 相对市场收益率')
ax3.legend(loc='upper left')
plt.show()

Results of the run.



Case (3) Simple financial data analysis

Project 1: Design a program to compare the trends in total repayments for the same loan amount (e.g. RMB 1 million) under different monthly mortgage payments (calculate at least three payment schedules).
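The program leans on two annuity functions: pmt, which returns the fixed periodic payment (negative, since it is a cash outflow) for a given rate, term and present value, and fv, which compounds a stream of such payments forward at the deposit rate. A minimal sketch, assuming the numpy_financial package (np.pmt and np.fv have since been removed from NumPy itself):

import numpy_financial as npf

pmt = npf.pmt(0.046 / 12, 2 * 12, 100)   # fixed monthly payment for pv=100; negative = outflow
fv = npf.fv(0.015 / 12, 2 * 12, pmt, 0)  # those payments compounded at the deposit rate
print(round(pmt, 2), round(fv, 2))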

# -*- coding: utf-8 -*-
"""
Created on Fri Sept 18 9:50:37 2020

@author: mly
"""
import numpy as np
import numpy_financial as npf  # np.pmt/np.fv were removed from NumPy; numpy_financial provides them
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels properly

dp_rate = 0.015  # 1-year deposit rate issued by the PBoC in October 2015
rates = [0.046, 0.05, 0.054]  # loan rates for 2-, 4- and 6-year terms
loan_pv = 100  # loan principal, unit: 10,000 RMB (i.e. RMB 1 million)
loan_nper = [2, 4, 6]  # loan terms, unit: years
repay_pmt = np.zeros(len(loan_nper))  # monthly mortgage payment
repay_fv = np.zeros(len(loan_nper))  # future value of all repayments
for n in range(len(loan_nper)):
    repay_pmt[n] = round(npf.pmt(rates[n] / 12, loan_nper[n] * 12, loan_pv) * 10000, 2)
    repay_fv[n] = round(npf.fv(dp_rate / 12, loan_nper[n] * 12, repay_pmt[n], 0), 2)

fig, ax = plt.subplots(figsize=(9, 6))
ax.plot(loan_nper, np.round(repay_fv / 10000, 2), marker='o', label='不同还款年限的按揭终值')
ax.set(xticks=loan_nper, xlabel='还款年限', ylabel='100 万按揭终值')
for i in range(len(loan_nper)):
    ax.text(loan_nper[i], np.round(repay_fv / 10000, 2)[i],
            np.round(repay_fv / 10000, 2)[i], ha='left', fontsize=20)
ax.legend()
plt.show()

Results of the run.



Case (2) Crawler warm-up

Project 2: Custom Amazon product-information crawler, with configurable product name and number of pages to crawl

# -*- coding: utf-8 -*-
"""
Created on Thur Sept 17 15:56:36 2020

@author: mly
"""
import requests
import re
import pandas as pd

ilt = []
iltl = []


def getHTMLText(url):
    """Fetch a page, sending a browser-like User-Agent and a session cookie."""
    try:
        kv = {'user-agent': 'Mozilla/5.0',
              'Cookie': 'x-wl-uid=1+EeiKz9a/J/y3g6XfXTnSbHAItJEus3oQ6Gz+T/haur7dZfkNIgoxzMGwviB+42iWIyk9LR+iHQ=;'
              ' session-id=457-2693740-8878563; ubid-acbcn=459-5133849-3255047; lc-acbcn=zh_CN; i18n-prefs=CNY; '
              'session-token="8n/Oi/dUCiI9zc/0zDLjB9FQRC6sce2+Tl7F0oXncOcIYDK4SEJ7eek/Vs3UfwsRchW459OZni0AFjMW+'
              '9xMMBPSLM8MxLNDPP1/13unryj8aiRIZAE1WAn6GaeAgauNsijuBKKUwwLh8Dba7hYEjwlI1J6xlW0LKkkyVuApjRXnOsvdYr'
              'X8IURVpOxDBnuAF9r7O71d/NPkIQsHy7YCCw=="; session-id-time=2082787201l;'
              ' csm-hit=tb:s-85XYJNXFEJ5NBKR0JE6H|1566558845671&t:1566558845672&adb:adblk_no'}
        r = requests.get(url, headers=kv, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""


def parsePage(ilt, html):
    """Extract (price, title) pairs from a search-results page with regular expressions."""
    try:
        plt = re.findall('<span class="a-offscreen">¥(.*?)</span>', html)
        tlt = re.findall('<span class="a-size-base-plus a-color-base a-text-normal" dir="auto">(.*?)</span>', html)
        for i in range(len(tlt)):
            ilt.append([plt[i], tlt[i]])
    except:
        return ""


def printGoodsList(ilt):
    """Number the results and write them to finance.csv."""
    column = ["序号", "价格", "商品名称"]
    count = 0
    for g in ilt:
        count = count + 1
        iltl.append([count, g[0], g[1]])
    test = pd.DataFrame(columns=column, data=iltl)
    test.to_csv('finance.csv', encoding='utf_8_sig', index=False)


def main():
    goods = input("请输入商品名称:")
    depth = int(input("请输入想查看到的页码:"))
    start_url = 'https://www.amazon.cn/s?k=' + goods
    infoList = []
    for i in range(depth):
        try:
            url = start_url + '&page=' + str(i + 1)
            html = getHTMLText(url)
            parsePage(infoList, html)
        except:
            continue
    printGoodsList(infoList)


main()

At runtime, the product name and the number of pages to crawl can be customised.

Crawl results.



Case (2) Crawler warm-up

Project 2: Crawling stock data using two different methods

Method 2: the Scrapy crawler framework

This method crawls the relevant content using the Scrapy framework.

Installing the Scrapy framework

Open cmd and run the following command to install it:

pip install scrapy

Verify that the installation was successful.

scrapy -h

Create a new Scrapy crawler project

Once Scrapy is installed, create the project from cmd: change to the directory where you want the crawler project to live and execute:

scrapy startproject baidustocks

Once executed, this creates a set of folders and .py files in the directory.
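The generated layout looks roughly like this (details vary slightly by Scrapy version):

baidustocks/
    scrapy.cfg            # deployment configuration
    baidustocks/          # the project's Python module
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            __init__.py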

Generating a Scrapy crawler in a project

Generating the crawler takes a single command in cmd; we need to specify the crawler's name and the website to crawl:

cd baidustocks
scrapy genspider stocks hq.gucheng.com/gpdmylb.html

Here stocks is the name of the crawler and hq.gucheng.com/gpdmylb.html is the site to crawl.

A file called stocks.py will be generated when it is done.
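Before editing, stocks.py contains roughly the following template (the exact fields depend on the Scrapy version):

import scrapy


class StocksSpider(scrapy.Spider):
    name = 'stocks'
    allowed_domains = ['hq.gucheng.com/gpdmylb.html']
    start_urls = ['http://hq.gucheng.com/gpdmylb.html/']

    def parse(self, response):
        pass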

Configure the resulting spider crawler

Modify the crawler file to suit your needs. As an example, I will crawl stock data.

# -*- coding: utf-8 -*-

import scrapy
import re
from scrapy.selector import Selector


class StocksSpider(scrapy.Spider):
    name = 'stocks'
    start_urls = ['https://hq.gucheng.com/gpdmylb.html']

    def parse(self, response):
        # Follow every link that looks like a stock page (SH/SZ plus a 6-digit code)
        for href in response.css('a::attr(href)').extract():
            try:
                stock = re.search(r'S[HZ]\d{6}/', href)
                url = 'https://hq.gucheng.com/' + stock.group()
                yield scrapy.Request(url, callback=self.parse_stock)
            except:
                continue

    def parse_stock(self, response):
        infoDict = dict()
        stockInfo = response.css('.stock_top').extract()[0]
        stockprice = response.css('.s_price').extract()[0]
        stockname = response.css('.stock_title').extract()[0]
        stockname = Selector(text=stockname)
        stockprice = Selector(text=stockprice)
        stockInfo = Selector(text=stockInfo)
        infoDict['名字'] = re.search(r'>(.*?)</h1>', stockname.css('h1').extract()[0]).group(1)
        infoDict['编号'] = re.search(r'>(.*?)</h2>', stockname.css('h2').extract()[0]).group(1)
        infoDict['状态'] = re.search(r'>(.*?)</em>', stockname.css('em').extract()[0]).group(1)
        infoDict['时间'] = re.search(r'>(.*?)</time>', stockname.css('time').extract()[0]).group(1)
        price = stockprice.css('em').extract()
        infoDict['股价'] = re.search(r'>(.*?)</em>', price[0]).group(1)
        infoDict['涨跌额'] = re.search(r'>(.*?)</em>', price[1]).group(1)
        infoDict['涨跌幅'] = re.search(r'>(.*?)</em>', price[2]).group(1)
        # The key/value details live in <dt>/<dd> pairs
        keylist = stockInfo.css('dt').extract()
        valuelist = stockInfo.css('dd').extract()
        for i in range(len(keylist)):
            key = re.search(r'>(.*?)<', keylist[i], flags=re.S).group(1)
            key = str(key)
            key = key.replace('\n', '')
            try:
                val = re.search(r'>(.*?)<', valuelist[i], flags=re.S).group(1)
                val = str(val)
                val = val.replace('\n', '')
            except:
                val = '--'
            infoDict[key] = val
        yield infoDict

Run the crawler and get the data

In cmd, execute the following command:

scrapy crawl stocks

Wait for it to finish; a summary is printed after execution.

Write Pipelines to process the fetched data

Edit the pipelines.py file:

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html


class BaidustocksPipeline:
    def process_item(self, item, spider):
        return item


class BaidustocksInfoPipeline(object):
    def open_spider(self, spider):
        # Open the output file once, when the spider starts
        self.f = open('BaiduStockInfo.txt', 'w')

    def close_spider(self, spider):
        self.f.close()

    def process_item(self, item, spider):
        # Write each crawled item as one line of text
        try:
            line = str(dict(item)) + '\n'
            self.f.write(line)
        except:
            pass
        return item

Configure the ITEM_PIPELINES option

Edit the settings.py file: find the ITEM_PIPELINES parameter and change it to point at the pipeline class defined above, as shown below.
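A minimal sketch of the change (the priority value 300 is an arbitrary assumption; lower values run earlier):

# settings.py
ITEM_PIPELINES = {
    'baidustocks.pipelines.BaidustocksInfoPipeline': 300,
}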


Execute the entire framework

In cmd, run:

scrapy crawl stocks

Then wait for it to finish. Job done!


Case (2) Crawler warm-up

Project 2: Crawling stock data using two different methods

Method 1: requests, bs4, and re

import requests
from bs4 import BeautifulSoup
import re


def getHTMLText(url, code="utf-8"):
    kv = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
    try:
        r = requests.get(url, headers=kv)
        r.raise_for_status()
        r.encoding = code
        return r.text
    except:
        return ""


def getStockList(lst, stockURL):
    """Collect the SH/SZ stock codes from the listing page."""
    html = getHTMLText(stockURL, "GB2312")
    soup = BeautifulSoup(html, 'html.parser')
    li = soup.find('section', attrs={'class': 'stockTable'})
    a = li.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"[S][HZ]\d{6}", href)[0])
        except:
            continue


def getStockInfo(lst, stockURL, fpath):
    """Visit each stock's page and append its details to fpath."""
    count = 0
    for stock in lst:
        url = stockURL + stock
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('section', attrs={'class': 'stock_price clearfix'})
            mc = soup.find('header', attrs={'class': 'stock_title'})
            name = mc.find('h1')
            infoDict.update({'股票名称': name.text})

            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val

            with open(fpath, 'a', encoding='utf-8-sig') as f:
                f.write(str(infoDict) + '\n')
            count = count + 1
            print("\r当前进度: {:.2f}%".format(count * 100 / len(lst)), end="")
        except:
            count = count + 1
            print("\r当前进度: {:.2f}%".format(count * 100 / len(lst)), end="")
            continue


def main():
    stock_list_url = 'https://hq.gucheng.com/gpdmylb.html'
    stock_info_url = 'https://hq.gucheng.com/'
    output_file = 'BaiduStockInfo.csv'
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)


main()

The project takes a while to run; progress can be monitored in the console output.

More than an hour later, execution finished with a total of 3,590 records.


Case (2) Crawler warm-up

Project 1: Dangdang online-store product crawler, taking web-crawler books as an example

This case crawls the relevant content using the bs4 library's find method.
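As a reminder of the pattern, find returns the first tag matching a filter (find_all returns every match). A minimal sketch against a hypothetical fragment of the markup:

from bs4 import BeautifulSoup

html = '<li class="line1"><a class="pic" title="示例书名" href="/book">x</a></li>'
soup = BeautifulSoup(html, 'html.parser')
li = soup.find('li', class_='line1')            # first <li> with class "line1"
print(li.find('a', class_='pic').get('title'))  # -> 示例书名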

# -*- coding: utf-8 -*-
import requests
import csv
from bs4 import BeautifulSoup as bs


# Fetch the page
def request_dandan(url):
    try:
        # Browser-like User-Agent
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
        r = requests.get(url, headers=headers)
        if r.status_code == 200:
            return r.text
    except requests.RequestException:
        return None


# Write the column names
def write_item_to_file():
    csv_file = open('dangdang.csv', 'w', newline='', encoding="utf-8")
    writer = csv.writer(csv_file)
    writer.writerow(['书名', '购买链接', '纸质书价格', '电子书价格', '电子书链接', '书的详细介绍', '书的封面地址', '评论地址', '作者', '出版时间', '出版社'])
    csv_file.close()
    print('列名已成功放入CSV中')


# Parse the page and append rows to the csv file
def parse_dangdang_write(html):
    csv_file = open('dangdang.csv', 'a', newline='', encoding="utf-8")
    writer = csv.writer(csv_file)
    soup = bs(html, 'html.parser')
    class_tags = ['line' + str(x) for x in range(1, 61)]  # the 60 result items on a page
    for class_tag in class_tags:
        li = soup.find('li', class_=class_tag)
        book_name = li.find('a', class_='pic').get('title')  # book title
        paperbook_price = li.find('span', class_='search_now_price').text  # paperback price
        try:
            ebook_price = li.find('a', class_='search_e_price').find('i').text  # e-book price
            ebook_link = li.find('a', class_='search_e_price').get('href')  # e-book link
        except:
            ebook_price = ''
            ebook_link = ''
        detail = li.find('p', class_='detail').text  # book details
        book_purchase_link = li.find('a', class_='pic').get('href')  # purchase link for the book
        book_cover_link = li.find('a', class_='pic').find('img').get('src')  # book cover address
        comment_link = li.find('a', class_='search_comment_num').get('href')  # review address
        author = li.find('p', class_='search_book_author').find('span').text  # author of the book
        public_time = li.find('p', class_='search_book_author').find('span').next_sibling.text[2:]  # publication date
        public = li.find('p', class_='search_book_author').find('span').next_sibling.next_sibling.text[3:]  # publisher
        writer.writerow([book_name, book_purchase_link, paperbook_price, ebook_price, ebook_link, detail, book_cover_link, comment_link, author, public_time, public])
    csv_file.close()


if __name__ == '__main__':
    write_item_to_file()
    for page in range(1, 10):  # crawl 9 pages of results into the csv file
        url = 'http://search.dangdang.com/?key=python%C5%C0%B3%E6&act=input&page_index=' + str(page)
        html = request_dandan(url)  # fetch the page
        parse_dangdang_write(html)  # parse it and write to the csv file
        print('第{}页数据成功放入CSV中'.format(page))

Results of the run.