Showing posts with label sql-server. Show all posts
Showing posts with label sql-server. Show all posts

Getting started with Python, Web Scraping, MS SQL Server, Windows with a web crawler

For getting started install python 2.7 on win7 with this *.bat script here:
http://mohiplanet.blogspot.com/2015/12/install-python-on-windows-7-scriptbat.html

Download SQL Server 2005 :
https://www.microsoft.com/en-us/download/details.aspx?id=21844
SQL Server 2005 Management Studio :
www.microsoft.com/en-us/download/details.aspx?id=8961
If you are  used to with terminal you can rather install command line client rather than visual management studio:
https://www.microsoft.com/en-us/download/details.aspx?id=36433

Make sure you have enabled Administrator mode.

After installation has completed checkout the commandline tool:

  1. sqlcmd -S .\SQLEXPRESS
  2. create some_db
  3. go
  4. use some_db
  5. go
  6. select * from some_table
  7. go
Scraping FEC(Federal Election Commission) Filings (Getting started with a simple crawler) :
Download a sample scraper which downloads all Federal Election Commission electronic filings:
  1. git clone https://github.com/cschnaars/FEC-Scraper/
  2. cd FEC-Scraper
Load FEC sql database into sql server through script:
  1. sqlcmd -S .\SQLEXPRESS
  2. create database FEC
  3. go
  4. exit
  5. sqlcmd -S .\SQLEXPRESS -i FECScraper.sql
  6. go

Setup connection string in both of  FECScraper.py and FECParser.py as follows:
  1. connstr = 'DRIVER={SQL Server};SERVER=.\SQLEXPRESS;DATABASE=FEC;UID=;PWD=;'


create the following directories for convenience of the crawler:
  1. mkdir C:\Data\
  2. mkdir C:\Data\Python
  3. mkdir C:\Data\Python\FEC
  4. mkdir C:\Data\Python\FEC\Import
  5. mkdir C:\Data\Python\FEC\Review
  6. mkdir C:\Data\Python\FEC\Processed
  7. mkdir C:\Data\Python\FEC\Output

In case you can't find any data filings:
Check out this working code:
https://drive.google.com/file/d/0B5hTtesq_tWdZFo3eThQRzY3aEU/view?usp=sharing
as last time I had to change one CSS Query from "Form F3" to"F3" in FECScraper.py

Check a sample commitee for downloading specific filings:
Add one committe id
commidappend.txt content:
  1. echo C00494393 > commidappend.txt




--------------------------------------------------------------------------------------------------------------
Doing more on scraping FEC filings :
The latest FEC scraper supports all FEC filings from v1 to v8.1  : 
it has 8.1 filing version support:
  1. git clone https://github.com/cschnaars/FEC-Scraper-Toolbox
  2. cd FEC-Scraper-Toolbox
  3. :: make sure you create following directories
  4. mkdir C:\Data\FEC\Master
  5. mkdir C:\Data\FEC\Master\Archive
  6. mkdir C:\Data\FEC\Reports\ErrorLogs
  7. mkdir C:\Data\FEC\Reports\Hold
  8. mkdir C:\Data\FEC\Reports\Output
  9. mkdir C:\Data\FEC\Reports\Processed
  10. mkdir C:\Data\FEC\Reports\Review
  11. mkdir C:\Data\FEC\Reports\Import
  12. mkdir C:\Data\FEC\Archives\Processed
  13. mkdir C:\Data\FEC\Archives\Import
  14. :: run the update_master_files.py which download all committees lists along with
  15. :: tons of other info.
  16. python update_master_files.py
  17. :: run this for downloading daily filings
  18. python download_reports.py
  19. :: run this for parsing and mapping the filing data into database
  20. python parse_reports.py
  21. :: make sure to running the db sql script first
  22. :: https://drive.google.com/file/d/0B5hTtesq_tWdYUVRSzNCcHlJYjA/view?usp=sharing
  23. :: and Import directory has *.fec files and not downloaded *.zip files
Please see this if you dont find any of this commands above not installed :
http://mohiplanet.blogspot.com/2015/10/convert-windows-command-prompt-to-linux.html

References:
https://s3.amazonaws.com/NICAR2015/FEC/MiningFECData.pdf

Browsing MS SQL Server db backup (.bak) file in linux locally without SSMS or SQLCMD or without online tools

Untitled

Imagine you only have a copy of MS SQL Server database backup (*.bak) file with no other tools for accessing it like SSMS(SQL Server Management Studio),SQLCMD,SQL-Server etc preinstalled on your machine and you would like to browse the content of the file without these tools now.

  • Because installing and setting all of them up would cost a lot of time and energy, and
  • You certainly don’t want to give out these DATA with potential bussiness value to a third party websites like rebasedata.com etc etc and
  • Even online tools have file size limitations, which should require extra dollars to work on huge files(>10MB).

Well there is a hack, a way to achieve this if you are looking to browse only db schema and no data.

Simply open the backup(*.bak) file with vim or any other plain text editor:

vim MS-SQL-db-filename.bak

At this point, you are going to see something like this:

TAPE^@^@^C^@<8c>^@^N^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^B^@<87>^E<85>i<8c>^E^@^@^@^A^@^@^@^A^@^C^@^@^@^@^@^@^@^@^@^@^@^@^@,^@^^@^@^D^@^R^_<8e>lµL^AM^@i^@c^@r^@o^@s^@o^@f^@t^@ ^@S^@Q^@L^@ ^@S^@e^@r^@v^@e^@r^@^@^@^@^@^@^@RAID^@^@^@^@ ^@^@^@^@^@^@^@^@^@^@^@;^E³j$^O|µÜK¯ñC<89><86>𳺭^@^@ð^A^@^A^@^A^@^@^@^@^@^A^@^@^@SPAD^@^@^@^@&^C^@^@^@^@^@^@^@^@^@^@4^W^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@

But don’t panic. Search words like “Create”, “Table”, “Procedure”, “Trigger” (incase-sensitive).

And you will find texts like:


 `CREATE PROCEDURE [dbo].[FL_GetLoanDetails]     ^M  
 -- Add the parameters for the stored procedure   
 -- here    ^M  
  @TraceNo int,    ^M  
  @LoanStatus int,    ^M  
  @MPLID nvarchar(100),    ^M  
  @StopLoan int, ^M  
  @LoanCatId int,^M  
  @LoanTypeId int^M  
      ^M  
AS    ^M  
BEGIN    ^M  
 -- SET NOCOUNT ON added to prevent extra result sets  
 -- from    ^M  
     ^M  
-- FL_GetLoanDetails 0,2,'',2, 0, 0    ^M  
     ^M  
    declare @PStartDate nvarchar(20)= null^M  
    declare @PEndDate nvarchar(20)= null^M` 

It’s normal you may find lots of ^M’s at the end of every line.
It indicates the file had been taken backup on windows machine and they denote DOS encoding new line endings.
Copy and paste the whole code snippet into a new plain text editor window and these ^M’s should be gone.

Resulting in a full readable sql code snippet.