#! /usr/bin/env python
# wiposearchretrieve.py
#
# Author: Pedro Hernandez. May, 2008. v1.0
#
# Retrieves the search results from the WIPO (World Intellectual Property
# Organization) website as a delimiter-separated file.
#
# It takes one command-line parameter: the search terms. If they contain
# spaces, they must be entered between quotation marks, e.g. "Bla Bla".
#
# A sample of what we get if we execute
# c> wiposearchretrieve.py "Smartcard"
#Query Params|Record Id.|Patent Code|Publication Date|Description|International Class|Application Number|Applicant Name|url|Abstract
#Smartcard|1|WO 2008/051999|02.05.2008|CONTACTLESS |G07F 7/00|PCT/US2007/082274|MEI, INC.|http://www.wipo.int/pctdb/en/fetch.jsp?LANG=ENG&DBSELECT=PCT&SERVER_TYPE=19-10&SORT=41228299-KEY&TYPE_FIELD=256&IDB=0&IDOC=1451224&C=10&ELEMENT_SET=BASICHTML-ENG&RESULT=1&TOTAL=1847&START=1&DISP=500&FORM=SEP-0/HITNUM,B-ENG,DP,MC,AN,PA,ABSUM-ENG&SEARCH_IA=US2007082274&QUERY=%22Smartcard%22|A multi media payment device includes a banknote acceptor and a RF card reader, and also may include a magnetic card reader. A bezel assembly for connection to the bill acceptor preferably includes a reader unit to read magnetic swipe cards and contactless chip cards.
#Smartcard|2|WO 2008/051982|02.05.2008|CONTENT OWNER VERIFICATION AND DIGITAL RIGHTS MANAGEMENT FOR AUTOMATED DISTRIBUTION AND BILLING PLATFORMS|H04M 3/42|PCT/US2007/082250|SMS.AC|http://www.wipo.int/pctdb/en/fetch.jsp?LANG=ENG&DBSELECT=PCT&SERVER_TYPE=19-10&SORT=41228299-KEY&TYPE_FIELD=256&IDB=0&IDOC=1451207&C=10&ELEMENT_SET=BASICHTML-ENG&RESULT=2&TOTAL=1847&START=1&DISP=500&FORM=SEP-0/HITNUM,B-ENG,DP,MC,AN,PA,ABSUM-ENG&SEARCH_IA=US2007082250&QUERY=%22Smartcard%22|Software application providers can connect to a common platform in order to offer access to and use of their applications and/or content to a global community of mobile device users through a variety of different media. The users are automatically charged via the user's billing account with the wireless network carrier to which the user subscribes. The platform can also use billing mechanisms to bill the user other than the user's wireless network carrier, such as credit cards, bank accounts, prepaid cards, web-based payment services, etc. The application provider need not have contractual agreements with any of the wireless network carriers, as billing is automatically performed by the platform through the wireless network carriers his or ...
#Smartcard|3|WO 2008/051694|02.05.2008|SYSTEM AND METHOD FOR DEVELOPING AND MANAGING GROUP SOCIAL NETWORKS|G06F 3/00|PCT/US2007/080527|INSTABUDDY LLC|http://www.wipo.int/pctdb/en/fetch.jsp?LANG=ENG&DBSELECT=PCT&SERVER_TYPE=19-10&SORT=41228299-KEY&TYPE_FIELD=256&IDB=0&IDOC=1450550&C=10&ELEMENT_SET=BASICHTML-ENG&RESULT=3&TOTAL=1847&START=1&DISP=500&FORM=SEP-0/HITNUM,B-ENG,DP,MC,AN,PA,ABSUM-ENG&SEARCH_IA=US2007080527&QUERY=%22Smartcard%22|A system and method for facilitating the configuration and management of events within a social networking system is disclosed. The system enables members of similar or different geographic region and/or like interests, hobbies, social status, relationship status, family status, etc. to interact with the system to view activities, register to participate in activities, and schedule activities. A personal workspace, accessible through a variety of devices (e.g., kiosks, web clients, wireless devices, and set-top boxes) enables network members to view a personal calendar, scheduled events and activities, invitations, localized news, and the like. The personal workspace further facilitates registration to participate in scheduled activities. A...
#Smartcard|4|WO 2008/051335|02.05.2008|TRANSACTION PROCESSING METHOD|G06Q 10/00|PCT/US2007/019821|WELLS, R., Scott|http://www.wipo.int/pctdb/en/fetch.jsp?LANG=ENG&DBSELECT=PCT&SERVER_TYPE=19-10&SORT=41228299-KEY&TYPE_FIELD=256&IDB=0&IDOC=1448744&C=10&ELEMENT_SET=BASICHTML-ENG&RESULT=4&TOTAL=1847&START=1&DISP=500&FORM=SEP-0/HITNUM,B-ENG,DP,MC,AN,PA,ABSUM-ENG&SEARCH_IA=US2007019821&QUERY=%22Smartcard%22|The transaction processing method is a computer-implemented method capable of logging events related to a consumer at a point of transaction 100. The event logging is performed at a transaction processing center 120. The transaction processing center 120 can log such events as: receipts generated by a plurality of merchants doing business with the consumer; cash transactions generated at a plurality of cash transaction venues visited by the consumer; credit transactions generated by a plurality of creditors of the consumer; and non-financial events associated with the consumer. The events are reported to the transaction processing center 120 by a plurality of associate members who contract with the center 120 to provide the data. For each c...
#...
# Enjoy it!

import httplib, mimetypes, sys, os


def post(host, selector, body):
    """
    Quite simple post that sends the information passed by parameter
    """
    h = httplib.HTTP(host)
    h.putrequest('POST', selector)
    h.putheader('content-type', 'application/x-www-form-urlencoded')
    h.putheader('content-length', str(len(body)))
    h.endheaders()
    h.send(body)
    errcode, errmsg, headers = h.getreply()
    return h.file.read()


# This is the delimiter that will be used in the final file. You can use
# either a semicolon ";" or something like that.
delimiter = "|"

# Query is the value to be queried to the database. If no parameter is
# supplied...
if not sys.argv[1:]:
    print "No data to query the database"
    sys.exit(1)
query = sys.argv[1]

# Replace the spaces with + signs and wrap the query in URL-encoded
# quotation marks (%22), as the WIPO search engine requires
query = query.replace(" ", "+")
queryorig = query
query = "%22" + query + "%22"

# The output filename for the html (byproduct) result
filenameout = "ResultWipo" + query + ".html"

# This parameter should not be changed. It is the maximum number of records
# retrieved per http post request
displaycount = "500"

# These are the parameters that must be passed to the server for the query
body = "LANGUAGE=ENG&SERVER_TYPE=19&DBSELECT2=SPECIFY&DBSELECT=PCT&TYPE_FIELD=256&C=10&RANKTYPE=KEY&QUERY=" + query + "&ELEMENT_SET=BASICHTML-ENG&BRIEF_ELEMENT_SET=HITNUM%2CB-ENG%2CDP%2CMC%2CAN%2CPA%2CABSUM-ENG&SEPDISPLAY=FALSE&DISPLAYCOUNT=" + displaycount

# Open the output file for the html result
fout = open(filenameout, 'w')

# We do the first query
queryresult = post("www.wipo.int", "/pctdb/cgi/guest/search5", body)

# And write it to the intermediate file
fout.write(queryresult)
rec = 1

# Now we get the number of records that meet the search criteria
maxrec_f = queryresult.find("records\n")
maxrec_i = queryresult.find(": ", maxrec_f - 20, maxrec_f) + len(": ")
maxrec = int(queryresult[maxrec_i:maxrec_f])
print str(maxrec) + " records retrieved in " + str(displaycount) + " entries files"

# This is the number of records already retrieved
rec = rec + int(displaycount)

# Now, while the number of records retrieved is less than
# the number of records meeting the criteria...
while rec < maxrec:
    # Prepare the next query
    startrec = rec
    endrec = rec + int(displaycount)
    # If we are to retrieve the last batch of records
    if endrec > maxrec:
        endrec = maxrec
    body = "LANGUAGE=ENG&SERVER_TYPE=19&DBSELECT=PCT&TYPE_FIELD=256&C=10&RANKTYPE=KEY&QUERY=" + query + "&ORIG_QUERY=" + query + "&START=" + str(startrec) + "&END=" + str(endrec) + "&ELEMENT_SET=BASICHTML-ENG&BRIEF_ELEMENT_SET=HITNUM%2CB-ENG%2CDP%2CMC%2CAN%2CPA%2CABSUM-ENG&SEPDISPLAY=FALSE&DISPLAYCOUNT=" + displaycount
    queryresult = post("www.wipo.int", "/pctdb/cgi/guest/irange5", body)
    # Write the next batch to the intermediate html file
    fout.write(queryresult)
    rec = endrec

print "Retrieving done. Formatting files"

# Close the intermediate file
fout.close()

# Now we reopen the html file in read mode
fin = open(filenameout, 'r')
# And the final file in write mode
fcsv = open(filenameout + ".txt", 'w')
contents = fin.read()

# NOTE: the empty-string markers ('') in the find() calls below appear to
# stand in for HTML tag literals that are missing from this copy of the
# script; they are left as found.

# First we set a limit for the record
recorddelimiter = ' '  # This is the string that is before each record in the html file
start_rec = contents.find(recorddelimiter)
end_rec = contents.find(recorddelimiter, start_rec + len(recorddelimiter))

# We write the header line
lineout = delimiter.join([
    "Query Params", "Record Id.", "Patent Code", "Publication Date",
    "Description", "International Class", "Application Number",
    "Applicant Name", "url", "Abstract",
]) + "\n"
fcsv.write(lineout)

# While we find additional records with data we extract the field values
while start_rec != -1:
    # Now we look for the recordid within the record limits
    recid_start = contents.find(recorddelimiter, start_rec, end_rec) + len(recorddelimiter)
    recid_end = contents.find('', recid_start, end_rec) - 1
    recid = contents[recid_start:recid_end]

    # Now we look for the url within the record limits
    url_start = contents.find("HREF='", start_rec, end_rec) + len("HREF='")
    url_end = contents.find("'", url_start, end_rec)
    url = contents[url_start:url_end]

    # Next, the patent code
    code_start = contents.find('(', url_end, end_rec) + len('(')
    code_end = contents.find(")", code_start, end_rec)
    code = contents[code_start:code_end]

    # The description
    desc_start = code_end + 2
    desc_end = contents.find("<", desc_start, end_rec)
    desc = contents[desc_start:desc_end]

    # Date of publication
    date_start = contents.find('', desc_end, end_rec) + len('') + 1
    date_end = date_start + 10
    date = contents[date_start:date_end]

    # Int Class (it is a bit different because there are some records that
    # do not have a valid value in this field). The first two statements are
    # rebuilt on the pattern of the other fields, because the source copy
    # uses iclass_aux without ever assigning it.
    iclass_start = contents.find('', date_end, end_rec) + len('') + 1
    iclass_aux = contents.find('', iclass_start - 1, end_rec)
    iclass_start = iclass_aux + 2
    iclass_end = contents.find('', iclass_start, end_rec) - 1
    iclass = contents[iclass_start:iclass_end]

    # Application Number
    anum_start = contents.find('', iclass_end, end_rec) + len('') + 1
    anum_end = contents.find('', anum_start, end_rec) - 1
    anum = contents[anum_start:anum_end]

    # Applicant
    app_start = contents.find('', anum_end, end_rec) + len('') + 1
    app_end = contents.find('', app_start, end_rec) - 1
    app = contents[app_start:app_end]

    # Abstract (named "abstract" to avoid shadowing the builtin abs)
    abs_start = contents.find('', app_end, end_rec) + len('') + 1
    abs_end = contents.find('', abs_start, end_rec) - 1
    abstract = contents[abs_start:abs_end]

    # We construct the line to be written
    lineout = delimiter.join([
        queryorig, recid, code, date, desc, iclass, anum, app, url, abstract,
    ]) + "\n"
    # And write it
    fcsv.write(lineout)

    # We have all the information. We look for the next record
    start_rec = contents.find(recorddelimiter, end_rec)
    end_rec = contents.find(recorddelimiter, start_rec + len(recorddelimiter))

# We close both files
fcsv.close()
fin.close()

# Now we import it to the database
cmd = "importdata2patents.py " + filenameout + ".txt"
print "Reformatting done. Starting import to database: ", cmd
errorlevel = os.system(cmd)